Using dedupe.sh to identify and remove contigs contained in others (merged mode) #701

Open
eperezv opened this issue Jun 26, 2023 · 2 comments


eperezv commented Jun 26, 2023

Hi,

I'm trying to run SqueezeMeta in merged mode because my dataset is too large for coassembly. The cd-hit step also turned out to be too slow, so I thought of running dedupe.sh with a minimum identity of 99% instead.

I was wondering whether this alternative, which is faster than cd-hit and, I think, does the same job, would be suitable for running SqueezeMeta in merged mode.

Cheers
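
For anyone wanting to try the same thing, here is a minimal sketch of how such a dedupe.sh call could be assembled from Python. The exact command used in this thread was not posted, so this is only an assumption of what it might look like: `in=`, `out=` and `minidentity=` are dedupe.sh (BBTools) parameters, while the function names and the file names are hypothetical.

```python
import subprocess


def build_dedupe_command(contigs_in, contigs_out, min_identity=99):
    """Build the argument list for a BBTools dedupe.sh run.

    The helper itself is hypothetical; only the in=/out=/minidentity=
    parameters belong to dedupe.sh.
    """
    return [
        "dedupe.sh",
        f"in={contigs_in}",             # FASTA with the contigs from all samples
        f"out={contigs_out}",           # FASTA that receives the deduplicated contigs
        f"minidentity={min_identity}",  # percent identity threshold
    ]


def run_dedupe(contigs_in, contigs_out, min_identity=99):
    """Run dedupe.sh (must be on PATH) and raise if it exits with an error."""
    subprocess.run(
        build_dedupe_command(contigs_in, contigs_out, min_identity),
        check=True,
    )
```

Calling `run_dedupe("merged_contigs.fasta", "merged_contigs.dedupe.fasta")` would then write the reduced contig set, with both file names being placeholders.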

fpusan (Collaborator) commented Jun 26, 2023

It seems like it could be a valid alternative, and I am actually quite interested in seeing how it works for you. Please keep us posted!

eperezv (Author) commented Jun 27, 2023

I was able to run dedupe.sh, which finished in around 10 minutes, whereas cd-hit had been running for days without finishing. It identified and removed contigs that were identical to or contained in others, removing ca. 30% of the sequences. I am now running minimus2, but that will probably take a long time.
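
To illustrate what "identical or contained in others" means, here is a toy Python sketch. It is not how dedupe.sh works internally: it only handles exact containment on either strand, ignores the 99% identity threshold, and compares every contig against every kept contig, so it would be far too slow for real data. The function names are made up for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def remove_contained(contigs):
    """Drop contigs that duplicate, or are exact substrings of, a kept contig.

    `contigs` maps contig name -> sequence. Contigs are visited from longest
    to shortest, so a containing contig is always seen before anything it
    contains. Both strands are checked.
    """
    kept = {}
    ordered = sorted(contigs.items(), key=lambda item: (-len(item[1]), item[0]))
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # identical to or contained in a contig we already kept
        kept[name] = seq
    return kept


if __name__ == "__main__":
    example = {
        "c1": "ACGTTGCAAC",
        "c2": "GTTGCA",      # contained in c1
        "c3": "ACGTTGCAAC",  # identical to c1
        "c4": "TTTTT",       # unrelated, so it is kept
    }
    print(sorted(remove_contained(example)))
```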
