Clarification on hierarchical read classification with multiple databases #161

Open
gtonkinhill opened this issue Jan 10, 2024 · 2 comments

Comments

@gtonkinhill

Hi, thanks for creating such a useful tool!

Apologies if I've missed this in the documentation. I wanted to clarify how krakenuniq handles multiple databases when run as

krakenuniq --db HOST --db PROK --db EUK_DRAFT 
  • Am I correct in assuming that only kmers that do not match the HOST DB will be subsequently searched in the PROK DB?
  • Would this generally be a more conservative way to remove host DNA than including the host genome in a single DB?
  • Given a single taxonomy, is it possible to have the same genome in multiple DBs, or does this cause problems, making it important to ensure the DBs do not overlap?
@salzberg
Collaborator

Hmm, maybe some of the others will answer, but all I can say is what I do: I never run KrakenUniq with multiple DBs. Instead, I run it with one DB and then use KrakenTools to extract all the unclassified reads. I then take those reads and, if I have a second DB, run them against it.
To remove host DNA, I usually run bowtie2 to align the reads against the human genome (the only host I've filtered out), then take the unaligned reads and run them through KrakenUniq.
It does not cause a problem to have the same genome in multiple DBs. However, some k-mers might get assigned to different taxonomic IDs if you do that.
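The sequential workflow in the reply above (host removal with bowtie2, then one KrakenUniq database at a time, passing only the still-unclassified reads onward) can be sketched as a small Bash script. This is a minimal, untested sketch under several assumptions: single-end FASTQ input; the bowtie2 index, database names, and output file names are hypothetical placeholders; and the KrakenUniq flags (`--db`, `--report-file`, `--output`) and KrakenTools `extract_kraken_reads.py` options (`-k`, `-s`, `-o`, `-t 0` for unclassified reads, `--fastq-output`) are given as we understand them and should be checked against each tool's `--help`. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a sequential host-removal + classification workflow:
#   1) align reads to the host with bowtie2 and keep the unaligned reads,
#   2) classify them with KrakenUniq against one database,
#   3) extract the reads left unclassified (taxid 0) with KrakenTools,
#   4) repeat steps 2-3 with the next database.
# All paths, index names, and database names are placeholders.
set -euo pipefail

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# remove_host HOST_INDEX READS OUT_PREFIX
# Keeps reads that fail to align to the host (--un); alignments are discarded.
remove_host() {
  run bowtie2 -x "$1" -U "$2" --un "$3.nonhost.fq" -S /dev/null
}

# classify DB READS OUT_PREFIX
classify() {
  run krakenuniq --db "$1" --report-file "$3.report.tsv" --output "$3.kraken" "$2"
}

# extract_unclassified KRAKEN_OUT READS OUT_FQ
# Taxid 0 marks unclassified reads in Kraken-style per-read output (assumption).
extract_unclassified() {
  run extract_kraken_reads.py -k "$1" -s "$2" -o "$3" -t 0 --fastq-output
}

# sequential_pipeline HOST_INDEX READS DB1 [DB2 ...]
# Each database only sees the reads the previous step left unclassified.
sequential_pipeline() {
  local host_index=$1 reads=$2
  shift 2
  remove_host "$host_index" "$reads" step0
  local current=step0.nonhost.fq i=1
  for db in "$@"; do
    classify "$db" "$current" "step$i"
    extract_unclassified "step$i.kraken" "$current" "step$i.unclassified.fq"
    current="step$i.unclassified.fq"
    i=$((i + 1))
  done
}
```

For example, appending `sequential_pipeline GRCh38_index sample.fq PROK EUK_DRAFT` to the script and running it with `DRY_RUN=1 bash pipeline.sh` prints the bowtie2 call followed by a KrakenUniq and an extraction step for each database. The hierarchy here is at the level of whole reads rather than individual k-mers, which is what running the tools one after another gives you.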

@gtonkinhill
Author

Thanks very much for the quick response!
