Error! Didn't find taxonomy ID mapping for sequence #150

Open
gibberwocky opened this issue Sep 12, 2023 · 1 comment

Comments

@gibberwocky

I'm attempting to build a custom database. Some sequences are incorporated without issue, but others are not. For example, the following error is reported for sequences that end up unclassified:

Error! Didn't find taxonomy ID mapping for sequence NC_001566.1|kraken:taxid|7460!!

The contig has matching labels in both the FASTA file and the map file stored in the library folder, as shown by this grep output for the contig name:

Apis_mellifera_strain_DH4-tax7460-GCF_003254395.2_Amel_HAv3.1_genomic-dustmasked.fna:>NC_001566.1|kraken:taxid|7460 Apis mellifera ligustica mitochondrion, complete genome

Apis_mellifera_strain_DH4-tax7460-GCF_003254395.2_Amel_HAv3.1_genomic.fna.map:NC_001566.1 7460 GCF_003254395.2 Apis mellifera strain=DH4

The taxID is present in the taxDB file generated during the build process:

$ grep "Apis mellifera" taxDB | head
88217 7460 Apis mellifera carnica subspecies
7469 7460 Apis mellifera ligustica subspecies
7460 7459 Apis mellifera species
44477 7460 Apis mellifera mellifera subspecies
441644 7460 Apis mellifera adansonii subspecies
428024 7460 Apis mellifera unicolor subspecies
428022 7460 Apis mellifera ruttneri subspecies
409484 2640676 Wolbachia endosymbiont of Apis mellifera carnica species
346612 7460 Apis mellifera meda subspecies
346611 7460 Apis mellifera jemenitica subspecies

And in the seqid2taxid.map:

$ grep "Apis mellifera" seqid2taxid.map | head
NC_037638.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037639.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NW_020555788.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037640.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037641.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037642.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NW_020555789.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037643.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NW_020555790.1 7460 GCF_003254395.2 Apis mellifera strain=DH4
NC_037644.1 7460 GCF_003254395.2 Apis mellifera strain=DH4

So I'm not sure what the problem is. Can anyone help?

KrakenUniq version 1.0.4

@Thomas-Bcp

It looks like Kraken uses the first space to separate the sequence ID from the rest of the header. So it looks up NC_001566.1|kraken:taxid|7460 (everything up to the first space) instead of just NC_001566.1, which is the ID in your map file. If you add a space after NC_001566.1 in your FASTA file, it should work.
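
To make the suggestion concrete, here is a minimal Python sketch of the mismatch and the proposed fix. It assumes, as described above, that the sequence ID is whatever follows ">" up to the first whitespace; this has not been checked against the KrakenUniq source. The helper names (seq_id, fix_header, fix_fasta_lines) are hypothetical and not part of KrakenUniq.

```python
import re

# Assumption: the sequence ID is the header text after ">" up to the
# first whitespace character (not verified against KrakenUniq itself).
def seq_id(header: str) -> str:
    parts = header.lstrip(">").split(maxsplit=1)
    return parts[0] if parts else ""

# Matches an accession immediately followed by a "|kraken:taxid|<n>" tag.
_TAXID_TAG = re.compile(r"^(>[^\s|]+)(\|kraken:taxid\|\d+)")

# Hypothetical helper: insert a space before the tag so that the ID
# ends at the accession.
def fix_header(header: str) -> str:
    return _TAXID_TAG.sub(r"\1 \2", header, count=1)

# Hypothetical helper: apply fix_header to header lines only and pass
# sequence lines through unchanged.
def fix_fasta_lines(lines):
    for line in lines:
        yield fix_header(line) if line.startswith(">") else line

# Header and map entry copied from the issue text.
header = (">NC_001566.1|kraken:taxid|7460 "
          "Apis mellifera ligustica mitochondrion, complete genome")
map_line = "NC_001566.1 7460 GCF_003254395.2 Apis mellifera strain=DH4"
map_ids = {map_line.split()[0]}

print(seq_id(header) in map_ids)              # ID still carries the tag
print(seq_id(fix_header(header)) in map_ids)  # ID is the bare accession
```

Running fix_fasta_lines over the lines of the .fna file and writing the result to a new file would apply the same change to every header, but keep a copy of the original library files and rebuild the database afterwards.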
