
Different result from blast hit #253

Open
MonicaSteffi opened this issue Feb 6, 2023 · 4 comments

Comments

@MonicaSteffi

MonicaSteffi commented Feb 6, 2023

Dear Developer,

Thank you so much for Kaiju.
I performed a SPAdes assembly (for the virus) and ran Kaiju on the assembled contigs. For the reference database, I downloaded the viral database from the Kaiju web server and ran the taxonomic assignment with default options.
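To see which contigs Kaiju classified and to which taxon, the Kaiju output can be read per contig. A minimal sketch, assuming the tab-separated output starts with the C/U status, the sequence name and the NCBI taxon ID (further columns are ignored); the function name and the example lines are hypothetical. Taxon IDs can then be turned into names, e.g. with kaiju-addTaxonNames.

```python
import io
from typing import Dict, Iterable, Tuple

def read_kaiju_output(lines: Iterable[str]) -> Dict[str, Tuple[bool, int]]:
    """Map each sequence name to (classified?, taxon ID).

    Assumes tab-separated lines whose first three columns are the C/U status,
    the sequence name and the NCBI taxon ID; any further columns are ignored.
    """
    result: Dict[str, Tuple[bool, int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        taxid = int(fields[2]) if len(fields) > 2 and fields[2] else 0
        result[fields[1]] = (fields[0] == "C", taxid)
    return result

# Made-up example lines in the assumed layout (names and IDs are hypothetical).
example = io.StringIO(
    "C\tNODE_1_length_5000_cov_12.3\t12345\textra\n"
    "U\tNODE_2_length_800_cov_3.1\t0\n"
)
hits = read_kaiju_output(example)
for contig, (classified, taxid) in hits.items():
    print(contig, "classified" if classified else "unclassified", taxid)
```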

One contig was classified as "Gordonia phage Gustav". When I BLASTed the same contig with blastn, the best hit was "Podoviridae sp". They both belong to the same class, though.

How do I confirm these results at the species or genus level?

Any help would be appreciated.

Thank you in advance

Regards
Monica

@pmenzel
Member

pmenzel commented Feb 6, 2023

Dear Monica,

Kaiju is primarily used for very fast heuristic alignments of raw sequencing reads against a reference database.
The speed advantage comes with a trade-off in accuracy (both in terms of sensitivity and specificity).
Kaiju only considers the best match, and it cannot deal with gaps in the alignment.
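To make "only the best match, no gaps" concrete, here is a toy sketch: each reference is scored by the length of its longest exact (ungapped, mismatch-free) match to the query, and only the single best reference is kept. This is a simplification in the spirit of Kaiju's maximum-exact-match idea, not Kaiju's actual algorithm (which searches translated protein fragments in an FM-index); the sequences and function names are made up.

```python
from typing import Dict, Tuple

def longest_exact_match(query: str, ref: str) -> int:
    """Length of the longest substring shared by query and ref (no gaps, no mismatches)."""
    best = 0
    # prev[j]: length of the exact match ending at the previous query position and ref[j - 1]
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if query[i - 1] == ref[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def best_reference(query: str, refs: Dict[str, str]) -> Tuple[str, int]:
    """Keep only the single reference with the longest exact match (ties: first one wins)."""
    scored = ((name, longest_exact_match(query, seq)) for name, seq in refs.items())
    return max(scored, key=lambda item: item[1])

query = "MKTAYIAKQRQISFVKSHFSRQ"
refs = {
    "ref_A": "MKTAYIAKQRQISFVKSH",        # truncated copy of the query
    "ref_B": "MKTAYIAKQRQWWISFVKSHFSRQ",  # full query with a 2-residue insertion
}
print(best_reference(query, refs))  # ('ref_A', 18)
```

The two inserted residues in ref_B split its match into two 11-residue pieces, so the truncated ref_A wins even though ref_B contains the whole query.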

Therefore, slower methods such as BLAST generally produce more accurate results, especially since BLAST can do gapped alignments.
Of course, the species contained in the reference database also affect the results, i.e., are both species contained in your Kaiju database?
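For contrast, a gapped local alignment can bridge an insertion. BLAST itself is heuristic (seed and extend) with affine gap costs; this sketch swaps in plain Smith-Waterman with a linear gap penalty and made-up scoring values, only to show why allowing gaps can change which reference looks best. The toy sequences are hypothetical.

```python
from typing import Dict

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b using a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous DP row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below 0; up/left moves open a gap
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

query = "MKTAYIAKQRQISFVKSHFSRQ"
refs: Dict[str, str] = {
    "ref_A": "MKTAYIAKQRQISFVKSH",        # truncated copy of the query
    "ref_B": "MKTAYIAKQRQWWISFVKSHFSRQ",  # full query with a 2-residue insertion
}
scores = {name: smith_waterman(query, seq) for name, seq in refs.items()}
print(scores)  # {'ref_A': 36, 'ref_B': 40}
```

With gaps allowed, ref_B aligns over all 22 query residues at the cost of one 2-residue gap and overtakes ref_A.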

Have a look at how the BLAST alignment for the Gordonia phage differs from the alignment for the best hit.
It might give you a hint as to which species is the closest relative of the actual species in your sample.
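One way to compare the alignment of the best hit with that of a specific subject is BLAST+ tabular output (-outfmt 6), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A minimal sketch that keeps the best HSP per subject and ranks subjects by bit score; the function names, subject IDs and numbers are made up.

```python
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

@dataclass
class Hsp:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float

def parse_outfmt6(lines: Iterable[str]) -> List[Hsp]:
    """Parse tabular BLAST lines, skipping blanks and '#' comment lines."""
    hsps: List[Hsp] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hsps.append(Hsp(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]), float(row["bitscore"])))
    return hsps

def best_hsp_per_subject(hsps: List[Hsp]) -> Dict[str, Hsp]:
    """Keep the highest-scoring HSP for each subject sequence."""
    best: Dict[str, Hsp] = {}
    for h in hsps:
        if h.sseqid not in best or h.bitscore > best[h.sseqid].bitscore:
            best[h.sseqid] = h
    return best

# Made-up example lines; subject IDs and numbers are hypothetical.
example = io.StringIO(
    "contig_1\tsubject_A\t97.1\t4000\t116\t0\t1\t4000\t101\t4100\t0.0\t7000\n"
    "contig_1\tsubject_B\t88.0\t25\t3\t0\t10\t34\t500\t524\t0.5\t30.0\n"
)
ranking = sorted(best_hsp_per_subject(parse_outfmt6(example)).values(),
                 key=lambda h: h.bitscore, reverse=True)
for h in ranking:
    print(h.sseqid, f"pident={h.pident}", f"length={h.length}", f"evalue={h.evalue}")
```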

@MonicaSteffi
Author

MonicaSteffi commented Feb 6, 2023


Hi Peter,
Thank you for the quick reply. I aligned my contig with both the Gordonia phage (complete genome sequence) and Podoviridae sp. (the BLAST hit). For Podoviridae sp., I got nearly 97% identity and 80% query coverage, but for the Gordonia phage the query coverage is 0.1% and the E-value is 0.46. I didn't expect such a poor alignment given the Kaiju result.
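Query coverage here means the percentage of the contig covered by at least one alignment, similar in spirit to BLAST's per-subject query coverage (BLAST's exact computation may differ in details). A minimal sketch that merges the (qstart, qend) intervals of all HSPs against one subject; the function name is hypothetical.

```python
from typing import List, Tuple

def query_coverage(intervals: List[Tuple[int, int]], query_len: int) -> float:
    """Percent of query positions covered by at least one HSP.

    Intervals are 1-based and inclusive; reversed coordinates are normalized first.
    """
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    covered = 0
    cur_start, cur_end = None, None
    for s, e in spans:
        if cur_end is None or s > cur_end + 1:
            # new disjoint block: count the previous block first
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_len

print(query_coverage([(1, 500), (400, 800)], 1000))  # 80.0
```

Overlapping HSPs are counted once, so two alignments covering positions 1-500 and 400-800 of a 1,000-residue query give 80% rather than 90%.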

Do you think this might be due to the reference database I used? Or am I doing something wrong here?

I also downloaded the viral RefSeq database from NCBI (https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/) and ran Kaiju again, but I still got the same hit.
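Whether a particular sequence made it into a self-built database can be checked directly in the FASTA file it was built from. A minimal sketch that scans (optionally gzipped) FASTA headers for an identifier; the function names, file name and example records are hypothetical. Since the Kaiju index holds protein sequences, the string to search for would be a protein accession or organism name linked to the BLAST hit rather than the nucleotide accession itself.

```python
import gzip
import os
import tempfile
from typing import Iterator, List

def fasta_headers(path: str) -> Iterator[str]:
    """Yield FASTA header lines (without '>'), reading .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield line[1:].rstrip("\n")

def headers_containing(path: str, needle: str) -> List[str]:
    """Return all headers that contain the given identifier or text."""
    return [h for h in fasta_headers(path) if needle in h]

# Self-contained demo on a small made-up protein FASTA.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.faa.gz")
    with gzip.open(demo, "wt") as out:
        out.write(">PROT_1 hypothetical protein [Example phage X]\nMKT\n"
                  ">PROT_2 capsid protein [Example phage Y]\nMAA\n")
    matches = headers_containing(demo, "phage Y")
print(matches)  # ['PROT_2 capsid protein [Example phage Y]']
```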

Thank you in advance

@pmenzel
Member

pmenzel commented Feb 7, 2023

What's the accession number of your Podoviridae sp. sequence? Is it contained in your Kaiju database?

@MonicaSteffi
Author

MonicaSteffi commented Feb 8, 2023

what's the accession number of your Podoviridae sp sequence?
The top hit is BK017549.1 | TPA: Podoviridae sp. isolate ctHFD3, partial genome

Is it contained in your kaiju db?

Hi,
the Kaiju names.dmp has Podoviridae species information, but not that particular entry:
grep "Podoviridae" names.dmp
10744 | Podoviridae | Podoviridae | scientific name |
196895 | unassigned Podoviridae | | equivalent name |
196895 | unclassified Podoviridae | | scientific name |
321841 | environmental samples | environmental samples <viruses,family Podoviridae> | scientific name |
1701671 | uncultured Podoviridae | | equivalent name |
2202567 | Podoviridae sp. | | scientific name |
2495573 | Podoviridae phage DK1 | | scientific name |
2656707 | Podoviridae sp. ct2cs2 | | scientific name |
2656708 | Podoviridae sp. ctDWo9 | | scientific name |
2656709 | Podoviridae sp. ctKoA10 | | scientific name |
2656710 | Podoviridae sp. ctLUJ1 | | scientific name |
2656711 | Podoviridae sp. ctQNx1 | | scientific name |
2656712 | Podoviridae sp. ctg2L5 | | scientific name |
2656713 | Podoviridae sp. ctka020 | | scientific name |
2656714 | Podoviridae sp. ctpVR23 | | scientific name |
2656715 | Podoviridae sp. ctrTa16 | | scientific name |
2656716 | Podoviridae sp. ctviO18 | | scientific name |
2656717 | Podoviridae sp. cty5g4 | | scientific name |
2675442 | Podoviridae sp. ctbj_2 | | scientific name |
2675443 | Podoviridae sp. ctdb7 | | scientific name |
2675444 | Podoviridae sp. ctcf755 | | scientific name |
2675445 | Podoviridae sp. ctbd591 | | scientific name |
2675446 | Podoviridae sp. ctbh1 | | scientific name |
2675447 | Podoviridae sp. ctfa10 | | scientific name |
2675448 | Podoviridae sp. ctda_1 | | scientific name |
2675449 | Podoviridae sp. ctdc61 | | scientific name |
2675450 | Podoviridae sp. ctjc_2 | | scientific name |
2731643 | Podoviridae | Podoviridae | in-part |
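The grep above can also be done in Python, restricted to one name class. A minimal sketch that splits each names.dmp line on "|" (this works for both the tab-padded NCBI layout and the space-padded lines shown here); the function names are hypothetical and the sample lines are copied from the output above. As far as I know, names.dmp is the NCBI taxonomy names table, so a name appearing or missing there does not by itself show whether sequences from that taxon are in the Kaiju index.

```python
from typing import Iterable, List, NamedTuple, Optional

class TaxName(NamedTuple):
    taxid: int
    name: str
    unique_name: str
    name_class: str

def parse_names_dmp(lines: Iterable[str]) -> List[TaxName]:
    """Parse names.dmp lines: taxid | name | unique name | name class |"""
    records: List[TaxName] = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4 or not fields[0]:
            continue  # skip blank or malformed lines
        records.append(TaxName(int(fields[0]), fields[1], fields[2], fields[3]))
    return records

def find_names(records: List[TaxName], term: str,
               name_class: Optional[str] = "scientific name") -> List[TaxName]:
    """Records whose name contains term, optionally limited to one name class."""
    return [r for r in records
            if term in r.name and (name_class is None or r.name_class == name_class)]

sample = [
    "10744 | Podoviridae | Podoviridae | scientific name |",
    "196895 | unassigned Podoviridae | | equivalent name |",
    "2202567 | Podoviridae sp. | | scientific name |",
    "2656707 | Podoviridae sp. ct2cs2 | | scientific name |",
]
records = parse_names_dmp(sample)
print([r.taxid for r in find_names(records, "Podoviridae sp.")])  # [2202567, 2656707]
print(find_names(records, "ctHFD3"))  # []
```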
