Extracting AA sequences from each sample #245

andressamv · 2022-10-28T02:14:24Z

Hi, I have a question about the Kaiju output. I want to extract the AA sequences derived from fungi from each sample. My goal is to use the sequences as input in tools such as GhostKoala. I understand that using -v I will get those sequences, but how can I extract only the fungal ones? More importantly, how can I know the sample of origin of each read? Thank you

pmenzel · 2022-10-28T13:49:49Z

Hi,

you could use kaiju-addTaxonNames -p to add the full taxonomic path to each line in the kaiju output file (see here) and then just keep the lines containing Fungi.
I don't get the second question.
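
The filtering step suggested above could be sketched roughly as follows. This is a minimal sketch, not part of Kaiju: it assumes the output is tab-separated, that classified lines start with "C", and that kaiju-addTaxonNames -p appends the semicolon-separated taxon path as the last column. The function names `is_fungal` and `filter_fungal` are hypothetical.

```python
def is_fungal(line, taxon="Fungi"):
    """Return True if a classified Kaiju line has `taxon` in its taxon path.

    Assumption: the taxon path added by kaiju-addTaxonNames -p is the
    last tab-separated column, with names separated by semicolons.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4 or fields[0] != "C":
        return False  # skip unclassified or malformed lines
    names = [name.strip() for name in fields[-1].split(";")]
    # Match a whole path element so that names merely containing
    # the word "Fungi" are not picked up by accident.
    return taxon in names


def filter_fungal(lines, taxon="Fungi"):
    """Keep only the lines whose taxon path contains `taxon`."""
    return [line for line in lines if is_fungal(line, taxon)]
```

If Kaiju was run separately for each sample, applying the filter to each sample's output file on its own keeps the fungal lines grouped by sample.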

pmenzel · 2022-10-29T19:04:31Z

btw, please note that the sequences shown in the output file are the matched database sequences and not the translated amino acid sequences from the reads
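
To make that caveat concrete, here is a rough sketch of how the sequences from the verbose output could be written as FASTA, with headers that flag them as database matches rather than read translations. It assumes the -v output is tab-separated, that the seventh column holds the matching fragment sequence(s) separated by commas, and that the second column is the read name. The function name `fragments_to_fasta` and the `_dbmatch` header suffix are made up for illustration.

```python
def fragments_to_fasta(lines, seq_col=6):
    """Build FASTA text from the matched database fragments in Kaiju -v output.

    Assumption: column index `seq_col` (0-based, so 6 is the seventh column)
    holds comma-separated database fragments. These are reference sequences,
    so mismatches between read and reference are not visible in them.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= seq_col or fields[0] != "C":
            continue  # unclassified lines carry no matched sequence
        read_name = fields[1]
        # Drop empty strings left by a trailing comma.
        fragments = [frag for frag in fields[seq_col].split(",") if frag]
        for index, fragment in enumerate(fragments, start=1):
            records.append(f">{read_name}_dbmatch{index}\n{fragment}")
    return "\n".join(records)
```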

andressamv · 2023-01-06T22:02:26Z

Hi @pmenzel. Thank you for this info! Can you explain what this means for mismatches? If I understood correctly, if I have mismatches, I will not be able to see them in the output, only the reference sequence. Is this correct?

pmenzel · 2023-01-07T16:52:23Z

yes, the output shows the matched database sequence.

andressamv closed this as completed Oct 28, 2022

andressamv reopened this Oct 28, 2022
