Extracting AA sequences from each sample #245

Open
andressamv opened this issue Oct 28, 2022 · 4 comments

Comments

@andressamv

Hi, I have a question about the Kaiju output. I want to extract the AA sequences derived from fungi from each sample. My goal is to use the sequences as input in tools such as GhostKoala. I understand that using -v I will get those sequences, but how can I extract only the fungal ones? More importantly, how can I know the sample of origin of each read? Thank you

@pmenzel
Member

pmenzel commented Oct 28, 2022

Hi,

  • you could use kaiju-addTaxonNames -p to add the full taxonomic information to each line in the kaiju output file (see here) and then just keep the lines containing Fungi
  • I don't get the second question
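The filtering step suggested above could be sketched roughly as follows. This is only an illustration and not part of Kaiju: it assumes the output is tab-separated, that the first column is C or U, and that kaiju-addTaxonNames appended the taxon path as the last column. The function name `filter_by_lineage` and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator


def filter_by_lineage(lines: Iterable[str], keyword: str = "Fungi") -> Iterator[str]:
    """Yield classified lines whose last tab-separated field mentions `keyword`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: column 1 is "C" (classified) or "U" (unclassified),
        # and the taxon path added by kaiju-addTaxonNames is the last column.
        if fields[0] == "C" and keyword in fields[-1]:
            yield line


# Hypothetical usage, one file per sample:
# with open("sample1.names.out") as src, open("sample1.fungi.out", "w") as dst:
#     for line in filter_by_lineage(src):
#         dst.write(line)
```

If each sample was classified in a separate Kaiju run (an assumption about the setup), applying the filter to each output file in turn keeps the sample of origin in the file name.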

@pmenzel
Member

pmenzel commented Oct 29, 2022

btw, please note that the sequences shown in the output file are the matched database sequences and not the translated amino acid sequences from the reads
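For anyone who still wants the matched sequences as FASTA, a minimal sketch follows. As noted above, these are database sequences, not translations of the reads. The column position (seventh column of the -v output) and the comma-separated layout of the fragments are assumptions to check against your own files, and `matched_fragments_to_fasta` is a hypothetical helper, not a Kaiju tool.

```python
from typing import Iterable, Iterator


def matched_fragments_to_fasta(lines: Iterable[str], seq_col: int = 6) -> Iterator[str]:
    """Yield FASTA records for the matched database fragments of classified reads."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip unclassified reads and lines without a sequence column.
        if fields[0] != "C" or len(fields) <= seq_col:
            continue
        read_name = fields[1]
        # Assumption: several fragments may be listed, separated by commas.
        fragments = [frag for frag in fields[seq_col].split(",") if frag]
        for i, frag in enumerate(fragments, start=1):
            # The header keeps the read name so each record traces back to its read.
            yield f">{read_name}_match{i}\n{frag}"
```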

@andressamv
Author

Hi @pmenzel. Thank you for this info! Can you explain what this means for mismatches? If I understood correctly, if I have mismatches, I will not be able to see them in the output, only the reference sequence. Is this correct?

@pmenzel
Member

pmenzel commented Jan 7, 2023

yes, the output shows the matched database sequence.


2 participants