Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevelance python script only useful for hpv16? #16

Open
jolvany opened this issue Jul 6, 2021 · 0 comments
Open

Prevelance python script only useful for hpv16? #16

jolvany opened this issue Jul 6, 2021 · 0 comments

Comments

@jolvany
Copy link

jolvany commented Jul 6, 2021

HiEric!

I was recently recommended your program, and I am trying to figure out the best method to utilized and understand this process for my end goal. So I am effectively trying to use sequencing reads to detect malaria in sub-Saharan Africa. In this region there are four possible species of parasite these reads could originate from- and what I'm trying to do, through the language of your program, is classify the reads using four references and either count the reads for each species or use your score_real_classification.py to get percentages.

Here is the command I used to classify since "classify" is currently unavailable:

rkmh stream -r PlasmoDB-51_PfalciparumCD01_Genome.fasta -r PlasmoDB-51_PvivaxP01_Genome.fasta -r PlasmoDB-
51_PmalariaeUG01_Genome.fasta -r PlasmoDB-51_PovalecurtisiGH01_Genome.fasta  -f NWD104090.fasta -s 1000 -k 10 > out.rk 

This did produce a viable out file (I can see output), but when I try to tabulate through your python scripts these are the errors I get:


python scripts/score_real_classification.py < out.rk > out.cls
scripts/score_real_classification.py:60: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if l_match is not "" and s_match is not "" and l_match is not s_match[0]:
scripts/score_real_classification.py:60: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if l_match is not "" and s_match is not "" and l_match is not s_match[0]:
scripts/score_real_classification.py:70: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if l_match is not "":
scripts/score_real_classification.py:73: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if s_match is not "":
Traceback (most recent call last):
  File "scripts/score_real_classification.py", line 21, in <module>
    read_len = int(tokens[2].strip().split("/")[1])
IndexError: list index out of range

And I am wondering if you could shed light on this error? Is it because it is designed to be for the hpv16 usage and I will have to write my own tabulation script? Or do I need to change something so it can recognize the available species in my input?

I can email you an example of the out.rk file but even when I make it .txt it won't let me attach it here for some reason. But here is the species breakdown:
Anything with Pv is Plasmodium vivax
Anything with Pf Plasmodium falciparum
Anything with Pm Plasmodium malariae
Anything with Po Plasmodium ovale

And finally if your answer is that I will need to write my own script, is the out.rk file supposed to be understood as:
[Reference it mapped to] [Read] [kmers mapped] [sketch size]?

Thank you in advance!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant