Wrong File Type Error #185

Open
jacordova opened this issue Mar 1, 2024 · 1 comment

@jacordova

Hi PPanGGoLiN team,

I'm trying to run .gff files but for some reason I keep getting the following error:

Exception: Wrong file type provided. This looks like a fasta file. You may be able to use --fasta instead

I'm calling ppanggolin like this:

ppanggolin all --anno ecoli.gff.list --cpu 10

The .gff files originate from our lab's database and generally look fine when compared to the example .gff files provided. I've attached two example files; note that I had to change the extension to .txt to upload. One is the original and the second is one that's been edited to clean up the annotation, essentially only leaving the ID in the last column. Either way, I still got the same error.

I also attached the whole error log.

Any thoughts? Thanks in advance for your help!

WIS_EcoW3110uw_DRAFTv1.txt
WIS_EcoW3110uw_DRAFTv1_edited.txt
Ppanggolin_error.txt

@axbazin
Member

axbazin commented Mar 1, 2024

Hi,

So indeed the error is quite misleading, as it is raised whenever neither gff3 nor gbk is detected. For gff3, detection is based on the gff-version pragma at the beginning of the file.

Here the gff3 detection failed because the header is expected to be "##gff-version 3" with a space, while yours is "##gff-version\t3" with a tab.

While this is indeed not explicitly stated in the gff3 specification (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md), the examples it provides, as well as popular tools (Bakta, Prokka, PGAP and so on), use a space to separate the "gff-version" string from the version itself.
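As a workaround on the user side, the pragma can be normalised before running PPanGGOLiN. The following is only a minimal sketch, assuming the tab in the first line is the sole reason detection fails; the function name `normalize_gff_version_pragma` is hypothetical and not part of PPanGGOLiN.

```python
import re


def normalize_gff_version_pragma(text: str) -> str:
    """Rewrite a leading '##gff-version<whitespace>N' pragma to use one space.

    Hypothetical helper for illustration; not part of PPanGGOLiN.
    """
    lines = text.split("\n")
    if lines and lines[0].startswith("##gff-version"):
        # Collapse any run of tabs/spaces after the keyword into a single space.
        lines[0] = re.sub(r"^##gff-version[ \t]+", "##gff-version ", lines[0])
    return "\n".join(lines)


# Made-up two-line GFF3 snippet with a tab-separated pragma.
example = "##gff-version\t3\ncontig1\tsrc\tgene\t1\t10\t.\t+\t.\tID=g1\n"
fixed = normalize_gff_version_pragma(example)
```

Only the first line is touched, so the tab-separated feature lines are left exactly as they were.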

If I may, there are a few other problematic elements in the gff3:

In column 9, there are some other minor things that will not raise any error but are not common usage, so PPanGGOLiN's parser won't pick them up "automatically", whereas with other terms it could:

  • You use "topology" for the linearity or circularity of contigs, while gff3 has a reserved term for this (Is_circular) that other parsers are also more likely to use if present.
  • You use "description" for a field that is usually called "product". This is not very common and PPanGGOLiN's parser will not retrieve this information, though I don't think it matters much, as PPanGGOLiN does little with it apart from reporting it back in some output files.
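If you want to switch to the more common terms, the renaming in column 9 could be sketched as below. This is an illustration only: the function names are hypothetical, and it assumes the "topology" attribute holds the literal value "circular" for circular contigs (other values are left untouched).

```python
def rewrite_attributes(column9: str) -> str:
    """Rename non-standard keys in a GFF3 attributes column (column 9)."""
    out = []
    for field in column9.split(";"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        if key == "topology" and value.lower() == "circular":
            # Assumption: circular contigs are tagged 'topology=circular'.
            key, value = "Is_circular", "true"
        elif key == "description":
            key = "product"
        out.append(f"{key}{sep}{value}")
    return ";".join(out)


def rewrite_gff_line(line: str) -> str:
    """Apply rewrite_attributes to one feature line; pass other lines through."""
    if line.startswith("#"):
        return line  # pragmas and comments are left unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a standard 9-column feature line
    cols[8] = rewrite_attributes(cols[8])
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")
```

Applied line by line to a file, this leaves pragmas, comments and any embedded sequence section as they are.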

I hope this helps. Have a nice day!
Adelme
