Clarification about the contents of gene_to_gene_family.tsv from projection #221

Open
szhan opened this issue May 13, 2024 · 4 comments

szhan commented May 13, 2024

I have been running projection on a reconstructed pangenome and a set of assembly FASTA files for the input genomes, in order to assign each gene of each input genome to a gene family in the pangenome.

I tried consulting the documentation about the output of projection, but the link doesn't seem to go anywhere (https://github.com/labgem/PPanGGOLiN/blob/f3ba6a1f33256f19175b570c4b711bb8970d0365/docs/user/projection.md).

The documentation states that gene_to_gene_family.tsv "provides the mapping of genes to gene families of the pangenome." I expected one line per gene of the input genome, each assigning that gene to a gene family in the reconstructed pangenome. That isn't what I got: the files contain hundreds of thousands of lines, even though each input genome contains only 2.5k to 2.9k genes.

Any clarifications would be much appreciated. Thank you in advance.

axbazin (Member) commented May 14, 2024

Hi,

The "projection" documentation about its output files is here: https://ppanggolin.readthedocs.io/en/latest/user/projection.html#output-files

However, you are right that the current behavior is not the intended one, and I see where the bug is. Currently, the "gene_to_gene_family.tsv" file contains this information for ALL of the given input genomes, not just the single input genome of its output directory. The file is likely identical across the different "input genome" output directories. We'll get a fix for this in the upcoming version.
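
Until a fix is released, one possible workaround is to filter the oversized file down to the genes of a single genome. The following is only a minimal sketch under stated assumptions: it assumes the gene identifier sits in the first tab-separated column of gene_to_gene_family.tsv and that the set of gene IDs for the genome is already known from elsewhere. Neither is confirmed in this thread, and `filter_mapping` is a hypothetical helper, not part of PPanGGOLiN.

```python
def filter_mapping(mapping_path, genome_gene_ids, out_path, gene_column=0):
    """Copy only the rows whose gene ID belongs to one genome.

    ASSUMPTION: the file is tab-separated and the gene ID is in
    `gene_column` (default: first column). Adjust if the layout differs.
    Returns the number of rows kept.
    """
    kept = 0
    with open(mapping_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            # Skip short or blank lines instead of failing on them.
            if len(fields) <= gene_column:
                continue
            if fields[gene_column] in genome_gene_ids:
                dst.write(line)
                kept += 1
    return kept
```

If the returned count matches the number of genes expected for the genome (2.5k to 2.9k in the report above), the filtered file is probably the per-genome mapping that was intended.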

Thank you very much for the bug report.

Adelme

axbazin added the bug label May 14, 2024

szhan (Author) commented May 14, 2024

Thank you for the explanation. I checked, for a few input genomes, whether the file is the same across the different "input genome" output directories, but that didn't seem to be the case. I look forward to the updated version. Thank you.
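
A check like the one described here can be sketched in Python by hashing each file and grouping paths with identical content. This is an illustrative sketch only, not the method actually used in this thread; `group_identical_files` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_files(paths):
    """Group file paths by the SHA-256 digest of their contents.

    Returns a list of groups; files in the same group are byte-identical.
    A single group means every file is the same.
    """
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return list(groups.values())
```

For example, assuming a hypothetical layout with one subdirectory per input genome, the paths could be collected with `Path("projection_output").glob("*/gene_to_gene_family.tsv")` and passed to the function. More than one group in the result means the files differ between directories.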

axbazin (Member) commented May 14, 2024

Alright, thank you for the additional input. Indeed, I misunderstood what you meant; I see the broken link now! Will fix this as well.
