Is it possible to run panaroo_2_pagoo without removing "refound_"? #64

Open
emilywollmuth opened this issue Oct 2, 2023 · 2 comments

@emilywollmuth

I am working with a divergent group of organisms, so I want to keep the genes refound by panaroo because I believe the core genome is being underestimated without them. My understanding is that the genes labeled "refound_" are not inherently a problem. Instead, it's the genes with the "stop" marker that are likely pseudogenes. Is there a way to run panaroo_2_pagoo and not remove all "refound" genes but to still remove "_stop" genes?

Alternatively, is it possible to use panaroo_2_pagoo and remove no genes or to make a pagoo matrix directly from a csv matrix instead of the long format data? If this isn't possible, I'll just need to convert the panaroo matrix to long format for pagoo to reformat it back into a matrix again.

Thanks for your work on this package. Pagoo is great and so easy to use!

@iferres
Owner

iferres commented Oct 3, 2023

Hi @emilywollmuth, currently the only way is to manually create a Pagoo object using the long format ( https://iferres.github.io/pagoo/articles/Input.html ) including those marked genes. I should probably consider modifying this function to include them, but I don't have the time to do it right now. Sorry :(

I wrote this function some time ago, so I'm not sure, but I think the problem arose when reading the csv AND the gffs: the "_refound", "_stop", etc. genes were not present in the gff files, so Pagoo couldn't match ids with their sequences. For the sake of simplicity I decided to remove them all, but you're right that ideally they shouldn't be a problem when working only with the csv file.

I'll leave the issue open, but sincerely I don't know when I will address it.

@emilywollmuth
Author

Thanks for the quick response @iferres! No worries. I figured the program doesn't do this because of the gff matching issue that would arise. I'll manually create a Pagoo object.
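
For anyone landing here later, a rough sketch of that manual conversion is below. It is a preprocessing step only, not part of pagoo or panaroo, and it rests on several assumptions: that the panaroo matrix has three leading metadata columns followed by one column per genome, that paralogs within a cell are separated by ";", that unwanted genes can be recognised by a "_stop" substring in their id, and that the long table needs the columns gene, org and cluster as described in the Input article linked above. The function name `panaroo_to_long`, its parameters, and the toy gene ids are all made up for illustration.

```python
import csv
import io


def panaroo_to_long(csv_text, n_meta_cols=3, drop_markers=("_stop",), sep=";"):
    """Turn a wide presence/absence CSV into long-format rows.

    Assumptions (not confirmed by the thread): the first column holds the
    cluster name, the first `n_meta_cols` columns are metadata, and every
    remaining column is one genome.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    orgs = header[n_meta_cols:]
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        cluster = record[0]
        for org, cell in zip(orgs, record[n_meta_cols:]):
            # A cell may be empty (gene absent) or hold several paralogs.
            for gene in cell.split(sep):
                gene = gene.strip()
                if not gene:
                    continue
                # Drop only ids carrying an unwanted marker; "refound" ids stay.
                if any(marker in gene for marker in drop_markers):
                    continue
                rows.append({"gene": gene, "org": org, "cluster": cluster})
    return rows


# Toy input with invented ids, only to show the shape of the data.
example = """Gene,Non-unique Gene name,Annotation,genomeA,genomeB
group_1,,hypothetical protein,A_0001,B_0001_refound_12
group_2,,kinase,A_0002;A_0003,B_0002_stop
"""

rows = panaroo_to_long(example)
for row in rows:
    print(row["gene"], row["org"], row["cluster"], sep="\t")
```

The rows could then be written out with `csv.DictWriter`, read into R as a data frame, and passed to pagoo as the long-format table. Passing `drop_markers=()` would keep every gene, which covers the "remove no genes" case from the original question.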
