Merge pangenome graphs #68

genomesandMGEs · 2021-10-06T07:41:53Z

Hi there,

Is it possible to merge pangenome graphs from independent runs? I know panaroo has that option, and would like to know if it would be possible to do so with ppanggolin.
If not, could you please provide me alternatives to compare the pangenome of independent runs?

Thanks!

axbazin · 2021-10-06T08:23:14Z

Hi,

What are you trying to achieve through this comparison, exactly?
Is it for example to compare the gene families and their partitions between both pangenomes, and know which family is persistent in both pangenomes, which is shell in one and persistent in the other, things like that?

Adelme

genomesandMGEs · 2021-10-06T08:25:52Z

Hey,

Thanks for the (super) quick reply!
Exactly, that's what I was thinking about.

axbazin · 2021-10-06T08:55:50Z

We do not have anything that directly compares two pangenomes (for now). However, you can get the same information with a few file comparisons.
Assuming you have the latest version installed, you can do the following:

First, get all family sequences for both pangenomes:

ppanggolin fasta --prot_families all -p pangenome_1.h5 -o prot_pangenome_1 
ppanggolin fasta --prot_families all -p pangenome_2.h5 -o prot_pangenome_2

Each command writes a file called 'all_protein_families.faa' in the output directory given with -o.
Then, you can align each file against the other pangenome:

ppanggolin align -p pangenome_1.h5 --proteins prot_pangenome_2/all_protein_families.faa -o align_prot_pang2_to_pang1
ppanggolin align -p pangenome_2.h5 --proteins prot_pangenome_1/all_protein_families.faa -o align_prot_pang1_to_pang2

You can provide --identity (default is 0.5) and --coverage (default is 0.8) thresholds for the comparison.
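If you want to script both directions with explicit thresholds, here is a minimal Python sketch. It only uses the subcommand and options shown above; the helper names `build_align_command` and `reciprocal_commands` are made up for the example, and nothing is executed unless you uncomment the `subprocess` call.

```python
import shlex
import subprocess  # only needed if you uncomment the run() call below


def build_align_command(pangenome, proteins, outdir, identity=0.5, coverage=0.8):
    """Return the argument list for one `ppanggolin align` run."""
    return [
        "ppanggolin", "align",
        "-p", pangenome,
        "--proteins", proteins,
        "-o", outdir,
        "--identity", str(identity),
        "--coverage", str(coverage),
    ]


def reciprocal_commands(identity=0.5, coverage=0.8):
    """Build both alignment directions, reusing the file names from the commands above."""
    return [
        build_align_command(
            "pangenome_1.h5",
            "prot_pangenome_2/all_protein_families.faa",
            "align_prot_pang2_to_pang1",
            identity, coverage,
        ),
        build_align_command(
            "pangenome_2.h5",
            "prot_pangenome_1/all_protein_families.faa",
            "align_prot_pang1_to_pang2",
            identity, coverage,
        ),
    ]


if __name__ == "__main__":
    for cmd in reciprocal_commands(identity=0.8, coverage=0.8):
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run ppanggolin
```

Using the same thresholds in both directions keeps the two result sets comparable.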
In both of your output directories, 'align_prot_pang2_to_pang1' and 'align_prot_pang1_to_pang2', you will get two files.
The first one, 'proteins_partition_projection.tsv', is tab-separated and laid out as follows:

The first column indicates a family id from the faa file, and the second column indicates the partition of the most similar family in the pangenome it was compared to.
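To get a quick overview of that file, a minimal Python sketch like the one below could load it and count how many input families land in each partition. It assumes only the two-column layout described above; the function names are hypothetical, and if your file starts with a header line you would need to skip it.

```python
import csv
from collections import Counter


def read_projection(path):
    """Map each input family id (column 1) to its projected partition (column 2)."""
    projection = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Ignore blank or incomplete lines.
            if len(row) < 2:
                continue
            projection[row[0]] = row[1]
    return projection


def partition_counts(projection):
    """Count how many input families were projected onto each partition."""
    return Counter(projection.values())
```

For example, `partition_counts(read_projection("align_prot_pang2_to_pang1/proteins_partition_projection.tsv"))` would tell you how many families of pangenome 2 match persistent, shell or cloud families of pangenome 1.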

The second one, 'input_to_pangenome_associations.blast-tab', is an alignment file with blast-like results for the proteins-vs-pangenome alignment, which gives you family ids from both pangenomes directly (there can be multiple hits).
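As a rough Python sketch, the hits in that file could be grouped per input family as shown below. This assumes the usual BLAST tabular convention where column 1 is the query (the input family) and column 2 is the target (the family in the pangenome); I have not spelled out the exact columns here, so check them against your own file. The function name `read_hits` is made up for the example.

```python
import csv
from collections import defaultdict


def read_hits(path):
    """Group target family ids (column 2) by query family id (column 1)."""
    hits = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank, incomplete or comment lines.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            query, target = row[0], row[1]
            # Keep each target once, in order of first appearance.
            if target not in hits[query]:
                hits[query].append(target)
    return dict(hits)
```

Queries with more than one target are the cases with multiple hits, which are worth checking by hand.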

By comparing those files with the original family partitions, you should be able to get what you want, I believe?
If you have any questions or need me to clarify something, do not hesitate!

Adelme

genomesandMGEs · 2021-10-06T10:40:49Z

Hey,

Thanks for the detailed explanation.

So, if I understood correctly, this approach will give you information about the family ids from pangenome 1 that match families in pangenome 2, right? But the classification in the 2nd column only lets you know that a given id is considered 'persistent' in pangenome 2, and it may not be so in pangenome 1?

Also, family ids not listed in column 1 of 'proteins_partition_projection.tsv' represent families specific to pangenome 1, i.e. ones that have no match in pangenome 2?

axbazin · 2021-10-06T11:23:03Z

Yes absolutely, you are correct on all of your points.
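For the last point, a small Python sketch along these lines could list the families of pangenome 1 that have no match in pangenome 2. It assumes that each fasta header starts with the family id and that the same id is the first tab-separated field of the projection file; the function names are hypothetical.

```python
def fasta_ids(path):
    """Collect sequence ids (first word of each '>' header) from a fasta file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def projected_ids(path):
    """Collect the ids listed in column 1 of the projection file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[0]:
                ids.add(fields[0])
    return ids


def unmatched_families(faa_path, projection_path):
    """Families present in the fasta file but absent from the projection file."""
    return fasta_ids(faa_path) - projected_ids(projection_path)
```

With the file names used earlier, that would be `unmatched_families("prot_pangenome_1/all_protein_families.faa", "align_prot_pang1_to_pang2/proteins_partition_projection.tsv")`.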

If you want, you can play with the filters available in ppanggolin fasta, which can make the comparison simpler. For example:

ppanggolin fasta --prot_families persistent -p pangenome_1.h5 -o prot_pangenome_1

This writes only the persistent gene families (to a file called 'persistent_protein_families.faa'). You can do this with any partition; the filename will change accordingly.
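Putting the pieces together, a minimal Python sketch could use the per-partition fasta files to recover each family's partition in its own pangenome, then cross-tabulate that against the projected partition from 'proteins_partition_projection.tsv'. The assumptions are that fasta headers start with the family id, that the projection file has the id in column 1 and the partition in column 2, and that you pass the per-partition file paths yourself. The function names and the "no_match" label are made up for the example.

```python
import csv
from collections import Counter


def families_by_partition(faa_paths):
    """faa_paths maps a partition name to the fasta file written for that partition."""
    origin = {}
    for partition, path in faa_paths.items():
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:
                        origin[fields[0]] = partition
    return origin


def read_projection(path):
    """Map each family id (column 1) to its projected partition (column 2)."""
    projection = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                projection[row[0]] = row[1]
    return projection


def cross_tabulate(origin, projection):
    """Count families per (own partition, partition in the other pangenome) pair."""
    table = Counter()
    for family, own_partition in origin.items():
        # Families absent from the projection had no match in the other pangenome.
        table[(own_partition, projection.get(family, "no_match"))] += 1
    return table
```

A count under `("shell", "persistent")`, for instance, would be the families that are shell in their own pangenome but match a persistent family in the other one.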

Adelme

jpjarnoux self-assigned this Apr 5, 2022

jpjarnoux added the enhancement label Apr 5, 2022

axbazin mentioned this issue Oct 3, 2023

Comparing core genomes #136

Closed
