Hierarchical Clustering #81

Open
seajane opened this issue Apr 18, 2022 · 5 comments
@seajane

seajane commented Apr 18, 2022

Is it possible to view the hierarchical clustering tree that is created when the tile plots are made? It would be useful to see how some of the groups branched.

@jpjarnoux jpjarnoux self-assigned this Apr 27, 2022
@jpjarnoux jpjarnoux pinned this issue Apr 29, 2022
@jpjarnoux jpjarnoux unpinned this issue Apr 29, 2022
@axbazin
Member

axbazin commented May 2, 2022

Hi,

My apologies for the delayed response. I messed up my GitHub config and was no longer receiving notifications.

It is currently not possible to view the results of the clustering, though that feature could perhaps be added to the command eventually.
Maybe it would be better to build an actual phylogeny instead? You may not get the same clustering, but because the hierarchical clustering is based only on the presence/absence of gene families, it is not very reliable and does not really replace an actual phylogeny if you wish to see how your genomes are related to each other.

Adelme

@seajane
Author

seajane commented May 2, 2022

Thanks, we already have trees based on traditional phylogeny. The presence/absence tree revealed some unique groupings that correlate with another categorical grouping of these strains, so it was very interesting in itself, as well as showing that differential genes are present. It would be really amazing to have access to the distances and the strength of this association.

@axbazin
Member

axbazin commented May 3, 2022

Alright, I see! I guess it is something that could be added in the future.

What is done here is basically to compute Jaccard similarities between the presence/absence vectors of gene families for each genome, then build a dendrogram from those similarities. The function used for this can actually output a plot, so offering both the matrix and the clustering tree as optional additional outputs wouldn't be too difficult, I think.
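
For anyone who wants the distances and the tree in the meantime, here is a minimal sketch of the idea described above. It is not PPanGGOLiN's actual code: the function names, the example genomes, and the choice of average linkage are all assumptions made for illustration, and the tool may use a different linkage method. It computes pairwise Jaccard distances from sets of gene families, merges the closest clusters step by step, and returns the distance table, the list of merges with their heights, and the tree topology as a Newick string (without branch lengths).

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of gene family names."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_genomes(genomes):
    """Average-linkage clustering (assumed method) of genomes.

    `genomes` maps a genome name to the set of gene families it contains.
    Returns (pairwise distances, merge list, Newick topology).
    """
    # Pairwise distances, keyed by the unordered pair of genome names.
    dist = {
        frozenset(pair): jaccard_distance(genomes[pair[0]], genomes[pair[1]])
        for pair in combinations(genomes, 2)
    }
    # Each cluster is keyed by its Newick label and lists its member genomes.
    clusters = {name: [name] for name in genomes}
    merges = []

    def linkage(c1, c2):
        # Mean distance over all pairs of members from the two clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and record the merge height.
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)

    (topology,) = clusters
    return dist, merges, topology + ";"


# Hypothetical presence/absence data: genome name -> gene families present.
example = {
    "genomeA": {"fam1", "fam2", "fam3"},
    "genomeB": {"fam1", "fam2", "fam4"},
    "genomeC": {"fam5", "fam6"},
}
dist, merges, newick = cluster_genomes(example)
print(newick)
for left, right, height in merges:
    print(f"{left} + {right} merged at distance {height:.2f}")
```

The merge heights give a rough idea of how strongly two groups are separated in terms of shared gene families, and the Newick string can be opened in any tree viewer.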

@cmonat

cmonat commented Mar 24, 2023

Hello,

I'm also interested in getting this tree. How can it be done?
Thanks a lot

C.

@ggautreau
Collaborator

Hi Cécile,

It seems possible to display dendrograms next to a heatmap using Plotly ( https://plotly.com/python/dendrogram/ ), so I will test whether this could be added to PPanGGOLiN.

My pleasure :)

@ggautreau ggautreau self-assigned this Mar 28, 2023