Matrix distance to scatter plot #20

Open
JudKil opened this issue Jun 21, 2022 · 7 comments

Comments

@JudKil

JudKil commented Jun 21, 2022

Hello everyone,

My name is Judith, and I would like to use your beautiful scripts for my PhD studies.
I have computed an RMSD distance matrix between all poses, but I don't see how to turn it into a scatter plot of 2 clusters. I tried with pandas but I'm really stuck: I can't select the rows and columns needed to generate the scatter plot.

Best Regards,
Judith

@wjm41
Owner

wjm41 commented Jun 23, 2022

Hi Judith! Let me try to help :) Could you attach some example code and give more details about what you are trying to plot?

From my understanding, you have some conformers (for the same/different molecules?) and have clustered them into two clusters and want to visualise the poses. What are the axes of the scatter plot you are trying to do? Are they just PCA axes or something like that?
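If PCA axes turn out to be what is wanted, a minimal sketch with plain NumPy might look like the following. The toy RMSD values and the `pca_2d` helper are made up for illustration; neither comes from molplotly or from any script in this thread.

```python
import numpy as np

def pca_2d(matrix):
    """Project the rows of `matrix` onto their first two principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(matrix, dtype=float)
    # Centre each column so the components describe variance around the mean
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Toy symmetric RMSD matrix for 4 poses (made-up values)
rmsd = np.array([
    [0.0, 1.0, 4.0, 4.2],
    [1.0, 0.0, 3.9, 4.1],
    [4.0, 3.9, 0.0, 0.8],
    [4.2, 4.1, 0.8, 0.0],
])

coords = pca_2d(rmsd)  # shape (n_poses, 2): one (x, y) point per pose
```

Each row of `coords` could then serve as the x/y position of one pose in a scatter plot. Note that the sign of each component is arbitrary, so the plot may appear mirrored between runs or libraries.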

@JudKil
Author

JudKil commented Jun 24, 2022

Hello,

I have a protein onto which I have docked a ligand with AutoDock Vina.

Vina generated several possible ligand poses (my different conformers). I have computed a distance matrix between all conformations, and I would then like to group them into clusters with k-means. The k-means step works and I can see the scatter plot of my ligands. Yes, I want to visualize the positions with the names on the dots. But I can't add the names, so I don't know which dot is which on the figure.

I am sending you:

  • the script
  • the protein in sdf format
  • the ligands generated in sdf format

(Keep in mind: you need to change the extension from .txt to .sdf
lig-dump-rec-1.all_poses.txt
1.txt
pose_clustering_centroid.txt

For the script, change .txt to .ipynb;
otherwise I couldn't upload them to GitHub)
You may need to pip install some packages to get the script to run.

Thanks,
Judith

@wjm41
Owner

wjm41 commented Jun 24, 2022

The easiest way to add the 'names' of the molecules would be to add the pose index as a column in the table dataframe:

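import plotly.express as px

# `table` (the RMSD dataframe) and `kmeans` (the fitted model) come from the attached script

# Label each pose by its row index so the name shows up when hovering over a dot
table['pose_index'] = [f'Molecule Index: {index}' for index in table.index]
# Cast the cluster labels to strings so plotly uses discrete colours, not a continuous scale
table['cluster'] = kmeans.labels_
table['cluster'] = table['cluster'].astype(str)

px.scatter(table,
           x="1",
           y="2",
           color='cluster',
           hover_name='pose_index',
           width=1000,
           height=800)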

which would look something like this:

Screenshot 2022-06-24 at 16 29 38

I have also modified the molplotly code a bit to show the actual 3D coordinates as well, but I'm not sure how helpful that would be:

Screenshot 2022-06-24 at 16 30 29

Let me know if that would be helpful and I can merge it into the package :)
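On the original question of selecting rows and columns: a rough sketch of how a square RMSD matrix could be turned into the kind of table used in the snippet above. The numbers are toy values, and the hard-coded `labels` array only stands in for `kmeans.labels_`; the layout of the real `table` is an assumption, since it is built in the attached script.

```python
import numpy as np
import pandas as pd

# Toy symmetric RMSD matrix for 4 poses (made-up values)
rmsd = np.array([
    [0.0, 1.0, 4.0, 4.2],
    [1.0, 0.0, 3.9, 4.1],
    [4.0, 3.9, 0.0, 0.8],
    [4.2, 4.1, 0.8, 0.0],
])

# One row per pose; column "1" holds every pose's RMSD to pose 1, and so on
n_poses = rmsd.shape[0]
column_names = [str(i) for i in range(1, n_poses + 1)]
table = pd.DataFrame(rmsd, index=range(1, n_poses + 1), columns=column_names)

# Assumption: stand-in for kmeans.labels_ from a fitted k-means model
labels = np.array([0, 0, 1, 1])

table["pose_index"] = [f"Pose {i}" for i in table.index]
table["cluster"] = labels.astype(str)

# Select rows and columns by label: here poses 1 and 2 against columns "1" and "2"
subset = table.loc[[1, 2], ["1", "2"]]

# Keep only what the scatter plot needs: two axes, the hover name, the colour
plot_df = table[["1", "2", "pose_index", "cluster"]]
```

`plot_df` has the same column names that the `px.scatter` call expects (`x="1"`, `y="2"`, `color='cluster'`, `hover_name='pose_index'`).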

@JudKil
Author

JudKil commented Jun 24, 2022

Dear William,

Thank you very much, that's it! But is there an option to add the centroid of clusters?

Best!
Judith

@JudKil
Author

JudKil commented Jun 24, 2022

Yes, the 3D option is also interesting! Last but not least, do you know how to increase the size of the dots? When I try "size", I can't work out the argument format: size=10 or size=10:10:20 don't work.

Best!
Judith

@wjm41
Owner

wjm41 commented Nov 24, 2022

So so sorry for not replying, I had totally missed your follow-up questions! In any case, here are some answers:

> increase the size of the dots

This can be done by calling scatter_fig.update_traces(marker=dict(size=12)) on the figure and adjusting the size value as needed!

> is there an option to add the centroid of clusters

This can be done via scatter_fig.add_trace - the full example is below:

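import plotly.express as px
import plotly.graph_objects as go

# `rmsd_df`, `renamed_poses` and `kmeans` come from the attached script

# Number the poses from 1 so the hover label matches the pose order
rmsd_df['pose_index'] = range(1, len(renamed_poses)+1)
# Cast the cluster labels to strings so plotly uses discrete colours
rmsd_df['cluster'] = kmeans.labels_
rmsd_df['cluster'] = rmsd_df['cluster'].astype(str)

scatter_fig = px.scatter(rmsd_df,
                         x="1",
                         y="2",
                         color='cluster',
                         hover_name='pose_index',
                         width=1000,
                         height=800,
                         title='Clustering of ligand poses',
                         labels={'1': 'RMSD to pose 1',
                                 '2': 'RMSD to pose 2'})

# Overlay the cluster centroids as red crosses
scatter_fig.add_trace(
    go.Scatter(
        x=kmeans.cluster_centers_[:, 0],
        y=kmeans.cluster_centers_[:, 1],
        mode='markers',
        marker=dict(color="red",
                    symbol='x',
                    size=10),
        showlegend=False))

# Without a selector this applies to every trace, so the centroid markers also end up at size 12
scatter_fig.update_traces(marker=dict(size=12))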

Screenshot 2022-11-24 at 15 30 18
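One caveat on the centroid coordinates: `kmeans.cluster_centers_[:, 0]` and `[:, 1]` only line up with `x="1"` and `y="2"` if the model was fit on the RMSD columns in that order, which is an assumption here because the fitting code is not shown in this thread. As a cross-check, the centres of a converged k-means fit are the per-cluster means of the data, so they can be recomputed directly from the dataframe. A small sketch with toy numbers and a hard-coded `labels` array standing in for `kmeans.labels_`:

```python
import numpy as np
import pandas as pd

# Toy RMSD table for 4 poses (made-up values); columns named as in the plot
rmsd_df = pd.DataFrame(
    [[0.0, 1.0, 4.0, 4.2],
     [1.0, 0.0, 3.9, 4.1],
     [4.0, 3.9, 0.0, 0.8],
     [4.2, 4.1, 0.8, 0.0]],
    columns=["1", "2", "3", "4"],
)

# Assumption: stand-in for kmeans.labels_
labels = np.array([0, 0, 1, 1])

# Mean of every RMSD column within each cluster: one row per cluster
centroids = rmsd_df.groupby(labels).mean()

# Coordinates of each centroid on the two plotted axes
centroid_x = centroids["1"].to_numpy()
centroid_y = centroids["2"].to_numpy()
```

`centroid_x` and `centroid_y` could be passed to `go.Scatter` in place of the `cluster_centers_` slices, which keeps the crosses tied to the plotted columns by name rather than by position.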

@wjm41
Owner

wjm41 commented Nov 24, 2022

On another note, would you mind if I use your code/data as an example for showing the plotting of 3D coordinates with molplotly? :)
