PDB structure clustering #314

Open
pengzhangzhi opened this issue May 3, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

@pengzhangzhi

pengzhangzhi commented May 3, 2023

Hi @a-r-j and @amorehead, I would like to propose a feature: PDB structure clustering. It would be useful for structure-related tasks such as structure prediction and generation. Would you be interested in this idea and in discussing how to implement it? I am thinking of using Foldseek for the clustering and creating metadata that records the cluster assignments.
It would be great to hear any comments you have on this feature!

Best,
Zhangzhi
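
A minimal sketch of what the proposed clustering metadata could look like. It assumes the clustering tool writes a two-column, tab-separated file with one `representative<TAB>member` pair per line (the layout Foldseek documents for its `_cluster.tsv` output); the function names `read_cluster_tsv` and `to_metadata`, the record fields, and the example IDs are hypothetical and not part of any existing API.

```python
import csv
import io
from collections import defaultdict


def read_cluster_tsv(handle):
    """Map each representative ID to the list of its member IDs."""
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)


def to_metadata(clusters):
    """Flatten the clusters into one record per structure."""
    records = []
    for cluster_id, (rep, members) in enumerate(sorted(clusters.items())):
        for member in members:
            records.append(
                {
                    "id": member,
                    "cluster": cluster_id,
                    "representative": rep,
                    "is_representative": member == rep,
                }
            )
    return records


# Made-up IDs standing in for real clustering output.
example = "1abc_A\t1abc_A\n1abc_A\t2xyz_B\n3def_A\t3def_A\n"
clusters = read_cluster_tsv(io.StringIO(example))
metadata = to_metadata(clusters)
```

Keeping one record per structure means the result can be joined onto an existing per-chain metadata table by ID.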

@pengzhangzhi
Author

Hi @amorehead, I tried to reach you by email but got no response. It's possible that my email ended up in your spam folder or that you have not had the chance to respond yet.
Is there any way to reach you privately?

@a-r-j
Owner

a-r-j commented May 3, 2023

Hi @pengzhangzhi, this is actually something we planned on adding. You can read our discussion about it in #272. We decided to leave it out of the initial release to see if it was something that people would want and, well, it seems like it is 😀

If you're keen to work on this I'm happy to support :)

@pengzhangzhi
Author

Yep! Happy to help! I think the first step is to figure out the exact features we want. I have a personal use case: I want to cluster all PDB structures into N clusters, where N can be very small, like 2. Within each cluster, we can then cluster further to derive representative samples. It seems that current tools like Foldseek do not support a preset number of clusters.
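
One possible way to derive a representative sample from each cluster, sketched under assumptions: given a precomputed symmetric distance matrix between structures (for example 1 minus a symmetrised TM-score), pick the medoid, i.e. the member with the smallest total distance to the rest of its cluster. The `medoid` helper, the matrix values, and the cluster assignments below are illustrative only; the thread does not specify how representatives should be chosen.

```python
def medoid(members, dist):
    """Return the member with the smallest total distance to its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical pairwise distances between four structures.
dist = [
    [0.0, 0.2, 0.3, 0.8],
    [0.2, 0.0, 0.25, 0.7],
    [0.3, 0.25, 0.0, 0.9],
    [0.8, 0.7, 0.9, 0.0],
]

# Hypothetical assignment of structure indices to two clusters (N = 2).
assignments = {0: [0, 1, 2], 1: [3]}

representatives = {
    label: medoid(members, dist) for label, members in assignments.items()
}
```

A medoid is always an actual structure from the cluster, which matters here because an "average" protein structure is not a meaningful representative.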

@a-r-j
Owner

a-r-j commented May 3, 2023

Hmm, what do you think about an approach where you use Foldseek to get a set of cluster representatives, then apply a hierarchical clustering method based on the TM-scores between those representative structures?
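
The second stage of this suggestion could be sketched as follows, assuming the pairwise TM-scores between representatives have already been computed and symmetrised (TM-score depends on which chain is used for normalisation, so some choice such as the mean or minimum of the two directions is needed). The code uses plain average-linkage agglomerative clustering and stops once N clusters remain; the `agglomerate` function and the score matrix are hypothetical, and the linkage choice is an assumption rather than something agreed in the thread.

```python
from itertools import combinations


def agglomerate(tm, n_clusters):
    """Average-linkage agglomerative clustering on a TM-score matrix.

    tm[i][j] is the symmetrised TM-score between representatives i and j.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(tm))]

    def distance(a, b):
        # Mean of (1 - TM-score) over all cross-cluster pairs.
        return sum(1.0 - tm[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up TM-scores between five cluster representatives.
tm = [
    [1.0, 0.8, 0.7, 0.2, 0.3],
    [0.8, 1.0, 0.75, 0.25, 0.2],
    [0.7, 0.75, 1.0, 0.3, 0.25],
    [0.2, 0.25, 0.3, 1.0, 0.85],
    [0.3, 0.2, 0.25, 0.85, 1.0],
]

two_clusters = agglomerate(tm, n_clusters=2)
```

This naive loop recomputes all pair distances at every merge, so it only suits a modest number of representatives; for larger sets, a library implementation such as SciPy's `scipy.cluster.hierarchy.linkage` with `fcluster(..., criterion="maxclust")` would be the more practical route.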

a-r-j added the enhancement (New feature or request) label on May 5, 2023