Validation questions #605

Open
mdagost opened this issue Jul 21, 2023 · 1 comment

mdagost commented Jul 21, 2023

I'm using both the relative_validity_ attribute and the full validity_index function from hdbscan.validity. @lmcinnes, if they give different optimal parameters, is there a reason to prefer one over the other? Perhaps validity_index, because relative_validity_ is only an approximation?

My application is NLP clustering of embedding vectors, and one of the things I'm testing is different embeddings with different dimensionalities. Is it valid to use either of those metrics to compare across embeddings of the same dataset, or only across hdbscan parameters for a single embedding?

Thank you so much!

mdagost commented Aug 2, 2023

Just thought I'd bump this in case you have any thoughts, out of the kindness of your heart, @lmcinnes :)
