Validation questions #605

mdagost · 2023-07-21T14:37:54Z

I'm using both the relative_validity_ attribute and the full validity_index function from hdbscan.validity. @lmcinnes, if they give different optimal parameters, is there a reason to prefer one over the other? Perhaps validity_index, because the other one is approximate?

My application is NLP clustering of embedding vectors, and one of the things I'm testing is different embeddings with different dimensionalities. Is it valid to use either of those metrics to compare across embeddings for the same dataset, or only across the hdbscan parameters themselves?

Thank you so much!

mdagost · 2023-08-02T22:31:24Z

Just thought I'd bump this if you have any thoughts, out of the kindness of your heart @lmcinnes :)
