
Gap Statistic and Davies-Bouldin Index #1205

Open
BradKML opened this issue Nov 5, 2021 · 17 comments
Labels
type: feature a new visualizer or utility for yb

Comments

@bbengfort
Member

@BrandonKMLee thank you so much for contributing to Yellowbrick; I think your suggestion is a very good one! We'd be very happy to review a PR with an implementation of the algorithms. We have an internal implementation of distortion_score that is part of the KElbow visualizer, but I could see us creating a yellowbrick/cluster/metrics.py package to implement these additional cluster scorers and consolidate them with the distortion score. Are you up for taking a crack at it?
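As a rough sketch of what such a `metrics.py` module might contain, the functions below compute a Euclidean distortion score (sum of squared distances from each point to its cluster centroid) and the Davies-Bouldin index using plain NumPy. The function names, signatures, and the Euclidean-only restriction are assumptions for illustration. This is not Yellowbrick's internal `distortion_score`, and scikit-learn already provides `sklearn.metrics.davies_bouldin_score` if depending on it is acceptable.

```python
import numpy as np


def distortion_score(X, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    Hypothetical helper for illustration; not Yellowbrick's internal implementation.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total


def davies_bouldin_score(X, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better).

    DB = (1/k) * sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j), where s_i is the
    mean distance of cluster i's points to its centroid c_i.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Average intra-cluster scatter s_i
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    # Pairwise centroid separations d(c_i, c_j); zero separations become inf
    diffs = centroids[:, None, :] - centroids[None, :, :]
    sep = np.linalg.norm(diffs, axis=2)
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(sep == 0, np.inf, sep)
    np.fill_diagonal(ratios, -np.inf)  # exclude the j == i terms from the max
    return ratios.max(axis=1).mean()


# Two tight, well-separated clusters (illustrative data)
X_demo = [[0, 0], [0, 1], [10, 0], [10, 1]]
labels_demo = [0, 0, 1, 1]
db = davies_bouldin_score(X_demo, labels_demo)      # 0.1
distortion = distortion_score(X_demo, labels_demo)  # 1.0
```

Consolidating scorers this way would let a visualizer accept any function with an `(X, labels) -> float` shape.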

@bbengfort bbengfort added the type: feature a new visualizer or utility for yb label Nov 10, 2021
@BradKML
Author

BradKML commented Nov 11, 2021

I would be glad to, with reference to this table:
[Image: Table I]

@pkaf
Contributor

pkaf commented Jan 12, 2022

I can contribute to this task. It would be great if you could point me to the script where the metrics should be added. I can start with whichever metric you suggest.

@pkaf
Contributor

pkaf commented Jan 13, 2022

Thanks @BrandonKMLee. Shall I do the gap statistic first and work through the list gradually?
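For reference, the gap statistic (Tibshirani, Walther and Hastie) compares log W_k, the log of the within-cluster sum of squares for k clusters, with its expected value under a null reference distribution: Gap(k) = E*[log W_k] - log W_k. The sketch below draws reference sets uniformly from the data's bounding box, uses a bare-bones k-means++/Lloyd's k-means written in NumPy (an assumption, chosen only to keep the example self-contained), and picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}. The function names, the number of reference sets, and the synthetic data are illustrative.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread the initial centroids out across the data."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen centroid
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans_wcss(X, k, rng, n_init=5, n_iter=100):
    """Within-cluster sum of squares W_k from Lloyd's k-means (best of n_init runs)."""
    best = np.inf
    for _ in range(n_init):
        centroids = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centroids; keep the old one if a cluster becomes empty
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best


def gap_statistic(X, k_values, n_refs=10, seed=0):
    """Return (Gap(k), s_k) arrays, using uniform reference sets over X's bounding box."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in k_values:
        log_wk = np.log(kmeans_wcss(X, k, rng))
        ref = np.array([np.log(kmeans_wcss(rng.uniform(lo, hi, size=X.shape), k, rng))
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_wk)
        sks.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(sks)


def choose_k(k_values, gaps, sks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; falls back to the largest k."""
    for i in range(len(k_values) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return k_values[i]
    return k_values[-1]


# Usage on three well-separated synthetic blobs (illustrative data only)
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + data_rng.normal(scale=0.3, size=(30, 2)) for c in centers])
ks = [1, 2, 3, 4, 5]
gaps, sks = gap_statistic(X, ks)
best_k = choose_k(ks, gaps, sks)
```

On these three blobs the rule should select k = 3. Because the gap curve rises to a peak rather than bending like distortion, plotting `gaps` against `ks` with `sks` as error bars fits the KElbow layout, but the selection rule differs from knee detection.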

@pkaf
Contributor

pkaf commented Jan 14, 2022

@bbengfort newbie here, apologies in advance if I don't fully understand the philosophy of the Yellowbrick project.

For metrics like the gap statistic, or for that matter any metric with a knee, we could use the existing KElbowVisualizer class (with minimal code changes), no?
But metrics like Davies-Bouldin, which are neither strictly concave nor convex but instead have a minimum, will be a special case. They would need their own optimization routine, equivalent to KneeLocator.

Would it make sense to break this work into two parts?
a. include all concave/convex metrics in KElbowVisualizer
b. tackle the remaining ones individually? Ideally, the code could be refactored to have a dedicated class for identifying the optimal number of clusters, of which KneeLocator/KElbow would be a child class.
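One possible shape for option (b), sketched with hypothetical class names: a small base class that maps a curve of scores over K to a chosen K, with a knee-based child for concave/convex metrics and argmin/argmax children for metrics such as Davies-Bouldin. The knee rule used here (the point farthest from the straight line joining the curve's endpoints, after normalizing both axes) is a simplified stand-in, not the algorithm implemented by kneed's `KneeLocator`.

```python
import numpy as np


class OptimalKSelector:
    """Hypothetical base class: picks a K from a curve of scores over K values."""

    def select(self, k_values, scores):
        raise NotImplementedError


class KneeSelector(OptimalKSelector):
    """Knee as the point with the largest perpendicular distance from the chord
    joining the first and last points (simplified; not kneed's algorithm)."""

    def select(self, k_values, scores):
        k = np.asarray(k_values, dtype=float)
        s = np.asarray(scores, dtype=float)
        # Normalize both axes to [0, 1] so their units do not dominate
        kn = (k - k.min()) / (k.max() - k.min())
        sn = (s - s.min()) / (s.max() - s.min())
        p0 = np.array([kn[0], sn[0]])
        p1 = np.array([kn[-1], sn[-1]])
        direction = (p1 - p0) / np.linalg.norm(p1 - p0)
        rel = np.stack([kn, sn], axis=1) - p0
        # |2D cross product| with a unit vector = perpendicular distance to the chord
        dist = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])
        return k_values[int(dist.argmax())]


class MinimumSelector(OptimalKSelector):
    """For scores where lower is better, such as Davies-Bouldin."""

    def select(self, k_values, scores):
        return k_values[int(np.argmin(scores))]


class MaximumSelector(OptimalKSelector):
    """For scores where higher is better, such as silhouette or Calinski-Harabasz."""

    def select(self, k_values, scores):
        return k_values[int(np.argmax(scores))]


ks = list(range(2, 9))
distortion = [100, 40, 20, 17, 15, 14, 13]  # illustrative values
db = [0.9, 0.7, 0.5, 0.6, 0.65, 0.8, 0.85]   # illustrative values
knee_k = KneeSelector().select(ks, distortion)  # 4
db_k = MinimumSelector().select(ks, db)         # 4
```

A visualizer could then take a selector alongside a scoring function, so the plotting code stays shared while only the "which K" rule changes per metric.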

@pkaf
Copy link
Contributor

pkaf commented Jan 14, 2022

Here is a flawed solution for the Davies-Bouldin score:
[Screenshot, 2022-01-14]

@BradKML
Author

BradKML commented Jan 14, 2022

@pkaf if you are doing Davies-Bouldin, can I also help out with Xie-Beni, SD, and S_Dbw?
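For reference, a sketch of the Xie-Beni index. It was originally defined for fuzzy c-means memberships; this version assumes hard cluster labels, in which case each membership is 0 or 1 and the numerator reduces to the within-cluster sum of squares. Lower values indicate compact, well-separated clusters. The function name and the Euclidean distance are assumptions for illustration.

```python
import numpy as np


def xie_beni_crisp(X, labels):
    """Xie-Beni index for a hard (crisp) partition; lower is better.

    XB = sum_i sum_{x in C_i} ||x - v_i||^2 / (n * min_{i != j} ||v_i - v_j||^2)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("Xie-Beni needs at least two clusters")
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Compactness: within-cluster sum of squared distances to centroids
    compactness = sum(((X[labels == c] - centroids[i]) ** 2).sum()
                      for i, c in enumerate(clusters))
    # Separation: smallest squared distance between two distinct centroids
    d2 = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    min_sep = d2[~np.eye(len(clusters), dtype=bool)].min()
    return compactness / (len(X) * min_sep)


X_demo = [[0, 0], [0, 1], [10, 0], [10, 1]]
xb = xie_beni_crisp(X_demo, [0, 0, 1, 1])  # 1.0 / (4 * 100) = 0.0025
```

Like Davies-Bouldin, this has a minimum rather than a knee, so it would fall into the "tackle individually" group.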

@pkaf
Contributor

pkaf commented Jan 14, 2022

Sure. Let's hear from @bbengfort et al. on my earlier points. That will help lay out the plan.

@bbengfort
Member

@pkaf I think you have the broad strokes correct - Yellowbrick tries to focus on visual diagnostics and model context. The important thing isn't so much the optimization mechanism for detecting the knee (though we do appreciate the kneed algorithm for making it so clear), but that we show what an optimization method would have selected in the context of other values of K.

So it seems that all concave/convex methods would be associated with KElbow for this reason, and Davies-Bouldin would either have to have its own visualizer or we'd have to figure out how to represent it in the context of an elbow-based visualization (for example, perhaps it could replace fit time as an alternative mechanism for understanding distortion score).

@BradKML
Author

BradKML commented Feb 12, 2022

@pkaf, how did you make your visualizations? Also, @bbengfort, it would be good to implement something similar to KElbow (the same base layout) but with a different set of indicators instead of the "elbow" (local minima or maxima for hierarchical clustering).

@bbengfort
Member

@BrandonKMLee that makes sense to me!

@BradKML
Author

BradKML commented Feb 18, 2022

@bbengfort This idea just came to mind: https://www.geeksforgeeks.org/find-indices-of-all-local-maxima-and-local-minima-in-an-array/. At the same time, I wonder how @pkaf is progressing, so that the local minima detection can be used.

@bbengfort
Member

@BrandonKMLee I think that algorithm seems simple and effective; it might also be cool to use topology plots if you have a third dimension available.
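A sketch of the "third dimension" idea, assuming the score is computed over a grid of two parameters (for example K along one axis and a hypothetical second hyperparameter along the other). The function below finds strict local minima by comparing each cell with its eight neighbours; those cells could then be marked on a contour plot of the grid. The function name, the 8-neighbourhood, and the example grid are assumptions for illustration.

```python
import numpy as np


def local_minima_2d(Z):
    """Return (row, col) indices of strict local minima of a 2D score grid.

    Each cell is compared with its (up to) 8 neighbours; out-of-bounds
    neighbours are treated as +inf so that edge cells can qualify.
    """
    Z = np.asarray(Z, dtype=float)
    padded = np.pad(Z, 1, mode="constant", constant_values=np.inf)
    rows, cols = Z.shape
    is_min = np.ones_like(Z, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # Shifted view: neighbour[r, c] == Z[r + dr, c + dc] (or +inf off-grid)
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_min &= Z < neighbour
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(is_min))]


# Rows: K values; columns: a hypothetical second parameter (illustrative scores)
Z = np.array([
    [5, 4, 5, 6],
    [4, 1, 4, 5],
    [5, 4, 5, 3],
])
minima = local_minima_2d(Z)  # [(1, 1), (2, 3)]
```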

@pkaf
Contributor

pkaf commented Mar 10, 2022

Not much progress on my end, @BrandonKMLee. Please feel free to take it over if you or someone else is keen.

@BradKML
Author

BradKML commented May 16, 2022

Sorry to ask, @pkaf, but can I see the changes in https://github.com/pkaf/yellowbrick? I can't seem to find the diff.

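def findLocalMaximaMinima(arr):
    """Return (maxima, minima): indices of strict local maxima and minima of arr.

    Endpoints are compared against their single neighbour; plateaus
    (runs of equal adjacent values) are not reported.
    """
    mx, mn = [], []
    if len(arr) < 2:
        return mx, mn  # nothing to compare against

    # First element: compare with its right neighbour only
    if arr[0] > arr[1]: mx.append(0)
    if arr[0] < arr[1]: mn.append(0)

    # Interior elements: compare with both neighbours
    for i in range(1, len(arr) - 1):
        if arr[i - 1] > arr[i] < arr[i + 1]: mn.append(i)
        if arr[i - 1] < arr[i] > arr[i + 1]: mx.append(i)

    # Last element: compare with its left neighbour only
    if arr[-1] > arr[-2]: mx.append(len(arr) - 1)
    if arr[-1] < arr[-2]: mn.append(len(arr) - 1)

    return mx, mn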
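The same rule can also be expressed without a Python loop. The sketch below (the name `local_extrema` is hypothetical) pads the scores with ±infinity so that endpoints only need to beat their single neighbour, then maps the minima indices back to K values, for example to mark candidate K on a KElbow-style plot of Davies-Bouldin scores. The score values are made up for illustration. Like the loop, it uses strict comparisons, so a flat run of equal values is not reported as an extremum.

```python
import numpy as np


def local_extrema(scores):
    """Vectorized strict local maxima/minima; returns (maxima, minima) index arrays."""
    a = np.asarray(scores, dtype=float)
    if a.size < 2:
        empty = np.array([], dtype=int)
        return empty, empty
    # Pad with -inf for the maxima test and +inf for the minima test so that
    # each endpoint is effectively compared with its one real neighbour
    lo = np.concatenate(([-np.inf], a, [-np.inf]))
    hi = np.concatenate(([np.inf], a, [np.inf]))
    maxima = np.flatnonzero((a > lo[:-2]) & (a > lo[2:]))
    minima = np.flatnonzero((a < hi[:-2]) & (a < hi[2:]))
    return maxima, minima


ks = list(range(2, 10))
db_scores = [0.92, 0.71, 0.64, 0.69, 0.58, 0.61, 0.66, 0.70]  # illustrative
maxima, minima = local_extrema(db_scores)
candidate_ks = [ks[i] for i in minima]  # [4, 6]
```

Reporting every local minimum, rather than only the global one, leaves the choice between candidates such as K = 4 and K = 6 visible in the plot.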

Development

No branches or pull requests

3 participants