WIP: Allows Silhouette Visualizer to accept DensityEstimator #1304

Open
wants to merge 5 commits into
base: develop

Conversation

lwgray
Contributor

@lwgray lwgray commented Jun 24, 2023

This PR fixes #1303, in which a user reported that they could not use a Gaussian Mixture Model (GMM) as the clustering model with Silhouette Visualizer.
They received this traceback:
yellowbrick.exceptions.YellowbrickTypeError: The supplied model is not a clustering estimator; try a classifier or regression score visualizer instead!

Once I resolved the above issue, I encountered another problem: the GMM estimator does not have an n_clusters attribute.

I have made the following changes:

  1. I added a new is_density function to the utils/types file
  2. I used the is_density function in the ClusteringScoreVisualizer class so that it accepts DensityEstimators
  3. I fixed the attribute error by using a try/except clause to set self.n_clusters_ equal to self.estimator.n_components in the silhouette.py file
  4. I checked whether self.estimator has the n_components attribute that a DensityEstimator possesses and, if so, set self.n_clusters_ to self.estimator.n_components
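For readers without the diff at hand, here is a minimal sketch of what a type check like is_density could look like. It assumes the check relies on the `_estimator_type` class attribute that scikit-learn mixins set, with the value "DensityEstimator" for density models; the function bodies, the stub classes, and the `accepts` helper are hypothetical illustrations, not the code in this PR.

```python
def is_clusterer(estimator):
    # Assumption: clusterers advertise themselves via _estimator_type.
    return getattr(estimator, "_estimator_type", None) == "clusterer"


def is_density(estimator):
    # Assumption: density estimators use the value "DensityEstimator".
    return getattr(estimator, "_estimator_type", None) == "DensityEstimator"


def accepts(estimator):
    # Hypothetical gate: allow either kind of estimator through.
    return is_clusterer(estimator) or is_density(estimator)


class StubKMeans:
    """Stand-in for a clusterer (not the real scikit-learn class)."""
    _estimator_type = "clusterer"


class StubGMM:
    """Stand-in for a density estimator (not the real scikit-learn class)."""
    _estimator_type = "DensityEstimator"


class StubRegressor:
    """Stand-in for an estimator that should be rejected."""
    _estimator_type = "regressor"


print(accepts(StubKMeans()))     # True
print(accepts(StubGMM()))        # True
print(accepts(StubRegressor()))  # False
```

Keeping the density check separate from the clusterer check means the error message raised for other estimator types can stay unchanged.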

Sample Code

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture as GMM

from yellowbrick.cluster import SilhouetteVisualizer

# Generate synthetic data around 5 blob centers
X, y = make_blobs(
    n_samples=1000, n_features=12, centers=5, shuffle=False, random_state=0
)

# Instantiate the clustering model and visualizer
model = GMM(n_components=5, random_state=0)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')

visualizer.fit(X)  # Fit the data to the visualizer
visualizer.show()  # Finalize and render the figure

PLOT

[image: plot produced by the sample code above]

Questions for the @DistrictDataLabs/team-oz-maintainers:

  • Is the try/except clause a viable solution for missing attributes? I foresee this being an issue because I came across a different attribute error with a different clustering estimator. This could get unwieldy.
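One possible alternative to stacking try/except clauses, sketched under assumptions: keep an ordered tuple of candidate attribute names and take the first one the estimator defines, falling back to counting distinct labels. The names `CLUSTER_COUNT_ATTRS` and `resolve_n_clusters` are hypothetical and do not exist in Yellowbrick; the estimators are stand-ins built with SimpleNamespace.

```python
from types import SimpleNamespace

# Hypothetical lookup order; extend it as new estimator families turn up.
CLUSTER_COUNT_ATTRS = ("n_clusters", "n_components")


def resolve_n_clusters(estimator, labels=None, candidates=CLUSTER_COUNT_ATTRS):
    """Return the cluster count from the first candidate attribute that is set."""
    for name in candidates:
        value = getattr(estimator, name, None)
        if value is not None:
            return value
    if labels is not None:
        # Fallback: count the distinct labels assigned to the samples.
        return len(set(labels))
    raise AttributeError(
        "could not determine the number of clusters; tried: "
        + ", ".join(candidates)
    )


kmeans_like = SimpleNamespace(n_clusters=4)   # stand-in for a clusterer
gmm_like = SimpleNamespace(n_components=5)    # stand-in for a mixture model
other = SimpleNamespace()                     # defines neither attribute

print(resolve_n_clusters(kmeans_like))                 # 4
print(resolve_n_clusters(gmm_like))                    # 5
print(resolve_n_clusters(other, labels=[0, 1, 1, 2]))  # 3
```

Supporting a new estimator then means adding one name to the tuple rather than another except branch. The label-count fallback is only a rough guide for estimators that emit a noise label, since that label would be counted as a cluster.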

@lwgray lwgray requested a review from bbengfort June 24, 2023 22:40

codecov bot commented Jun 24, 2023

Codecov Report

Merging #1304 (78c8c6a) into develop (f7a8e95) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #1304   +/-   ##
========================================
  Coverage    90.70%   90.71%           
========================================
  Files           93       93           
  Lines         5327     5332    +5     
========================================
+ Hits          4832     4837    +5     
  Misses         495      495           

Files Changed                        Coverage Δ
yellowbrick/cluster/base.py          100.00% <100.00%> (ø)
yellowbrick/cluster/silhouette.py    85.55% <100.00%> (+0.32%) ⬆️
yellowbrick/utils/types.py           92.15% <100.00%> (+0.49%) ⬆️

@lwgray lwgray changed the title from "Allows Silhouette Visualizer to accept DensityEstimator" to "WIP: Allows Silhouette Visualizer to accept DensityEstimator" Jun 25, 2023
@lwgray
Contributor Author

lwgray commented Jun 25, 2023

@bbengfort Please hold off on approving this PR: the problem I fixed here is already addressed more logically in #1294.

Development

Successfully merging this pull request may close these issues.

Unable to use Silhouette Visualizer with Gaussian Mixture Model