Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of RFECV visualizer #1048

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Improve performance of RFECV visualizer #1048

wants to merge 2 commits into from

Conversation

bbengfort
Copy link
Member

This PR fixes #1047 which reported that yellowbrick's internal implementation of RFECV is much slower than scikit-learn's. This PR introduces a new implementation of RFECV that is closer to the latest version of sklearn.RFECV.

I have made the following changes:

  1. Updated RFECV with new parameters to match the sklearn implementation
  2. Implemented an _RFECV that subclasses the sklearn implementation and adds our required functionality

Sample Code and Plot

If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.

TODOs and questions

Still to do:

  • Update the docstrings and parameters of all classes, functions, and quick methods
  • Fix the still failing tests
  • Determine if this is the best path forward and reduces the most technical debt
  • See "TODO" and "HACK" comments in the code

Questions for the @DistrictDataLabs/team-oz-maintainers:

  • Does this require a new minimum version of scikit-learn?
  • Does this require us to add joblib as an optional dependency?
  • Is this maintainable in the long run?

@lwgray @rebeccabilbro and @jc639 -- it would be great if we could work on this together

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Included a sample plot to visually illustrate your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

@rebeccabilbro rebeccabilbro self-requested a review June 11, 2020 16:12
@fab375
Copy link

fab375 commented Dec 23, 2020

Hi, is there are working solution for #1047, that accelerates the RFECV to reach similar performance to the Sklearn implementation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFECV is much slower than Sklearn's implementation
3 participants