Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adds support for Pipelines with ModelVisualizers #955

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

[WIP] Adds support for Pipelines with ModelVisualizers #955

wants to merge 3 commits into from

Conversation

bbengfort
Copy link
Member

@bbengfort bbengfort commented Aug 27, 2019

This PR fixes #498 and adds support for Pipelines in most ScoreVisualizers. It also adds some automatic checking extending functionality described in #180.

I have made the following changes:

  1. Added an is_pipeline type check
  2. Added functionality to ModelVisualizer to access the final estimator in a pipeline

TODOs and questions

Still to do:

  • modify visualizers to use new code
  • create tests for pipelines with model visualizers
  • document pipelines in visualizers

CHECKLIST

  • Is the commit message formatted correctly?
  • Have you noted the new functionality/bugfix in the release notes of the next release?
  • Included a sample plot to visually illustrate your changes?
  • Do all of your functions and methods have docstrings?
  • Have you added/updated unit tests where appropriate?
  • Have you updated the baseline images if necessary?
  • Have you run the unit tests using pytest?
  • Is your code style correct (are you using PEP8, pyflakes)?
  • Have you documented your new feature/functionality in the docs?
  • Have you built the docs using make html?

try:
return getattr(self._final_estimator(), attr)
except AttributeError as e:
raise NotFitted(str(e))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me that this would raise not fitted but I don't like the message "object has no attribute coef_" ... not sure if it would be better to just raise the Attribute Error or not.

from sklearn.datasets import make_blobs, make_classification, make_regression


BASES = [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file takes the work from the audit and adds systematic checks to be tested across different groups of visualizers -- it replaces the old tests/checks.py framework by using pytest do perform all the checks.


@pytest.mark.skip("too many edge cases, is tested in most visualizer-specific tests")
@pytest.mark.parametrize("Viz", VISUALIZERS)
def test_fit(Viz):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this might be an easy one to implement, but there are a lot of edge cases here, so I'm thinking about removing it.

assert oz.fit(data.X.train, data.y.train) is oz


@pytest.mark.xfail(reason="quick methods aren't primetime yet")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check also tells us a lot about the state of our quick methods; only 5 xpass (e.g. pass even though the test is marked xfail)

##########################################################################


class TestModelVisualizer(VisualTestCase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds tests for the final estimator functionality implemented for Pipelines.

@@ -17,18 +17,23 @@
## Imports
##########################################################################
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensures all feature visualizers are imported into the top level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, all of the modifications to __init__.py files are to ensure that the visualizers and bases are imported at the top level so they can be imported into the check functionality.



# Alias for closer name to isinstance and issubclass
ispipeline = is_pipeline
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only checks Pipeline objects, I'm not sure if we also need to check FeatureUnion - am open for suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow ModelVisualizers to wrap Pipeline objects
1 participant