
Add support for "drawing your own decision boundary" to implement machine teaching #18

Open
Hellisotherpeople opened this issue May 15, 2021 · 2 comments

Comments


Hellisotherpeople commented May 15, 2021

I love this tool. I've been using Bokeh along with UMAP/Ivis/PCA and clustering for dataset visualization like this for a while, but I am happy to see someone automate this exact use case, since I've had to hand-roll this kind of tool for my own clustering / dimensionality reduction projects many times.

I think the logical extension to a tool like this is to allow someone to define their own decision boundary for a supervised model (this is called "machine teaching" rather than machine learning). Defining their own decision boundary should leave them with a supervised classifier at the end and the ability to visualize how that classifier operates (and ideally to let an expert human "tune" it). Note that this is different from the current "select aspects of the dataset by drawing" functionality built in.

One easy way to implement this is to let the user "draw" as they do now, but such that the user is actually drawing a "pseudo-subset" of their initial data (in reality, creating new data). Fit the classifier on this "pseudo-subset"; it should train fast and give the user some kind of "equation" (e.g. if you choose linear models) or some other interpretation mechanism (e.g. decision trees). When the expert changes bits of how this supervised model works, the model equation or interpretation should update. No need to do CV, since human eyeballs are doing the regularization for you.
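To make the idea concrete, here is a minimal sketch of the "fit on a drawn pseudo-subset" step, assuming the drawing tool hands back 2-D embedding coordinates plus the label the user assigned to each drawn region (the variable names are placeholders, not part of hover's API):

```python
# Minimal sketch: fit interpretable classifiers on a drawn "pseudo-subset".
# Assumption: `drawn_points` / `drawn_labels` stand in for whatever the drawing
# tool returns; they are hypothetical names, not hover's API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

drawn_points = np.array([[0.2, 0.1], [0.3, 0.2], [2.1, 1.9], [2.4, 2.2]])
drawn_labels = np.array([0, 0, 1, 1])

# a linear model yields an "equation" (coefficients) the expert can inspect
linear = LogisticRegression().fit(drawn_points, drawn_labels)
print("coef:", linear.coef_, "intercept:", linear.intercept_)

# a shallow decision tree yields human-readable rules instead
tree = DecisionTreeClassifier(max_depth=2).fit(drawn_points, drawn_labels)
print(export_text(tree, feature_names=["dim_0", "dim_1"]))
```

Each time the expert redraws, refitting on the updated pseudo-subset would refresh the coefficients or rules shown to them.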

It's a lot of work, but I anticipate that if you implement it correctly you'd be well into the thousands of GitHub stars, because it's fking obvious yet a huge win in situations where, say, a doctor may in fact be capable of "fixing" erroneous parts of a medical imaging AI's decision boundary.

Hellisotherpeople changed the title to Add support for "drawing your own decision boundary" to implement machine teaching on May 15, 2021
phurwicz (Owner) commented May 16, 2021

Thank you and I love this feedback! Would you mind helping me understand the suggestion better?

Previously I could think of two ways of drawing decision boundaries:

  • (A) a direct way like in human-learn where the classifier literally follows the polygon (or any shape) you draw (see the sketch after this list);
  • (B) an indirect way like currently in hover where you draw annotations and have a custom-architecture classifier fit to the annotations. Specifically, the active_learning recipe tries to learn the decision boundary given by the “train” set in an iterative “draw-and-retrain” process.
    • What I like about this is that one can make annotations from different views and easily combine them. The “manifold trajectory” slider of the active_learning recipe tries to interpolate between the input manifold and output manifold, giving multiple views to exploit.
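
For reference, here is a rough sketch of what (A) could look like: classify points by whether they fall inside a drawn polygon. This is illustrative only and does not use human-learn's actual API; the vertices stand in for whatever shape the user draws.

```python
# Rough sketch of option (A): a classifier that literally follows a drawn polygon.
# The polygon vertices are hypothetical; matplotlib's Path does the point-in-polygon test.
import numpy as np
from matplotlib.path import Path

class PolygonClassifier:
    def __init__(self, vertices, inside_label=1, outside_label=0):
        self.path = Path(vertices)  # the drawn decision boundary
        self.inside_label = inside_label
        self.outside_label = outside_label

    def predict(self, points_2d):
        inside = self.path.contains_points(np.asarray(points_2d))
        return np.where(inside, self.inside_label, self.outside_label)

clf = PolygonClassifier([(0, 0), (0, 2), (2, 2), (2, 0)])
print(clf.predict([[1.0, 1.0], [3.0, 3.0]]))  # -> [1 0]
```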

Just to be sure, my point of reference is the latest version of hover (0.5.0). Let me know whether you are suggesting (A) or something else :)

phurwicz (Owner) commented May 17, 2021

> I think the logical extension to a tool like this is to allow someone to define their own decision boundary for a supervised model (this is called "machine teaching" rather than machine learning). Defining their own decision boundary should leave them with a supervised classifier at the end and the ability to visualize how that classifier operates (and ideally to let an expert human "tune" it). Note that this is different from the current "select aspects of the dataset by drawing" functionality built in.

Now that I think more about it, hover.recipes.active_learning achieves “machine teaching” through hover.core.neural.VectorNet, where one can attach “any” neural network (subject to matching dimensions with the vectorizer) after the vectorizer function.

So when starting from scratch, one can use active_learning to draw decision boundaries through annotations and (re)train.

When working with an existing model which may not be a VectorNet, I suggest first deciding which layers of the model to freeze and which layers to tune. Then you can convert to a VectorNet by wrapping the frozen part in the vectorizer component and putting the tunable part in the neural-net component.
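
As a generic PyTorch sketch of that freeze-then-split step (the VectorNet wiring at the end is left as a comment, since the exact constructor depends on the hover version):

```python
# Split an existing model into a frozen "vectorizer" part and a tunable head.
# `existing_model` is a stand-in for whatever pretrained model you already have.
import torch
import torch.nn as nn

existing_model = nn.Sequential(
    nn.Linear(300, 128), nn.ReLU(),  # layers to freeze
    nn.Linear(128, 3),               # layer to keep tunable
)

frozen = existing_model[:2].requires_grad_(False).eval()
tunable_head = existing_model[2:]

def vectorizer(x):
    # the frozen part acts as a fixed feature extractor
    with torch.no_grad():
        return frozen(x)

# hover would then keep (re)training `tunable_head` on new annotations,
# roughly by wrapping `vectorizer` and `tunable_head` into a VectorNet (exact call omitted).
```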

  • Speaking of this, it's worth considering implementing utility methods for converting a VectorNet from/to "pure" PyTorch when applicable (i.e. when the vectorizer is essentially a preprocessor function followed by the forward() of some nn.Module).
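
A sketch of the "to pure PyTorch" direction, under the assumption that the vectorizer really is just a preprocessor function followed by a frozen module's forward() (the component names are hypothetical, not hover's API):

```python
# Recombine a (preprocessor, frozen encoder, tunable head) triple into one nn.Module.
import torch
import torch.nn as nn

class CombinedModel(nn.Module):
    def __init__(self, preprocess, frozen_encoder, tunable_head):
        super().__init__()
        self.preprocess = preprocess          # plain function, e.g. tokenize/scale
        self.frozen_encoder = frozen_encoder
        self.tunable_head = tunable_head

    def forward(self, raw_inputs):
        x = self.preprocess(raw_inputs)
        with torch.no_grad():                 # the encoder stays frozen
            x = self.frozen_encoder(x)
        return self.tunable_head(x)

# hypothetical components; in hover they would come from the VectorNet instance
model = CombinedModel(
    preprocess=lambda batch: torch.as_tensor(batch, dtype=torch.float32),
    frozen_encoder=nn.Linear(300, 128).requires_grad_(False).eval(),
    tunable_head=nn.Linear(128, 3),
)
print(model(torch.randn(4, 300)).shape)  # torch.Size([4, 3])
```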

Does this seem on the right track?
