
[BUG (RISK?)]: v2 relies on unsupported usage of scikit-learn #752

Open · JacksonBurns opened this issue Apr 2, 2024 · 3 comments
Labels: bug (Something isn't working)

@JacksonBurns (Member)

Bug Risk

A warning has been quietly raised in our CI for a while now (like here), saying:

/home/runner/micromamba/envs/temp/lib/python3.11/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator StandardScaler from version 1.4.0 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
    warnings.warn(

The way we save and load the StandardScaler object is explicitly 'unsupported and inadvisable' according to their docs.
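For context, the pattern being warned about is roughly the following (file path hypothetical): the scaler is pickled under one scikit-learn version and unpickled under another, which is exactly the cross-version persistence their docs advise against.

import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# training environment (e.g., scikit-learn 1.4.0): fit the scaler and pickle it
scaler = StandardScaler().fit(np.random.rand(10, 4))
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# inference environment (e.g., scikit-learn 1.4.1.post1): unpickling under a
# different version emits InconsistentVersionWarning
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)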

Path Forward

It's not immediately obvious what to do, though some options we have discussed online include:

  • ignore this (tempting) and hide it from the user, which could lead to issues with correctness or with running at all
  • allow it to continue being raised, letting users decide if they want to take the risk
  • elevate this to an error and prevent people from doing this (see the sketch after this list)
  • try to fix it (options include manually saving the means and variances and re-doing the scaling ourselves, retraining the scalers at inference time, etc.)
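For reference, a minimal sketch of what the "hide" and "elevate" options could look like using the standard warnings filters (the checkpoint path is hypothetical):

import pickle
import warnings
from sklearn.exceptions import InconsistentVersionWarning

with warnings.catch_warnings():
    # "error" turns the warning into a hard exception (the elevate option);
    # using "ignore" here instead would hide it from the user entirely
    warnings.simplefilter("error", InconsistentVersionWarning)
    with open("scaler.pkl", "rb") as f:  # hypothetical checkpoint path
        scaler = pickle.load(f)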

Please chime in with your thoughts on this, v2 developers and users.

@JacksonBurns JacksonBurns added the bug Something isn't working label Apr 2, 2024
@davidegraff (Contributor)

The last option isn't too bad, IMO. As-is, we "define" (air-quotes) the following scalers:

  • for each input: V, E, V_f, X_f
    essentially a list of shape $(m, 4, \ast, 2)$, where $m$ is the number of inputs, $4$ is the number of input scalers, $\ast$ is the dimension of the respective input, and $2$ is for the means and variances
  • one for the output: a list of shape $(t, 2)$, where $t$ is the number of output tasks

AFAICT, the only reason we use the StandardScaler is the nice interface (it ignores NaNs and is object-oriented). We can easily just write our own version (like in v1) or hack it to do what we want:

import numpy as np
from sklearn.preprocessing import StandardScaler

n, p = 10, 4
X = np.random.rand(n, p)
# build a "fitted" scaler without calling fit() by setting the learned attributes
scaler = StandardScaler()
scaler.mean_ = X.mean(0)
scaler.scale_ = X.std(0)

so all that would mean in practice is saving the training statistics and just setting the attributes of a new StandardScaler at test time. Per #726, we can get output scaler serialization for free by making it "part" of the model, but figuring out where to place the input scaling info is a more open question.
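A minimal sketch of that save/restore pattern, assuming a hypothetical scaler_stats.npz file and helper names:

import numpy as np
from sklearn.preprocessing import StandardScaler

def save_stats(scaler: StandardScaler, path: str = "scaler_stats.npz") -> None:
    # persist only the fitted statistics, not the pickled estimator
    np.savez(path, mean=scaler.mean_, scale=scaler.scale_)

def load_scaler(path: str = "scaler_stats.npz") -> StandardScaler:
    # rebuild a usable scaler in the current environment from the saved arrays
    stats = np.load(path)
    scaler = StandardScaler()
    scaler.mean_ = stats["mean"]
    scaler.scale_ = stats["scale"]
    return scaler

Since only NumPy arrays cross the serialization boundary, no scikit-learn pickle ever spans versions.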

@KnathanM (Contributor)

For now, I think it is fine to bump this to v2.1. It currently raises a warning, so at least users can decide for themselves in the meantime. Additionally, we only see this in CI because developers have different scikit-learn versions locally than the GitHub Actions runners get with a fresh install. I think it would be good to regenerate the checkpoints with a fresh install once v2.0 is ready and the checkpoint files won't change anymore.

@KnathanM KnathanM modified the milestones: v2.0.0, v2.1.0 Apr 12, 2024
@JacksonBurns (Member, Author)

@KnathanM @kevingreenman maybe we pin this issue so it's easier to find?

@kevingreenman kevingreenman pinned this issue Apr 12, 2024
@KnathanM KnathanM self-assigned this May 9, 2024