Check that training data is likely normalized or standardized correctly #81

Open
kwinkunks opened this issue Oct 1, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@kwinkunks
Member

If data was split before scaling then...

Normalized: the min and max (e.g. 0 and 1, or -1 and +1) should both be present in the training data. (Not so for test or application data.)

Standardized: the training data should be standard normal, i.e. pass is_standard_normal() (in particular, it should have a mean close to 0 and a stdev close to 1).
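
A rough sketch of what these two checks could look like with plain NumPy. The function names `looks_minmax_scaled` and `looks_standardized` and the tolerances are made up for illustration. It only looks at the min/max and the first two moments, and does not try to reproduce is_standard_normal().

```python
import numpy as np


def looks_minmax_scaled(X, lo=0.0, hi=1.0, atol=1e-8):
    """Hypothetical check: each column's min is `lo` and its max is `hi`."""
    X = np.asarray(X, dtype=float)
    return bool(
        np.allclose(X.min(axis=0), lo, atol=atol)
        and np.allclose(X.max(axis=0), hi, atol=atol)
    )


def looks_standardized(X, atol=1e-6):
    """Hypothetical check: each column has mean ~0 and stdev ~1."""
    X = np.asarray(X, dtype=float)
    return bool(
        np.allclose(X.mean(axis=0), 0.0, atol=atol)
        and np.allclose(X.std(axis=0), 1.0, atol=atol)
    )


# Synthetic training data, scaled using training statistics only
# (i.e. the "split, then scale" case).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
col_min, col_max = X_train.min(axis=0), X_train.max(axis=0)
X_minmax = (X_train - col_min) / (col_max - col_min)
X_zscore = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(looks_minmax_scaled(X_minmax), looks_standardized(X_zscore))
print(looks_minmax_scaled(X_train), looks_standardized(X_train))
```

Both checks are per-column, so a single unscaled feature is enough to fail them.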

See also #6

OTOH, if the data was scaled and then split...

Normalized: probably can't tell, unless the min or max happens to be in the test split.

Standardized: For the mean, I think we can check whether it is close enough to 0, as long as we know how many training samples there were. For the stdev, it may only be possible to tell by putting all the data together, which seems expensive.
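
One possible reading of the mean check, as a sketch: if the full dataset was standardized before splitting, a training subset of n samples should have a mean within a few multiples of 1/sqrt(n) of zero. The function name `mean_consistent_with_zero` and the `n_sigma` threshold are assumptions for illustration, and 1/sqrt(n) is used as a conservative bound on the standard error. The last lines show the "put all the data together" check for the stdev.

```python
import numpy as np


def mean_consistent_with_zero(X_train, n_sigma=3.0):
    """Hypothetical check: each column mean is within n_sigma / sqrt(n) of 0.

    Assumes the full dataset had unit stdev before it was split.
    """
    X_train = np.asarray(X_train, dtype=float)
    n = X_train.shape[0]
    tol = n_sigma / np.sqrt(n)
    return bool(np.all(np.abs(X_train.mean(axis=0)) <= tol))


# Synthetic data: standardize everything first, then split 80/20.
rng = np.random.default_rng(42)
X = rng.normal(loc=10.0, scale=3.0, size=(1000, 2))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
idx = rng.permutation(len(Z))
Z_train, Z_test = Z[idx[:800]], Z[idx[800:]]

print(mean_consistent_with_zero(Z_train))         # scaled, then split
print(mean_consistent_with_zero(X[idx[:800]]))    # never scaled

# The stdev is only exactly 1 again once train and test are pooled.
Z_pooled = np.vstack([Z_train, Z_test])
print(np.allclose(Z_pooled.std(axis=0), 1.0))
```

The tolerance shrinks as n grows, which is why the number of training samples needs to be known.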

@kwinkunks kwinkunks added the enhancement New feature or request label Oct 1, 2023