Check that training data is likely normalized or standardized correctly #81

kwinkunks · 2023-10-01T06:22:46Z

If the data were split before scaling, then:

Normalized: the target min and max (e.g. 0 and 1, or -1 and +1) should both be present in the training data. (This need not be true of test or application data.)
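A minimal sketch of this check, assuming a 2D NumPy array with one feature per column. The name `looks_normalized` and its parameters are hypothetical (not an existing redflag function), and the default range of (0, 1) simply mirrors the default of scikit-learn's MinMaxScaler:

```python
import numpy as np


def looks_normalized(X_train, feature_range=(0.0, 1.0), atol=1e-8):
    """Hypothetical check: True where a training feature's min and max both
    sit at the ends of feature_range, as a scaler fit on this data produces."""
    X_train = np.asarray(X_train, dtype=float)
    lo, hi = feature_range
    mins = X_train.min(axis=0)  # column-wise minimum
    maxs = X_train.max(axis=0)  # column-wise maximum
    return np.isclose(mins, lo, atol=atol) & np.isclose(maxs, hi, atol=atol)


X = np.array([[0.0, -1.0],
              [0.5,  0.2],
              [1.0,  0.9]])
print(looks_normalized(X))                          # [ True False]
print(looks_normalized(X, feature_range=(-1, 1)))   # [False False]
```

Working per column means a single badly scaled feature shows up on its own rather than being averaged away.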

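Standardized: the training data should be standard normal, i.e. pass is_standard_normal(); in particular, each feature should have a mean close to 0 and a standard deviation close to 1. (Strictly, standardizing only fixes the mean and stdev; it won't make a skewed feature Gaussian, so those two moments are the part of the check that should always hold.)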
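A sketch of the moment part of this check only, assuming one feature per column. `looks_standardized` is a hypothetical helper, not redflag's is_standard_normal(), and it makes no attempt to test normality. It uses the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses; with ddof=1 the result would be slightly off 1 for small samples:

```python
import numpy as np


def looks_standardized(X_train, atol=1e-6):
    """Hypothetical check: True where a training feature has mean ~0 and
    population std (ddof=0) ~1, as a fit-on-train StandardScaler produces."""
    X_train = np.asarray(X_train, dtype=float)
    means = X_train.mean(axis=0)
    stds = X_train.std(axis=0)  # NumPy's default ddof=0
    return np.isclose(means, 0.0, atol=atol) & np.isclose(stds, 1.0, atol=atol)


rng = np.random.default_rng(42)
raw = rng.normal(loc=10.0, scale=3.0, size=(500, 2))
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(looks_standardized(scaled))  # [ True  True]
print(looks_standardized(raw))     # [False False]
```

The tolerance can be tight here because, in the split-then-scale case, the training data were scaled with their own statistics, so any deviation is just floating-point error.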

See also #6

OTOH, if the data were scaled before splitting, then:

Normalized: we probably can't tell, unless the min or max happens to end up in the test split (in which case the training data won't reach that end of the range).
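A sketch of the one case that can be detected, assuming both splits are available as 2D arrays. `extremes_only_in_test` and `hits` are hypothetical names. A False result means "can't tell", not "no leakage", because the extremes may simply have landed in the training split:

```python
import numpy as np


def hits(X, value, atol=1e-8):
    """True for each column where at least one row is (close to) value."""
    return np.isclose(X, value, atol=atol).any(axis=0)


def extremes_only_in_test(X_train, X_test, feature_range=(0.0, 1.0)):
    """Hypothetical heuristic: True where an end of feature_range appears in
    the test split but not in the training split, hinting that the scaler
    saw the test data. False means 'can't tell', not 'no leakage'."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    lo, hi = feature_range
    only_lo = ~hits(X_train, lo) & hits(X_test, lo)
    only_hi = ~hits(X_train, hi) & hits(X_test, hi)
    return only_lo | only_hi


X_all = np.array([[0.0,  0.5],
                  [0.25, 0.0],
                  [0.5,  1.0],
                  [0.75, 0.3],
                  [1.0,  0.6]])
X_train, X_test = X_all[1:4], X_all[[0, 4]]
print(extremes_only_in_test(X_train, X_test))  # [ True False]
```

In the first column both extremes fell into the test split, so the leak is visible; in the second they are in the training split and nothing can be concluded.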

Standardized: for the mean, I think we can check whether it gets closer to 0 when the test data are included, as long as we know how many training samples there were. For the stdev, is it only possible to tell by putting all the data together? That seems expensive.
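One possible answer to the stdev question: the data may not need to be concatenated, because each split's count, mean and population variance are enough to recover the combined mean and stdev via the standard pairwise (parallel) variance-combination formula. The sketch below assumes train and test are the only splits and that the scaler used ddof=0; `pooled_mean_std` and `looks_scaled_before_split` are hypothetical names:

```python
import numpy as np


def pooled_mean_std(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Mean and population std (ddof=0) of two splits combined,
    computed from each split's count, mean and population variance."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Combined sum of squared deviations: within-split terms plus a
    # correction for the gap between the two split means.
    m2 = n_a * var_a + n_b * var_b + delta**2 * n_a * n_b / n
    return mean, np.sqrt(m2 / n)


def looks_scaled_before_split(X_train, X_test, atol=1e-6):
    """Hypothetical check: True where train + test together have mean ~0
    and std ~1, i.e. the scaler was probably fit on all the data."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean, std = pooled_mean_std(
        len(X_train), X_train.mean(axis=0), X_train.var(axis=0),
        len(X_test), X_test.mean(axis=0), X_test.var(axis=0),
    )
    return np.isclose(mean, 0.0, atol=atol) & np.isclose(std, 1.0, atol=atol)


rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))

# Leaky: scaler fit on everything, then split.
leaky = (raw - raw.mean(axis=0)) / raw.std(axis=0)
print(looks_scaled_before_split(leaky[:800], leaky[800:]))  # all True

# Correct: fit on the training rows only, then apply to both splits.
mu, sd = raw[:800].mean(axis=0), raw[:800].std(axis=0)
print(looks_scaled_before_split((raw[:800] - mu) / sd, (raw[800:] - mu) / sd))
```

Since only three numbers per feature per split are needed, the training statistics could be stored once and compared against the test split later, without holding all the data in memory.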

kwinkunks added the enhancement (New feature or request) label on Oct 1, 2023.