Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] test classifiers on str dtype y, ensure predict returns same type and labels #6428

Merged
merged 9 commits into from
May 22, 2024

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented May 15, 2024

This PR extends the suite test test_classifier_on_unit_test_data to test that y of object or str dtype y leads to correct labels on predict outputs.

Covers second bug in #6427, namely the example returning integer labels instead of string labels - but does not fix the bug.

It is hence expected that the test will detect bug #6427.

Depends on the following PR which fix newly covered bugs:

@fkiraly fkiraly added module:classification classification module: time series classification enhancement Adding new functionality labels May 15, 2024
@fkiraly
Copy link
Collaborator Author

fkiraly commented May 15, 2024

Recognizes BaggingClassifier and a few others, as expected.

Will now merge the fix #6430 to check that it fixes all remaining failures covered by this addition.

fkiraly added a commit that referenced this pull request May 22, 2024
…ing `y` of different type in `predict` (#6432)

Towards #6431.

Fixes `ProximityForest`, tree, stump, and `IndividualBOSS` returning `y`
of different type in `predict`.

The `IndividualBOSS` was coercing to `object` which  was turned of.

The proximity classifiers had `_predict` being copy-paste of the default
`_predict` which was fixed in
#6430. The fix is deduplication, so
the `_predict` defaults to the fixed parent classes' predict.

Tested via #6428
fkiraly added a commit that referenced this pull request May 22, 2024
…ys, even if `fit` `y` was not integer (#6430)

This fixes one of the bugs reported in
#6427, namely the default
`_predict` in `BaseClassifier` always returning integer labels, even if
the original labels were not integers.

This would cause all classifiers that did not have a custom `_predict`
implemented - a few composites, among them `BaggingClassifier` - to
always predict integers, even if the `y` seen in `fit` was of another
type.

The fix is simple, adding a missing application of the memorized
integer-to-class dictionary.

Test coverage is through #6428.
@fkiraly fkiraly merged commit 08ef4dd into main May 22, 2024
50 of 54 checks passed
@fkiraly fkiraly deleted the unittest-check-y_pred branch May 22, 2024 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:classification classification module: time series classification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant