v2.0.0 - Data-centric AI Ready #238
cgnorthcutt
If you liked cleanlab v1.0.1, v2.0.0 will blow your mind! 💥🧠
cleanlab 2.0 adds powerful new workflows and algorithms for data-centric AI, dataset curation, auto-fixing label issues in data, learning with noisy labels, and more. Nearly every module, method, parameter, and docstring has been touched by this release.
If you're coming from 1.0, here's a migration guide.
A few highlights of new functionality in cleanlab 2.0:
For an in-depth overview of what cleanlab 2.0 can do, check out this tutorial.
To help you get started with 2.0, we've added:
Change Log
This list is non-exhaustive! Assume every aspect of the API has changed.
Module name changes or moves:
classification.LearningWithNoisyLabels class --> classification.CleanLearning class
pruning.py --> filter.py
latent_estimation.py --> count.py
cifar_cnn.py --> experimental/cifar_cnn.py
coteaching.py --> experimental/coteaching.py
fasttext.py --> experimental/fasttext.py
mnist_pytorch.py --> experimental/fmnist_pytorch.py
noise_generation.py --> benchmarking/noise_generation.py
util.py --> internal/util.py
latent_algebra.py --> internal/latent_algebra.py
Module Deletions:
polyplex.py
New modules created:
rank.py (functions moved here from pruning.py/filter.py)
dataset.py
benchmarking.py (noise_generation.py moved here)
Method name changes:
pruning.get_noise_indices() --> filter.find_label_issues()
count.num_label_errors() --> count.num_label_issues()
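Many of the module and method renames above are mechanical, so a rough first pass over v1 code can be scripted. The sketch below is a hypothetical helper: `migrate_source`, `MODULE_MOVES`, and `NAME_CHANGES` are not part of cleanlab. It rewrites only fully dotted `cleanlab.<module>` paths and the renamed identifiers listed above; it does not handle `from cleanlab import pruning` style imports, aliases, renamed parameters, or changed return values, so the output still needs review by hand.

```python
import re

# Hypothetical mapping of v1 -> v2 module paths, taken from the change log (a subset).
MODULE_MOVES = {
    "cleanlab.pruning": "cleanlab.filter",
    "cleanlab.latent_estimation": "cleanlab.count",
    "cleanlab.cifar_cnn": "cleanlab.experimental.cifar_cnn",
    "cleanlab.coteaching": "cleanlab.experimental.coteaching",
    "cleanlab.fasttext": "cleanlab.experimental.fasttext",
    "cleanlab.noise_generation": "cleanlab.benchmarking.noise_generation",
    "cleanlab.util": "cleanlab.internal.util",
    "cleanlab.latent_algebra": "cleanlab.internal.latent_algebra",
}

# Hypothetical mapping of renamed classes/functions, taken from the change log.
NAME_CHANGES = {
    "LearningWithNoisyLabels": "CleanLearning",
    "get_noise_indices": "find_label_issues",
    "num_label_errors": "num_label_issues",
}


def migrate_source(source: str) -> str:
    """Rewrite v1 module paths and renamed identifiers in a string of Python source."""
    for old, new in MODULE_MOVES.items():
        # \b keeps e.g. "cleanlab.util" from matching inside "cleanlab.utils".
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    for old, new in NAME_CHANGES.items():
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    return source


print(migrate_source("from cleanlab.pruning import get_noise_indices"))
# prints: from cleanlab.filter import find_label_issues
```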
Methods added:
rank.py adds get_self_confidence_for_each_label() and get_normalized_margin_for_each_label()
filter.py adds two new methods to filter.find_label_issues() (select the method using the filter_by parameter): confident_learning, which has been shown to work very well and may become the default in the future, and predicted_neq_given, which is useful for benchmarking a simple baseline approach but underperforms relative to the other filter_by methods
classification.py adds CleanLearning.get_label_issues(), so you can call CleanLearning().fit(X, y).get_label_issues()
classification.py adds CleanLearning.find_label_issues()
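To make the new scores and filter_by methods concrete, here is a minimal pure-Python sketch. It assumes `labels` is a list of integer class labels and `pred_probs` is a list of out-of-sample probability rows, one per example. The function names mirror cleanlab's, but these are simplified re-implementations for illustration, not the library code: the normalized margin is assumed to be `(p_given - max p_other + 1) / 2`, and `confident_learning` is reduced to its core idea (per-class average self-confidence thresholds, flag examples whose confidently predicted class differs from the given label) without calibration or the other filter_by options. The final ranking step only imitates the spirit of `return_indices_ranked_by="self_confidence"`.

```python
from typing import List

Probs = List[List[float]]  # one row of class probabilities per example


def get_self_confidence_for_each_label(labels: List[int], pred_probs: Probs) -> List[float]:
    """Predicted probability of each example's given label (low = suspicious)."""
    return [row[label] for label, row in zip(labels, pred_probs)]


def get_normalized_margin_for_each_label(labels: List[int], pred_probs: Probs) -> List[float]:
    """(p[given] - max p[other] + 1) / 2, mapping the margin from [-1, 1] into [0, 1]."""
    scores = []
    for label, row in zip(labels, pred_probs):
        max_other = max(p for k, p in enumerate(row) if k != label)
        scores.append((row[label] - max_other + 1) / 2)
    return scores


def predicted_neq_given(labels: List[int], pred_probs: Probs) -> List[bool]:
    """Baseline filter: flag examples whose argmax prediction differs from the given label."""
    return [max(range(len(row)), key=row.__getitem__) != label
            for label, row in zip(labels, pred_probs)]


def confident_learning(labels: List[int], pred_probs: Probs) -> List[bool]:
    """Simplified confident-learning filter: flag off-diagonal entries of the confident joint."""
    num_classes = len(pred_probs[0])
    # Per-class threshold t_j: mean probability of class j among examples labeled j.
    thresholds = []
    for j in range(num_classes):
        probs_j = [row[j] for label, row in zip(labels, pred_probs) if label == j]
        # Assumption: a class with no labeled examples gets threshold 1.0 (rarely reached).
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)
    mask = []
    for label, row in zip(labels, pred_probs):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in range(num_classes) if row[k] >= thresholds[k]]
        if not confident:
            mask.append(False)  # not counted in the confident joint, so never flagged
            continue
        guess = max(confident, key=row.__getitem__)
        mask.append(guess != label)
    return mask


# Usage: four examples, two classes; examples 1 and 3 look mislabeled.
example_labels = [0, 0, 1, 1]
example_probs = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]
issue_mask = confident_learning(example_labels, example_probs)
self_conf = get_self_confidence_for_each_label(example_labels, example_probs)
# Flagged indices, least confident (most likely mislabeled) first.
ranked_issues = sorted((i for i, flagged in enumerate(issue_mask) if flagged),
                       key=lambda i: self_conf[i])
print(ranked_issues)  # [1, 3]
```

On this toy data both filters agree. The per-class thresholds are intended to matter when a model is systematically more or less confident on some classes than others, which a plain argmax comparison ignores.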
Naming conventions changed in method names, comments, parameters, etc.
s --> labels
psx --> pred_probs
label_errors --> label_issues
noise_mask --> label_issues_mask
label_errors_bool --> label_issues_mask
prune_method --> filter_by
prob_given_label --> self_confidence
pruning --> filtering
Parameter re-ordering:
(labels, pred_probs) parameters re-ordered to be consistent (in that order) in all methods.
frac_noise re-ordered in filter.find_label_issues().
Parameter changes:
order_label_issues(): sorted_index_method --> rank_by
find_label_issues(): sorted_index_method --> return_indices_ranked_by
find_label_issues(): prune_method --> filter_by
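Because most renamed parameters are passed as keyword arguments, one way to keep old call sites working during a migration is a small translation layer. `translate_v1_kwargs` and its tables below are hypothetical, not part of cleanlab. They encode only renames listed in this change log, keyed by function because `sorted_index_method` maps to `rank_by` in `order_label_issues()` but to `return_indices_ranked_by` in `find_label_issues()`. The value rename `prob_given_label --> self_confidence` is assumed to apply to the ranking parameters.

```python
from typing import Any, Dict

# Renames shared by all functions (from the naming-convention changes).
COMMON_RENAMES = {"s": "labels", "psx": "pred_probs"}

# Function-specific renames (from the parameter changes).
FUNCTION_RENAMES = {
    "order_label_issues": {"sorted_index_method": "rank_by"},
    "find_label_issues": {
        "sorted_index_method": "return_indices_ranked_by",
        "prune_method": "filter_by",
    },
}

# Assumption: ranking-method values also follow prob_given_label --> self_confidence.
RANKING_PARAMS = {"rank_by", "return_indices_ranked_by"}
VALUE_RENAMES = {"prob_given_label": "self_confidence"}


def translate_v1_kwargs(func_name: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with cleanlab 1.x names mapped to 2.0 names."""
    renames = {**COMMON_RENAMES, **FUNCTION_RENAMES.get(func_name, {})}
    translated: Dict[str, Any] = {}
    for key, value in kwargs.items():
        new_key = renames.get(key, key)
        if new_key in translated:
            # e.g. both the old "s" and the new "labels" were passed.
            raise TypeError(f"{func_name}() got both {key!r} and {new_key!r}")
        if new_key in RANKING_PARAMS and isinstance(value, str):
            value = VALUE_RENAMES.get(value, value)
        translated[new_key] = value
    return translated


print(translate_v1_kwargs("order_label_issues", {"sorted_index_method": "prob_given_label"}))
# prints: {'rank_by': 'self_confidence'}
```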
Global variables changed:
filter.py: MIN_NUM_PER_CLASS = 5 --> MIN_NUM_PER_CLASS = 1
Dependencies added
Way-too-detailed Change Log
New Contributors
Full Changelog: v1.0.1...v2.0.0
This discussion was created from the release v2.0.0 - Data-centric AI Ready.