Most of our other tutorials end with a final hidden cell that is full of assert statements.
This hidden cell is essentially an end-to-end test of the tutorial's code.
Goal: add a similar hidden cell to the Datalab quickstart tutorial.
The hidden cell should have asserts which check that:
- the Jaccard similarity between the data detected as is_label_issue = True and the actual known mislabels in the dataset is > 0.9 (or a sufficiently high threshold).
- roc_auc_score(Z, label_quality_scores) > 0.9 (or a sufficiently high threshold), where Z is a ground-truth array with value 1 if the data point is correctly labeled and value 0 if it is truly a mislabel. This assert checks that the label quality scores appropriately rank the data.
- the Jaccard similarity between the data detected as is_XYZ_issue = True and the actual known instances of issue XYZ in the dataset is > 0.9 (or a sufficiently high threshold). Here XYZ = outlier, near duplicate, etc.
- no other issue types beyond those expected were detected in this tutorial. Make sure this assert is forwards compatible: if we add 3 new issue types to the Datalab defaults in the future, this same assert should catch it if any of the newly added issue types is suddenly detected in this tutorial.
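The checks listed above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names (`jaccard_similarity`, `auroc`, `unexpected_issue_types`) are hypothetical, the data is made up, and the AUROC is computed by hand so the snippet runs with the standard library alone. In the actual hidden cell the inputs would presumably come from the tutorial's Datalab results and the dataset's known ground truth, and scikit-learn's `roc_auc_score` could replace the hand-written `auroc`.

```python
def jaccard_similarity(detected, known):
    """Jaccard similarity (intersection size / union size) of two index collections."""
    detected, known = set(detected), set(known)
    union = detected | known
    if not union:
        return 1.0  # both empty: treat as perfect agreement (a choice made for this sketch)
    return len(detected & known) / len(union)


def auroc(is_correct, scores):
    """Fraction of (correct, mislabeled) pairs where the correct point scores higher; ties count 0.5."""
    pos = [s for s, z in zip(scores, is_correct) if z == 1]
    neg = [s for s, z in zip(scores, is_correct) if z == 0]
    if not pos or not neg:
        raise ValueError("need at least one correctly labeled and one mislabeled point")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def unexpected_issue_types(num_issues_by_type, expected_types):
    """Issue types with at least one detection that are not in expected_types.

    Iterating over whatever types are reported (rather than a fixed list)
    is what makes the check forwards compatible with newly added types.
    """
    return sorted(
        t for t, n in num_issues_by_type.items() if n > 0 and t not in expected_types
    )


# Toy stand-ins for the tutorial's outputs (all values are made up).
detected_label_issues = [3, 7, 12, 20]   # indices flagged with is_label_issue = True
known_label_issues = [3, 7, 12, 20]      # indices of the known mislabels
label_quality_scores = [0.9, 0.8, 0.95, 0.1, 0.7, 0.2]
Z = [1, 1, 1, 0, 1, 0]                   # 1 = correctly labeled, 0 = truly mislabeled
num_issues_by_type = {"label": 4, "outlier": 2, "near_duplicate": 1, "non_iid": 0}
expected_types = {"label", "outlier", "near_duplicate"}

assert jaccard_similarity(detected_label_issues, known_label_issues) > 0.9
assert auroc(Z, label_quality_scores) > 0.9
assert unexpected_issue_types(num_issues_by_type, expected_types) == []
```

The same `jaccard_similarity` call would be repeated for each expected issue type (outlier, near duplicate, etc.) with that type's detected and known indices.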
This tutorial has no tests for some reason: https://raw.githubusercontent.com/cleanlab/cleanlab/master/docs/source/tutorials/datalab/datalab_quickstart.ipynb
For comparison, look at the raw version of most of our other tutorials, e.g.:
https://raw.githubusercontent.com/cleanlab/cleanlab/master/docs/source/tutorials/image.ipynb