Cleanlab on NER Data #256

Answered by jwmueller
Giriteja asked this question in Q&A
May 11, 2022 · 2 comments · 3 replies

Here's an outline of how the existing Cleanlab v2.0.0 repo can be used for token classification tasks.

Basic Idea: Treat the labels and the model's predictions at each token as if they were labels & predictions for independent training examples (ignoring which document each token/label comes from). Then just run regular cleanlab as if this were a multiclass classification task (with each document broken up into many separate examples, one per token). So when running cleanlab's find_label_issues(labels, pred_probs) and get_label_quality_scores(labels, pred_probs), the labels should be for every token in your entire corpus, and the pred_probs should be the corresponding class-probabilities estimated by your model for each of those tokens.
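
To make this concrete, here is a minimal sketch. The corpus below is simulated with random data purely so the snippet runs on its own; in practice, labels would be your integer-encoded token tags and pred_probs would be the (ideally out-of-sample, e.g. cross-validated) class-probabilities your NER model predicts for each token.

```python
import numpy as np
from cleanlab.filter import find_label_issues
from cleanlab.rank import get_label_quality_scores

rng = np.random.default_rng(0)
num_classes = 3  # e.g. a tiny tag set like O / B-ENT / I-ENT (illustrative only)

# Hypothetical corpus: per-document token labels and per-token predicted
# class-probabilities from the model.
doc_lengths = [40, 55, 30, 75]
per_doc_labels = [rng.integers(0, num_classes, size=n) for n in doc_lengths]
per_doc_pred_probs = []
for doc_labels in per_doc_labels:
    # Simulate mostly-confident predictions that tend to agree with the labels.
    probs = rng.dirichlet(np.ones(num_classes) * 0.5, size=len(doc_labels))
    probs[np.arange(len(doc_labels)), doc_labels] += 2.0
    probs /= probs.sum(axis=1, keepdims=True)
    per_doc_pred_probs.append(probs)

# Flatten tokens across all documents, ignoring document boundaries,
# so each token is treated as its own multiclass example.
labels = np.concatenate(per_doc_labels)
pred_probs = np.vstack(per_doc_pred_probs)

# Boolean mask of tokens whose given label looks likely to be wrong.
issue_mask = find_label_issues(labels, pred_probs)

# Per-token label-quality scores in [0, 1]; lower means more suspect.
scores = get_label_quality_scores(labels, pred_probs)

# Map flat token indices back to (document, token) positions for review.
doc_idx = np.concatenate([np.full(n, i) for i, n in enumerate(doc_lengths)])
tok_idx = np.concatenate([np.arange(n) for n in doc_lengths])
for i in np.flatnonzero(issue_mask):
    print(f"doc {doc_idx[i]}, token {tok_idx[i]}: quality score = {scores[i]:.3f}")
```

The only requirement is that len(labels) == pred_probs.shape[0] == the total number of tokens in the corpus; which document each token came from only matters afterward, when you map the flagged flat indices back to (document, token) positions to inspect.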
