
Project Report submitted in partial fulfillment of the Text Mining Course (732A92)


Bayesian-Evaluation-of-Text-Classification-Models

When evaluating text classification models, we want to be confident both in a model's performance and in its superiority over another. In text classification it has become the norm to apply the Null Hypothesis Significance Test (NHST) to state and compare classifier performance statistically. However, the frequentist approach has its own limitations and fallacies. In this report, we reflect on the limitations posed by NHST and implement a novel Bayesian approach for evaluating text classification models. Using a benchmark dataset, we build several shallow models based on sparse and dense features, as well as an attention-based model for comparison, and empirically demonstrate the difference between the two evaluation approaches.
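For intuition, here is a minimal sketch of one simple form of Bayesian evaluation: a Beta-Binomial posterior over test-set accuracy under a uniform prior. This is an illustration only, not necessarily the report's exact model, and the counts in the usage line are made up.

# Illustration: Beta-Binomial posterior over classifier accuracy.
# Not necessarily the report's exact model; counts below are made up.
import numpy as np

rng = np.random.default_rng(0)

def posterior_superiority(correct_a, correct_b, n_test, draws=100_000):
    # Beta(1 + correct, 1 + wrong) posterior under a uniform Beta(1, 1) prior
    acc_a = rng.beta(1 + correct_a, 1 + n_test - correct_a, size=draws)
    acc_b = rng.beta(1 + correct_b, 1 + n_test - correct_b, size=draws)
    return np.mean(acc_a > acc_b)  # P(model A more accurate than B | data)

# e.g. 910 vs. 895 correct out of 1000 test cases
print(posterior_superiority(910, 895, 1000))

Unlike a p-value, the returned quantity is a direct posterior probability that one model outperforms the other.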

Project


Notes


  • PyTorch's indexing differs from scikit-learn's. To compare the PyTorch model's output with scikit-learn's, the test labels must first be re-indexed:
# Example: re-align the test labels to the PyTorch model's index before scoring
from sklearn.metrics import f1_score
f1_score(ytest[ytest_bert_idx, :], ytest_pred_bert, average='micro', sample_weight=None, zero_division='warn')
  • For NHST, the bootstrap sampling is not optimized; creating 10,000 bootstrap samples for each case can take a while (see the sketch after this list).
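For orientation, a minimal sketch of the kind of bootstrap comparison described above. It assumes y_true, pred_a, and pred_b are aligned 1-D label arrays (hypothetical names, not the notebook's actual variables):

# Sketch: bootstrap the test set and collect the F1 difference of two models.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def bootstrap_f1_diff(y_true, pred_a, pred_b, n_samples=10_000):
    n = len(y_true)
    diffs = np.empty(n_samples)
    for i in range(n_samples):
        idx = rng.integers(0, n, size=n)  # resample test cases with replacement
        diffs[i] = (f1_score(y_true[idx], pred_a[idx], average='micro')
                    - f1_score(y_true[idx], pred_b[idx], average='micro'))
    return diffs

The Python loop recomputes two F1 scores per sample, which is what makes 10,000 samples per case slow.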

Datasets


All model outputs are provided in the Data folder.

To obtain the feature matrix from the DistilBERT model, please refer to the section "Creating BERT based features" in Shallow_Models.ipynb.
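For orientation, a minimal sketch of extracting such features with the Hugging Face transformers library; the notebook's actual code may differ, and the example texts are made up:

# Sketch: pooled DistilBERT features via Hugging Face transformers.
import torch
from transformers import DistilBertTokenizer, DistilBertModel

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
model.eval()

texts = ['an example document', 'another one']  # illustrative inputs
enc = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**enc).last_hidden_state     # (batch, seq_len, 768)
features = hidden[:, 0, :].numpy()              # first ([CLS]) token per text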

Report

