This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Enable bias correction in AdamW when fine-tuning BERT #1468

Open · leezu wants to merge 1 commit into master

Conversation

@leezu (Contributor) commented Jan 7, 2021

This should improve stability.

Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020).

Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020).
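
For context, bias correction rescales Adam's moment estimates to compensate for their zero initialization; without it, the effective step size is inflated during the first few hundred updates, which Zhang et al. identify as a key source of fine-tuning instability. A minimal sketch of one AdamW step with the correction enabled (hyperparameter names and defaults are illustrative, not GluonNLP's exact API):

import math

def adamw_step(p, g, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One AdamW step for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g    # second-moment EMA
    m_hat = m / (1 - beta1 ** t)           # bias correction: the two rescalings
    v_hat = v / (1 - beta2 ** t)           # this PR enables
    p -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)  # decoupled weight decay
    return p, m, v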
@leezu leezu requested a review from a team as a code owner January 7, 2021 23:14
@sxjscience (Member)

Let's try to rerun the training with the batch script here: https://github.com/dmlc/gluon-nlp/tree/master/tools/batch#squad-training

Basically, we just need to run the following two commands, for SQuAD 2.0 and SQuAD 1.1:

# AWS Batch training with horovod on SQuAD 2.0 + FP16
bash question_answering/run_batch_squad.sh 1 2.0 submit_squad_v2_horovod_fp16.log float16

# AWS Batch training with horovod on SQuAD 1.1 + FP16
bash question_answering/run_batch_squad.sh 1 1.1 submit_squad_v1_horovod_fp16.log float16

@codecov (bot) commented Jan 7, 2021

Codecov Report

Merging #1468 (52ce2a4) into master (def0d70) will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1468      +/-   ##
==========================================
- Coverage   85.86%   85.84%   -0.02%     
==========================================
  Files          52       52              
  Lines        6911     6911              
==========================================
- Hits         5934     5933       -1     
- Misses        977      978       +1     
Impacted Files                           Coverage Δ
src/gluonnlp/data/tokenizers/yttm.py     81.89% <0.00%> (-0.87%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@leezu (Contributor, Author) commented Jan 8, 2021

test_squad2_albert_base 8903644b-13e1-4aa4-b695-e7b5f2c50c7d
test_squad2_albert_large aac428ac-4e25-48e8-8f3e-2643cbb6b95e
test_squad2_albert_xlarge bb565663-8173-45aa-9489-2dd690fd24c4
test_squad2_albert_xxlarge 38d9929c-fea2-4648-bc68-0bd4eb491ee8
test_squad2_electra_base 0eb9090a-d86b-40a6-9f1c-61e1cf034b59
test_squad2_electra_large 43fabf48-b524-499f-9d8a-2113349dcf74
test_squad2_electra_small 5c631945-ad26-4c2f-a7d3-bb8c705023a2
test_squad2_roberta_large 96d1e46f-b292-4915-a867-c724bb082585
test_squad2_uncased_bert_base 8228dd4c-27d3-4118-b682-06332db980f2
test_squad2_uncased_bert_large 22a91f7c-707e-4adf-a3d9-71286a3e165e
test_squad2_gluon_en_cased_bert_base_v1 13d38ddd-4ab6-4e60-8cae-1400d3169d4c
test_squad2_mobilebert 5377ebdc-da03-4e4e-8546-43e83643d1c0
test_squad2_albert_base c71abbd1-9ddb-465a-83a8-a257994a47a4
test_squad2_albert_large 55a10c2f-b51e-4722-b8fe-d0154ccf1124
test_squad2_albert_xlarge d3b1e954-b22e-4b30-bc3a-db3303d8de85
test_squad2_albert_xxlarge 9d8c599c-ecf2-4815-ac3c-cc853c75cddd
test_squad2_electra_base 9c10fca5-0ac6-4ec8-91ce-ebf2e0593513
test_squad2_electra_large d844645c-d56b-4549-805e-a3558d777e75
test_squad2_electra_small 8b17bb3f-ee8e-4212-92d7-59155f0c54ef
test_squad2_roberta_large e9972888-ae53-41e0-9b8f-1db8359e68c9
test_squad2_uncased_bert_base 083c431c-6e02-4a67-ab92-1e84a450df52
test_squad2_uncased_bert_large 24d40d9e-06fd-4158-90a3-1ee5da7183c1
test_squad2_gluon_en_cased_bert_base_v1 6b2c015b-5829-40b6-9435-718d3ecf46de
test_squad2_mobilebert 08e7618c-7e19-4db2-9451-09f65729272e
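
(These look like AWS Batch job IDs returned by the submission script. Assuming configured credentials and region, the status of any of them could be polled with boto3; the snippet below is a hypothetical check, not part of this thread:

import boto3

# Hypothetical: query the status of one submitted job by its ID.
batch = boto3.client("batch")
resp = batch.describe_jobs(jobs=["8903644b-13e1-4aa4-b695-e7b5f2c50c7d"])
for job in resp["jobs"]:
    print(job["jobName"], job["status"])
)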

@sxjscience (Member)

Yes, you can later use the following scripts to sync up the results:

bash question_answering/sync_batch_result.sh submit_squad_v2_horovod_fp16.log squad_v2_horovod_fp16
bash question_answering/sync_batch_result.sh submit_squad_v1_horovod_fp16.log squad_v1_horovod_fp16

After all results (or a subset of them) have finished, you can parse the logs via:

python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16

@leezu (Contributor, Author) commented Jan 8, 2021

% python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  81.861255  79.112272       -1.671970       -1.742718             1.139900
1                  albert_large  84.904438  81.900109       -1.086745       -1.086745             3.423180
2                 albert_xlarge  88.032327  85.134338       -1.625434       -1.625434             5.967083
3                albert_xxlarge  90.085053  87.155731       -2.226489       -2.226489            11.294118
4                  electra_base  86.282903  83.643561       -1.848169       -2.301743             1.250153
5                 electra_large  90.871907  88.461215       -1.347744       -1.347744             3.140608
6                 electra_small  73.878219  71.481513       -1.548537       -1.548537             0.383728
7   gluon_en_cased_bert_base_v1  77.620289  74.757854       -1.731051       -1.731051             1.595762
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  89.239196  86.431399       -2.168329       -2.168329             4.119268
10            uncased_bert_base  75.539014  72.702771       -1.595349       -1.850638             1.540320
11           uncased_bert_large  81.322878  78.177377       -2.056313       -2.056739             4.103469
Saving to squad_v2_horovod_fp16.csv

% python3 question_answering/parse_squad_results.py --dir squad_v1_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  90.605130  83.964049             NaN             NaN             0.745851
1                  albert_large  92.574139  86.385998             NaN             NaN             2.319241
2                 albert_xlarge  93.836504  87.984863             NaN             NaN             4.367765
3                albert_xxlarge  94.569074  88.448439             NaN             NaN             7.321531
4                  electra_base  92.483534  86.821192             NaN             NaN             0.882092
5                 electra_large  94.824761  89.631031             NaN             NaN             2.216832
6                 electra_small  85.263124  78.893094             NaN             NaN             0.267190
7   gluon_en_cased_bert_base_v1  88.685434  81.986755             NaN             NaN             1.077892
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  94.665818  89.101230             NaN             NaN             2.790591
10            uncased_bert_base  88.103126  81.201514             NaN             NaN             0.979201
11           uncased_bert_large  90.691656  83.945128             NaN             NaN             2.756076
Saving to squad_v1_horovod_fp16.csv

Is there any known issue with MobileBERT?

@leezu (Contributor, Author) commented Jan 8, 2021

Looks like an AMP issue, or an operator issue that causes AMP to keep decreasing the loss scale: finetune_squad2.0.log
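
For reference, dynamic loss scaling halves the scale whenever non-finite gradients are detected and only grows it again after a long run of clean steps, so an operator that emits NaN/Inf on every iteration drives the scale toward zero. A hedged sketch of the generic update rule (not MXNet AMP's exact implementation):

def update_loss_scale(scale, good_steps, grads_finite, growth_interval=2000):
    """Return (new_scale, new_good_steps) after one training step."""
    if not grads_finite:
        return scale / 2.0, 0      # overflow or NaN: halve the scale, skip the update
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0      # a long run of finite gradients: try a larger scale
    return scale, good_steps

If the scale keeps shrinking instead of stabilizing, lowering it is not fixing the problem, which points at an operator producing NaN/Inf regardless of the scale.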

@sxjscience (Member) commented Jan 8, 2021 via email

@sxjscience (Member)

From the figure, I think the performance looks similar. If we choose to update the flags, we can upload the pretrained weights to S3 and also change the numbers in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering.

@sxjscience (Member) left a comment

LGTM in general. We will also need to update the results table.
