This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Enable bias correction in AdamW when fine-tuning BERT #1468

Open · leezu wants to merge 1 commit into master

Conversation

@leezu (Contributor) commented Jan 7, 2021

This should improve stability.

Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020).

Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020).
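
For context, bias correction rescales Adam's moment estimates to compensate for their zero initialization; without it, the effective step size is inflated during the first few hundred updates, which Zhang et al. identify as a key source of fine-tuning instability. A minimal sketch of one AdamW step with the correction enabled (hyperparameter names and defaults are illustrative, not GluonNLP's exact API):

import math

def adamw_step(p, g, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One AdamW step for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g    # second-moment EMA
    m_hat = m / (1 - beta1 ** t)           # bias correction: the two rescalings
    v_hat = v / (1 - beta2 ** t)           # this PR enables
    p -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)  # decoupled weight decay
    return p, m, v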
@leezu leezu requested a review from a team as a code owner January 7, 2021 23:14
@sxjscience (Member)

Let's try to rerun the training with the batch script here: https://github.com/dmlc/gluon-nlp/tree/master/tools/batch#squad-training

Basically, we just need to run the following two commands, for SQuAD 2.0 and SQuAD 1.1:

# AWS Batch training with horovod on SQuAD 2.0 + FP16
bash question_answering/run_batch_squad.sh 1 2.0 submit_squad_v2_horovod_fp16.log float16

# AWS Batch training with horovod on SQuAD 1.1 + FP16
bash question_answering/run_batch_squad.sh 1 1.1 submit_squad_v1_horovod_fp16.log float16

@codecov (bot) commented Jan 7, 2021

Codecov Report

Merging #1468 (52ce2a4) into master (def0d70) will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1468      +/-   ##
==========================================
- Coverage   85.86%   85.84%   -0.02%     
==========================================
  Files          52       52              
  Lines        6911     6911              
==========================================
- Hits         5934     5933       -1     
- Misses        977      978       +1     
Impacted Files                           Coverage Δ
src/gluonnlp/data/tokenizers/yttm.py     81.89% <0.00%> (-0.87%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@leezu (Contributor, Author) commented Jan 8, 2021

test_squad2_albert_base 8903644b-13e1-4aa4-b695-e7b5f2c50c7d
test_squad2_albert_large aac428ac-4e25-48e8-8f3e-2643cbb6b95e
test_squad2_albert_xlarge bb565663-8173-45aa-9489-2dd690fd24c4
test_squad2_albert_xxlarge 38d9929c-fea2-4648-bc68-0bd4eb491ee8
test_squad2_electra_base 0eb9090a-d86b-40a6-9f1c-61e1cf034b59
test_squad2_electra_large 43fabf48-b524-499f-9d8a-2113349dcf74
test_squad2_electra_small 5c631945-ad26-4c2f-a7d3-bb8c705023a2
test_squad2_roberta_large 96d1e46f-b292-4915-a867-c724bb082585
test_squad2_uncased_bert_base 8228dd4c-27d3-4118-b682-06332db980f2
test_squad2_uncased_bert_large 22a91f7c-707e-4adf-a3d9-71286a3e165e
test_squad2_gluon_en_cased_bert_base_v1 13d38ddd-4ab6-4e60-8cae-1400d3169d4c
test_squad2_mobilebert 5377ebdc-da03-4e4e-8546-43e83643d1c0
test_squad2_albert_base c71abbd1-9ddb-465a-83a8-a257994a47a4
test_squad2_albert_large 55a10c2f-b51e-4722-b8fe-d0154ccf1124
test_squad2_albert_xlarge d3b1e954-b22e-4b30-bc3a-db3303d8de85
test_squad2_albert_xxlarge 9d8c599c-ecf2-4815-ac3c-cc853c75cddd
test_squad2_electra_base 9c10fca5-0ac6-4ec8-91ce-ebf2e0593513
test_squad2_electra_large d844645c-d56b-4549-805e-a3558d777e75
test_squad2_electra_small 8b17bb3f-ee8e-4212-92d7-59155f0c54ef
test_squad2_roberta_large e9972888-ae53-41e0-9b8f-1db8359e68c9
test_squad2_uncased_bert_base 083c431c-6e02-4a67-ab92-1e84a450df52
test_squad2_uncased_bert_large 24d40d9e-06fd-4158-90a3-1ee5da7183c1
test_squad2_gluon_en_cased_bert_base_v1 6b2c015b-5829-40b6-9435-718d3ecf46de
test_squad2_mobilebert 08e7618c-7e19-4db2-9451-09f65729272e
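
(These look like AWS Batch job IDs returned by the submission script. Assuming configured credentials and region, the status of any of them could be polled with boto3; the snippet below is a hypothetical check, not part of this thread:

import boto3

# Hypothetical: query the status of one submitted job by its ID.
batch = boto3.client("batch")
resp = batch.describe_jobs(jobs=["8903644b-13e1-4aa4-b695-e7b5f2c50c7d"])
for job in resp["jobs"]:
    print(job["jobName"], job["status"])
)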

@sxjscience (Member)

Yes, you can later use the following scripts to sync up the results:

bash question_answering/sync_batch_result.sh submit_squad_v2_horovod_fp16.log squad_v2_horovod_fp16
bash question_answering/sync_batch_result.sh submit_squad_v1_horovod_fp16.log squad_v1_horovod_fp16

After all results (or a subset of them) have finished, you can parse the logs via:

python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16

@leezu (Contributor, Author) commented Jan 8, 2021

% python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  81.861255  79.112272       -1.671970       -1.742718             1.139900
1                  albert_large  84.904438  81.900109       -1.086745       -1.086745             3.423180
2                 albert_xlarge  88.032327  85.134338       -1.625434       -1.625434             5.967083
3                albert_xxlarge  90.085053  87.155731       -2.226489       -2.226489            11.294118
4                  electra_base  86.282903  83.643561       -1.848169       -2.301743             1.250153
5                 electra_large  90.871907  88.461215       -1.347744       -1.347744             3.140608
6                 electra_small  73.878219  71.481513       -1.548537       -1.548537             0.383728
7   gluon_en_cased_bert_base_v1  77.620289  74.757854       -1.731051       -1.731051             1.595762
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  89.239196  86.431399       -2.168329       -2.168329             4.119268
10            uncased_bert_base  75.539014  72.702771       -1.595349       -1.850638             1.540320
11           uncased_bert_large  81.322878  78.177377       -2.056313       -2.056739             4.103469
Saving to squad_v2_horovod_fp16.csv

% python3 question_answering/parse_squad_results.py --dir squad_v1_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  90.605130  83.964049             NaN             NaN             0.745851
1                  albert_large  92.574139  86.385998             NaN             NaN             2.319241
2                 albert_xlarge  93.836504  87.984863             NaN             NaN             4.367765
3                albert_xxlarge  94.569074  88.448439             NaN             NaN             7.321531
4                  electra_base  92.483534  86.821192             NaN             NaN             0.882092
5                 electra_large  94.824761  89.631031             NaN             NaN             2.216832
6                 electra_small  85.263124  78.893094             NaN             NaN             0.267190
7   gluon_en_cased_bert_base_v1  88.685434  81.986755             NaN             NaN             1.077892
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  94.665818  89.101230             NaN             NaN             2.790591
10            uncased_bert_base  88.103126  81.201514             NaN             NaN             0.979201
11           uncased_bert_large  90.691656  83.945128             NaN             NaN             2.756076
Saving to squad_v1_horovod_fp16.csv

Is there any known issue with MobileBERT?

@leezu (Contributor, Author) commented Jan 8, 2021

Looks like an AMP issue, or an operator issue that causes AMP to keep decreasing the loss scale: finetune_squad2.0.log
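
For reference, dynamic loss scaling halves the scale whenever non-finite gradients are detected and only grows it again after a long run of clean steps, so an operator that emits NaN/Inf on every iteration drives the scale toward zero. A hedged sketch of the generic update rule (not MXNet AMP's exact implementation):

def update_loss_scale(scale, good_steps, grads_finite, growth_interval=2000):
    """Return (new_scale, new_good_steps) after one training step."""
    if not grads_finite:
        return scale / 2.0, 0      # overflow or NaN: halve the scale, skip the update
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0      # a long run of finite gradients: try a larger scale
    return scale, good_steps

If the scale keeps shrinking instead of stabilizing, lowering it is not fixing the problem, which points at an operator producing NaN/Inf regardless of the scale.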

@sxjscience (Member) commented Jan 8, 2021 via email

@sxjscience (Member)

From the figure, I think the performance looks similar. If we choose to update the flags, we can upload the pretrained weights to S3 and also change the numbers in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering.

@sxjscience (Member) left a comment

LGTM in general. We will also need to update the results table.
