
Can't reproduce pretraining results for Wav2vec2 using LibriSpeech recipe #2512

Open
GasserElbanna opened this issue Apr 16, 2024 · 9 comments

@GasserElbanna

Describe the bug

Hello, I am pretraining Wav2vec2 following the instructions on this page. The pretraining itself went very smoothly (thank you for that!!). However, when I compared my training logs with the ones published here, I found that my model finished the 400K steps in only 25 epochs (using 8 A100 GPUs) and with a lower accuracy (~60%), as opposed to 700 epochs and an accuracy of around 68% for the example checkpoint. My training also finished within only 2 days, which is confusing.

Expected behaviour

I expect similar training performance when looking at the training logs.

To Reproduce

Below is the training log for my model, which differs from the example one here:

epoch: 1, steps: 18223, lr: 3.04e-04 - train loss: 2.13e+04 - valid loss: 2.37e+03, valid accuracy: 0.35966309905052185
epoch: 2, steps: 36446, lr: 4.51e-04 - train loss: 1.66e+04 - valid loss: 2.07e+03, valid accuracy: 0.4265761077404022
epoch: 3, steps: 54669, lr: 4.28e-04 - train loss: 1.53e+04 - valid loss: 1.91e+03, valid accuracy: 0.46357864141464233
epoch: 4, steps: 72892, lr: 4.05e-04 - train loss: 1.47e+04 - valid loss: 1.83e+03, valid accuracy: 0.485844224691391
epoch: 5, steps: 91115, lr: 3.83e-04 - train loss: 1.43e+04 - valid loss: 1.77e+03, valid accuracy: 0.4983205795288086
epoch: 6, steps: 109338, lr: 3.60e-04 - train loss: 1.39e+04 - valid loss: 1.73e+03, valid accuracy: 0.5098890066146851
epoch: 7, steps: 127561, lr: 3.38e-04 - train loss: 1.36e+04 - valid loss: 1.68e+03, valid accuracy: 0.5208209753036499
epoch: 8, steps: 145784, lr: 3.15e-04 - train loss: 1.33e+04 - valid loss: 1.63e+03, valid accuracy: 0.5316717028617859
epoch: 9, steps: 164007, lr: 2.93e-04 - train loss: 1.30e+04 - valid loss: 1.59e+03, valid accuracy: 0.5391503572463989
epoch: 10, steps: 182230, lr: 2.70e-04 - train loss: 1.26e+04 - valid loss: 1.56e+03, valid accuracy: 0.5474251508712769
epoch: 11, steps: 200453, lr: 2.47e-04 - train loss: 1.23e+04 - valid loss: 1.52e+03, valid accuracy: 0.5530011653900146
epoch: 12, steps: 218676, lr: 2.25e-04 - train loss: 1.21e+04 - valid loss: 1.49e+03, valid accuracy: 0.5636028051376343
epoch: 13, steps: 236899, lr: 2.02e-04 - train loss: 1.18e+04 - valid loss: 1.46e+03, valid accuracy: 0.5687620639801025
epoch: 14, steps: 255122, lr: 1.80e-04 - train loss: 1.17e+04 - valid loss: 1.45e+03, valid accuracy: 0.5734390020370483
epoch: 15, steps: 273345, lr: 1.57e-04 - train loss: 1.15e+04 - valid loss: 1.43e+03, valid accuracy: 0.5788331031799316
epoch: 16, steps: 291568, lr: 1.34e-04 - train loss: 1.13e+04 - valid loss: 1.42e+03, valid accuracy: 0.5822480916976929
epoch: 17, steps: 309791, lr: 1.12e-04 - train loss: 1.12e+04 - valid loss: 1.40e+03, valid accuracy: 0.586252748966217
epoch: 18, steps: 328014, lr: 8.92e-05 - train loss: 1.11e+04 - valid loss: 1.39e+03, valid accuracy: 0.5907050967216492
epoch: 19, steps: 346237, lr: 6.66e-05 - train loss: 1.10e+04 - valid loss: 1.37e+03, valid accuracy: 0.596407413482666
epoch: 20, steps: 364460, lr: 4.41e-05 - train loss: 1.08e+04 - valid loss: 1.36e+03, valid accuracy: 0.5983026623725891
epoch: 21, steps: 382683, lr: 2.15e-05 - train loss: 1.08e+04 - valid loss: 1.34e+03, valid accuracy: 0.6026105880737305
epoch: 22, steps: 400000, lr: 0.00e+00 - train loss: 1.07e+04 - valid loss: 1.34e+03, valid accuracy: 0.6060941815376282
epoch: 23, steps: 400000, lr: 0.00e+00 - train loss: 0.00e+00 - valid loss: 1.33e+03, valid accuracy: 0.6060227155685425
epoch: 24, steps: 400000, lr: 0.00e+00 - train loss: 0.00e+00 - valid loss: 1.33e+03, valid accuracy: 0.6051703691482544
epoch: 25, steps: 400000, lr: 0.00e+00 - train loss: 0.00e+00 - valid loss: 1.33e+03, valid accuracy: 0.6063333749771118

Environment Details

I am using Python 3.11 and SpeechBrain 1.0.

Relevant Log Output

No response

Additional Context

No response

@GasserElbanna GasserElbanna added the bug Something isn't working label Apr 16, 2024
@Adel-Moumen Adel-Moumen self-assigned this Apr 18, 2024
@Adel-Moumen
Collaborator

Hello @GasserElbanna, thanks a lot for opening this issue!

Could @TParcollet and/or @salah-zaiem please have a look? Thanks a lot :)

@TParcollet
Collaborator

Hi, it's important that the total batch size corresponds to roughly 1.6h of speech. You can adjust this by changing the gradient accumulation factor.

@GasserElbanna
Author

Hello, thank you for the quick response. I used the default config file for pre-training, so I am assuming these are the parameters below that I need to adjust?

Dynamic Batching parameters:
max_batch_length: 200 # in seconds of audio; fits on a 32GB GPU (V100)
num_buckets: 70
shuffle: True # if True, re-creates batches at each epoch, shuffling examples
batch_ordering: random

@TParcollet
Collaborator

@Adel-Moumen I see that the gradient accumulation factor is missing from this recipe. Could you add it? (No need for a PR imho, push directly to develop.)

@GasserElbanna have a look at any other ASR yaml in the libri folder; you will find the gradient accumulation factor param. Just copy and paste it anywhere in this yaml. Then play with grad accum / max batch len to make sure that you have 1.2-1.6h of speech per batch: grad_accum * max_batch_len * nb_gpus ≈ 1.6h.

Also, your A100s can certainly accommodate more than 200s.
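
For concreteness, here is a back-of-the-envelope check of that rule, plugging in the numbers from this thread (a sketch only; it assumes max_batch_length is expressed in seconds of audio, as in this recipe):

```python
# Sanity check of the rule: grad_accum * max_batch_length * n_gpus ~= 1.2-1.6h.
# Assumes max_batch_length is in seconds of audio, as in this recipe.
def effective_batch_hours(grad_accum: int, max_batch_length: float, n_gpus: int) -> float:
    """Hours of speech seen per optimizer step."""
    return grad_accum * max_batch_length * n_gpus / 3600.0

# Original run: no accumulation, 200 s per GPU, 8 GPUs -> ~0.44h (too small).
print(effective_batch_hours(1, 200, 8))  # 0.444...
# Adjusted run discussed below: 2 * 400 s * 8 GPUs = 6400 s -> ~1.78h.
print(effective_batch_hours(2, 400, 8))  # 1.777...
```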

@Adel-Moumen
Collaborator

> @Adel-Moumen I see that the gradient accumulation factor is missing from this recipe. Could you add it? (No need for a PR imho, push directly to develop.)

Why would it be missing? By default, grad_accumulation_factor is set to 1 (see: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/core.py#L84). The var is used in each fit_batch call (see: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/core.py#L1199). As grad_accumulation_factor can also be set through a flag (see: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/core.py#L422-L426), the recipe is technically not missing this feature. You just need to play with --grad_accumulation_factor=N, where N is the number of gradient accumulation steps.
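
For readers unfamiliar with the flag: gradient accumulation sums gradients over N mini-batches before each optimizer step, multiplying the effective batch size by N. A minimal, self-contained PyTorch sketch of the idea (illustrative only, not SpeechBrain's actual fit_batch, which also handles autocast, grad scaling, etc.):

```python
import torch

# Illustrative gradient accumulation loop (not SpeechBrain's code).
# With factor N, one optimizer step sees the gradients of N mini-batches,
# so the effective batch size is N times larger.
N = 2  # corresponds to --grad_accumulation_factor=2
model = torch.nn.Linear(10, 1)                      # stand-in for the wav2vec2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [torch.randn(4, 10) for _ in range(8)]    # stand-in mini-batches

for step, x in enumerate(batches):
    loss = model(x).pow(2).mean()  # stand-in loss
    (loss / N).backward()          # divide by N so accumulated grads are averaged
    if (step + 1) % N == 0:        # update weights only every N batches
        optimizer.step()
        optimizer.zero_grad()
```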

@GasserElbanna
Author

Hi, thanks @TParcollet for the explanation, it's clearer now.
Thanks @Adel-Moumen for pointing out the flag.

I am currently pretraining with --grad_accumulation_factor=2 and max_batch_length=400 on 8 GPUs, yielding 2 * 400 * 8 = 6400 seconds (~1.8h).

Here is the log for the first epoch:
epoch: 1, steps: 4611, lr: 7.68e-05 - train loss: 4.84e+04 - valid loss: 2.86e+03, valid accuracy: 0.26230588555336

@Adel-Moumen
Collaborator

> Hi, thanks @TParcollet for the explanation, it's clearer now. Thanks @Adel-Moumen for pointing out the flag.
>
> I am currently pretraining with --grad_accumulation_factor=2 and max_batch_length=400 on 8 GPUs, yielding 2 * 400 * 8 = 6400 seconds (~1.8h).
>
> Here is the log for the first epoch: epoch: 1, steps: 4611, lr: 7.68e-05 - train loss: 4.84e+04 - valid loss: 2.86e+03, valid accuracy: 0.26230588555336

This looks similar to our model checkpoint. Note that you have now done "only" 4611 steps during your first epoch, meaning the training will go on for much longer. I do expect that you'll get better results.

BTW, are you using --precision=fp16 for the pre-training?

@GasserElbanna
Author

> BTW, are you using --precision=fp16 for the pre-training?

I am using fp32 now.

@TParcollet
Collaborator

fp16 or bf16 would make the training much faster if you have a compatible GPU.
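
For context, here is a minimal sketch of what bf16 mixed precision looks like in plain PyTorch. This illustrates the general autocast mechanism that a --precision flag typically wraps; it is not SpeechBrain's exact implementation:

```python
import torch

# Illustrative bf16 mixed-precision training step (plain PyTorch autocast);
# a sketch of the general mechanism, not SpeechBrain's exact code.
# bf16 keeps fp32's dynamic range, so unlike fp16 it needs no GradScaler,
# but it requires a compatible GPU (e.g. Ampere-class like the A100).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 1).to(device)                   # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(4, 10, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()  # matmuls run in bf16; sensitive ops stay fp32
loss.backward()                    # backward runs outside the autocast context
optimizer.step()
optimizer.zero_grad()
```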
