Pretraining error #109

Open
liu3zhenlab opened this issue Jul 10, 2023 · 0 comments

liu3zhenlab commented Jul 10, 2023

Thanks for developing these valuable scripts. We ran run_pretrain.py and encountered the following issue; we would appreciate your guidance on troubleshooting it. Thanks.

07/10/2023 00:11:34 - INFO - __main__ - Training new model from scratch
07/10/2023 00:11:36 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-06, beta1=0.9, beta2=0.98, block_size=512, cache_dir=None, config_name='../src/transformers/dnabert-config/bert-config-6/config.json', device=device(type='cpu'), do_eval=True, do_train=True, eval_all_checkpoints=False, eval_data_file='../data/3k_6mer/1st_finished_asm_3k_6mer.all', evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=25, learning_rate=0.0004, line_by_line=True, local_rank=-1, logging_steps=500, max_grad_norm=1.0, max_steps=200000, mlm=True, mlm_probability=0.025, model_name_or_path=None, model_type='dna', n_gpu=0, n_process=16, no_cuda=False, num_train_epochs=1.0, output_dir='k6', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=8, save_steps=500, save_total_limit=20, seed=42, server_ip='', server_port='', should_continue=False, tokenizer_name='dna6', train_data_file='../data/3k_6mer/1st_finished_asm_3k_6mer.all', warmup_steps=10000, weight_decay=0.01)
07/10/2023 00:11:36 - INFO - __main__ - Creating features from dataset file at ../data/3k_6mer/1st_finished_asm_3k_6mer.all
07/10/2023 00:14:39 - INFO - __main__ - Saving features into cached file ../data/3k_6mer/dna_cached_lm_512_1st_finished_asm_3k_6mer.all
07/10/2023 00:14:47 - INFO - __main__ - ***** Running training *****
07/10/2023 00:14:47 - INFO - __main__ - Num examples = 357566
07/10/2023 00:14:47 - INFO - __main__ - Num Epochs = 112
07/10/2023 00:14:47 - INFO - __main__ - Instantaneous batch size per GPU = 8
07/10/2023 00:14:47 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 200
07/10/2023 00:14:47 - INFO - __main__ - Gradient Accumulation steps = 25
07/10/2023 00:14:47 - INFO - __main__ - Total optimization steps = 200000
Iteration: 0%| | 0/44696 [00:00<?, ?it/s]
Epoch: 0%| | 0/112 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "../scripts/run_pretrain.py", line 890, in <module>
    main()
  File "../scripts/run_pretrain.py", line 840, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "../scripts/run_pretrain.py", line 426, in train
    inputs, labels = mask_tokens(batch, tokenizer, args) if args.mlm else (batch, batch)
  File "../scripts/run_pretrain.py", line 272, in mask_tokens
    probability_matrix.masked_fill_(torch.tensor(special_tokens_mask, dtype=torch.bool), value=0.0)
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask'
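
For context, this error usually indicates that the installed PyTorch predates BoolTensor mask support: before PyTorch 1.2, masked_fill_ expected a Byte (uint8) mask, while the script passes a bool mask. Below is a minimal workaround sketch, assuming the failing line is the masked_fill_ call shown in the traceback; the _mask_dtype helper is hypothetical and not part of run_pretrain.py.

```python
# Hypothetical workaround sketch for mask_tokens() in run_pretrain.py.
# _mask_dtype() is not part of the script; it picks a mask dtype
# compatible with the installed PyTorch release.
import torch

def _mask_dtype():
    # masked_fill_ expects a ByteTensor mask before PyTorch 1.2
    # and a BoolTensor mask from 1.2 onward.
    major, minor = (int(v) for v in torch.__version__.split(".")[:2])
    return torch.bool if (major, minor) >= (1, 2) else torch.uint8

# Replacement for the failing line inside mask_tokens():
# probability_matrix.masked_fill_(
#     torch.tensor(special_tokens_mask, dtype=_mask_dtype()), value=0.0
# )
```

The cleaner fix is to install the PyTorch version pinned in the repository's requirements, since an older release may hit further Byte/Bool mismatches elsewhere in training.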
