Training process crashes suddenly #642

Open
2 tasks done
dejankocic opened this issue May 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@dejankocic

Prerequisites

  • I have read the documentation.
  • I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

autotrain app --port 8080 --host 127.0.0.1

UI Screenshots & Parameters

image

Error Logs

Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.31s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.63s/it]
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:66 - model dtype: torch.float16
INFO | 2024-05-15 23:14:00 | autotrain.trainers.clm.train_clm_sft:train:79 - creating trainer

Generating train split: 0 examples [00:00, ? examples/s]
Generating train split: 0 examples [00:00, ? examples/s]
ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1748, in _prepare_split_single
for key, record in generator:
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/packaged_modules/generator/generator.py", line 30, in _generate_examples
for idx, ex in enumerate(self.config.generator(**gen_kwargs)):
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 536, in data_generator
yield from constant_length_iterator
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/utils.py", line 458, in __iter__
buffer_len += len(buffer[-1])
TypeError: object of type 'NoneType' has no len()

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 539, in _prepare_packed_dataloader
packed_dataset = Dataset.from_generator(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 1117, in from_generator
return GeneratorDatasetInputStream(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/io/generator.py", line 47, in read
self.builder.download_and_prepare(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1027, in download_and_prepare
self._download_and_prepare(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1789, in _download_and_prepare
super()._download_and_prepare(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1122, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1627, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/home/dejan/python39venv/lib/python3.9/site-packages/datasets/builder.py", line 1784, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/common.py", line 117, in wrapper
return func(*args, **kwargs)
File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/main.py", line 28, in train
train_sft(config)
File "/home/dejan/python39venv/lib/python3.9/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 86, in train
trainer = SFTTrainer(
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 283, in __init__
train_dataset = self._prepare_dataset(
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 435, in _prepare_dataset
return self._prepare_packed_dataloader(
File "/home/dejan/python39venv/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 543, in _prepare_packed_dataloader
raise ValueError(
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.

ERROR | 2024-05-15 23:14:02 | autotrain.trainers.common:wrapper:121 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
INFO | 2024-05-15 23:14:03 | autotrain.utils:get_running_jobs:57 - Killing PID: 165343

Additional Information

The txt file I am using for testing has about 300 lines. I am not sure whether this is the reason or whether something else is causing the failure.

@dejankocic dejankocic added the bug Something isn't working label May 16, 2024
@hichambht32

hichambht32 commented May 22, 2024

I guess it should be a CSV file, not a text file. As for the dataset columns, try to follow the suggested format for your use case.
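A minimal sketch of that conversion, in case it helps. It assumes the trainer reads the training text from a single CSV column named `text` (check the column mapping in your AutoTrain parameters, since this name is an assumption here), and it skips blank lines because the `NoneType` error in the traceback suggests that at least one sample had no text. The helper name `txt_to_csv` and the file names are hypothetical.

```python
import csv
from pathlib import Path


def txt_to_csv(txt_path, csv_path, column="text"):
    """Write each non-empty line of txt_path as one row of a single-column CSV.

    `column` is an assumed column name; adjust it to match your trainer settings.
    Returns the number of data rows written.
    """
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    # Drop blank lines so that no row ends up with an empty text value.
    rows = [line.strip() for line in lines if line.strip()]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])                # header row
        writer.writerows([row] for row in rows)  # one sample per row
    return len(rows)
```

Calling `txt_to_csv("train.txt", "train.csv")` returns the number of rows written. If that number is small, the dataset may still be too short to fill one packed sequence, which is the other cause the error message points to.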
