
NotImplementedError while importing/loading tfds plant_leaves dataset #5416

Open
Coolcoder45 opened this issue May 16, 2024 · 2 comments
Labels
bug Something isn't working


@Coolcoder45


Short description
The tfds plant_leaves dataset does not load: tfds.load raises NotImplementedError after the download and preparation complete. Tried on May 16, 2024.

Environment information

  • Operating System: Windows 11

  • Python version: 3.10.12

  • tensorflow-datasets/tfds-nightly version: 4.9.4

  • tensorflow/tf-nightly version: 2.15.0

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes.

Reproduction instructions

import tensorflow_datasets as tfds
plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
DlCompleted...: 100% 1/1 [10:04<00:00, 604.39s/url]
DlSize...: 100% 6718/6718 [10:04<00:00, 11.25MiB/s]
Dataset plant_leaves downloaded and prepared to /root/tensorflow_datasets/plant_leaves/0.1.1. Subsequent calls will reuse this data.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-d88d46497437> in <cell line: 2>()
      1 import tensorflow_datasets as tfds
----> 2 plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)

33 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/file_adapters.py in make_tf_data(cls, filename, buffer_size)
    206   ) -> tf.data.Dataset:
    207     """Returns TensorFlow Dataset comprising given array record file."""
--> 208     raise NotImplementedError(
    209         '`.as_dataset()` not implemented for ArrayRecord files. Please, use'
    210         ' `.as_data_source()`.'

NotImplementedError: `.as_dataset()` not implemented for ArrayRecord files. Please, use `.as_data_source()`.

Expected behavior
To load dataset successfully.
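Following the error message's own suggestion, here is a minimal sketch of reading the already-prepared ArrayRecord files through `.as_data_source()` instead of `tfds.load`. It assumes the installed TFDS release provides `DatasetBuilder.as_data_source()` and that the `array_record` package is installed. The helper names `load_plant_leaves_source` and `iter_shuffled` are hypothetical, and `iter_shuffled` only approximates `shuffle_files=True` by shuffling record indices rather than files.

```python
import random
from typing import Any, Iterator, Optional, Sequence


def load_plant_leaves_source(split: str = "train", data_dir: Optional[str] = None):
    """Return a random-access data source for one split of plant_leaves.

    Assumption: the installed TFDS exposes DatasetBuilder.as_data_source()
    and the `array_record` package is available to read the ArrayRecord files.
    """
    # Imported lazily so iter_shuffled() below can be used without TFDS installed.
    import tensorflow_datasets as tfds

    builder = tfds.builder("plant_leaves", data_dir=data_dir)
    # Reuses the prepared files if they already exist in data_dir.
    builder.download_and_prepare()
    # Random-access source: supports len(source) and source[i].
    return builder.as_data_source(split=split)


def iter_shuffled(source: Sequence[Any], seed: int = 0) -> Iterator[Any]:
    """Yield every record of a random-access source exactly once, in shuffled order.

    Hypothetical stand-in for shuffle_files=True: it permutes record indices,
    not files.
    """
    indices = list(range(len(source)))
    random.Random(seed).shuffle(indices)
    for i in indices:
        yield source[i]
```

Usage would look like `source = load_plant_leaves_source("train")` followed by `for example in iter_shuffled(source): ...`. Records from a data source should come back as decoded NumPy values rather than tensors, so no `tf.data` pipeline is involved.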

Coolcoder45 added the bug (Something isn't working) label on May 16, 2024
pierrot0 self-assigned this on May 17, 2024
@pierrot0
Collaborator

Hi, thank you for reporting!
This is definitely a bug.

Workaround: add the following arg to your tfds.load call:

tfds.load(..., download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

We'll look into how to update the code and will follow up on this issue.

@Coolcoder45
Author

It still gives an error.

import tensorflow_datasets as tfds
plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

Gives:

Downloading and preparing dataset 6.56 GiB (download: 6.56 GiB, generated: 6.81 GiB, total: 13.37 GiB) to /root/tensorflow_datasets/plant_leaves/0.1.1...
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-608b46b22c6c> in <cell line: 4>()
      2 #plant_leaves = tfds.load('plant_leaves', split='train', shuffle_files=True)
      3 #plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, as_data_source=True)
----> 4 plant_leaves_data, plant_leaves_info = tfds.load('plant_leaves', split='train', shuffle_files=True, download_and_prepare_kwargs={'file_format': tfds.core.FileFormat.ARRAY_RECORD})

5 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    645       try_gcs,
    646   )
--> 647   _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    648 
    649   if as_dataset_kwargs is None:

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
    504   if download:
    505     download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 506     dbuilder.download_and_prepare(**download_and_prepare_kwargs)
    507 
    508 

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py in __call__(self, function, instance, args, kwargs)
    167     metadata = self._start_call()
    168     try:
--> 169       return function(*args, **kwargs)
    170     except Exception:
    171       metadata.mark_error()

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py in download_and_prepare(self, download_dir, download_config, file_format)
    679     # to generate the files.
    680     if file_format:
--> 681       self.info.set_file_format(file_format, override=True)
    682 
    683     # Create a tmp dir and rename to self.data_dir on successful exit.

/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_info.py in set_file_format(self, file_format, override)
    470       )
    471     if override and self._fully_initialized:
--> 472       raise RuntimeError(
    473           "Cannot override the file format "
    474           "when the DatasetInfo is already fully initialized!"

RuntimeError: Cannot override the file format when the DatasetInfo is already fully initialized!
