Cannot build hugging face datasets #5394

Open
ppham27 opened this issue May 2, 2024 · 1 comment
ppham27 commented May 2, 2024

Short description

$  tfds build huggingface:mnist/mnist

FileNotFoundError: Request failed for https://raw.githubusercontent.com/huggingface/datasets/master/datasets/mnist/dataset_infos.json
 Error: 404
 Reason: b'404: Not Found'

It seems the index (`gs://tfds-data/community-datasets-list.jsonl`) is out of date and hasn't been updated to use the hub: huggingface/datasets#4059.
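The failing request can be checked outside of tfds. The sketch below rebuilds the URL from the error message above and asks for its HTTP status. The URL template is copied from that message rather than from the tfds source, and `legacy_info_url` and `http_status` are hypothetical helper names used only for illustration.

```python
import urllib.error
import urllib.request

# URL layout copied from the error message in this report (an assumption about
# what tfds requests, not taken from the tfds source).
LEGACY_TEMPLATE = (
    "https://raw.githubusercontent.com/huggingface/datasets/"
    "master/datasets/{name}/dataset_infos.json"
)


def legacy_info_url(name: str) -> str:
    """Build the legacy dataset_infos.json URL for a dataset name."""
    return LEGACY_TEMPLATE.format(name=name)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return err.code
```

Calling `http_status(legacy_info_url("mnist"))` needs network access; going by the error above, it would be expected to report the same 404.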

Environment information

  • Operating System: macOS
  • Python version: 3.11
  • tensorflow-datasets/tfds-nightly version: tfds-nightly
  • tensorflow/tf-nightly version: 2.16.1
  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes.

Reproduction instructions

 tfds build huggingface:mnist/mnist

If you share a colab, make sure to update the permissions to share it.

Link to logs

INFO[config.py]: Loading namespace config from /usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/community-datasets.toml
Traceback (most recent call last):
  File "/usr/local/google/home/phillypham/venv/grain/bin/tfds", line 8, in <module>
    sys.exit(launch_cli())
             ^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/scripts/cli/main.py", line 105, in launch_cli
    app.run(main, flags_parser=_parse_flags)
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/scripts/cli/main.py", line 100, in main
    args.subparser_fn(args)
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/scripts/cli/build.py", line 302, in _build_datasets
    builders_cls_and_kwargs = [
                              ^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/scripts/cli/build.py", line 303, in <listcomp>
    _get_builder_cls_and_kwargs(dataset, has_imports=bool(args.imports))
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/scripts/cli/build.py", line 420, in _get_builder_cls_and_kwargs
    builder_cls = tfds.builder_cls(str(name))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/load.py", line 114, in builder_cls
    return community.community_register().builder_cls(ds_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/community/registry.py", line 259, in builder_cls
    return registers[0].builder_cls(name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/community/register_package.py", line 249, in builder_cls
    installed_dataset = _download_or_reuse_cache(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/community/register_package.py", line 402, in _download_or_reuse_cache
    installed_package = _download_and_cache(package)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/community/register_package.py", line 449, in _download_and_cache
    dataset_sources_lib.download_from_source(
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/community/dataset_sources.py", line 80, in download_from_source
    path.copy(dst / path.name)
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/github_api/github_path.py", line 338, in copy
    dst.write_bytes(self.read_bytes())
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/github_api/github_path.py", line 311, in read_bytes
    return get_content(url)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/phillypham/venv/grain/lib/python3.11/site-packages/tensorflow_datasets/core/github_api/github_path.py", line 44, in get_content
    raise FileNotFoundError(
FileNotFoundError: Request failed for https://raw.githubusercontent.com/huggingface/datasets/master/datasets/mnist/dataset_infos.json
 Error: 404
 Reason: b'404: Not Found'

Expected behavior

`tfds build huggingface:mnist/mnist` should succeed and go on to call `download_and_prepare`.

Additional context

python -c "import tensorflow_datasets as tfds; tfds.builder('huggingface:mnist/mnist')"

works.
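Since the builder can be constructed from Python, one possible workaround is to call `download_and_prepare` on it directly instead of going through the CLI. This is only a sketch: the report confirms that `tfds.builder(...)` succeeds, not that preparing the dataset does. `prepare_dataset` and its `builder_fn` parameter are hypothetical names introduced here so the logic can be exercised without tfds installed.

```python
def prepare_dataset(name, builder_fn=None):
    """Create a builder for `name` and run download_and_prepare on it.

    `builder_fn` defaults to tfds.builder; it is a parameter only so that a
    stand-in can be supplied in tests (an assumption of this sketch).
    """
    if builder_fn is None:
        # Imported lazily so the function can be defined without tfds present.
        import tensorflow_datasets as tfds

        builder_fn = tfds.builder
    builder = builder_fn(name)
    builder.download_and_prepare()
    return builder
```

With tfds installed, `prepare_dataset("huggingface:mnist/mnist")` would take the same path as the one-liner above and then attempt the download step.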

@ppham27 ppham27 added the bug Something isn't working label May 2, 2024
@fineguy fineguy self-assigned this May 27, 2024
lbo462 commented Jun 11, 2024

Have you tried replacing `/` with `__`?

If you just need MNIST, it is available directly from the TensorFlow Datasets catalog at https://www.tensorflow.org/datasets/catalog/overview:

python -c "import tensorflow_datasets as tfds; tfds.builder('mnist')" works as well.

If you do need to pull a dataset from Hugging Face, consider using tfds.load() and replacing / with __ in the dataset name.

Hope this helps.
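For reference, the `/` to `__` substitution suggested in this comment can be sketched as a one-line helper. `slash_to_double_underscore` is a hypothetical name, and whether the rewritten name actually resolves in `tfds build` or `tfds.load()` has not been confirmed in this thread.

```python
def slash_to_double_underscore(name: str) -> str:
    """Replace every '/' in a dataset name with '__', leaving the rest as-is."""
    return name.replace("/", "__")


# The name from this issue, rewritten as suggested in the comment.
candidate = slash_to_double_underscore("huggingface:mnist/mnist")
print(candidate)
```

The resulting string could then be passed to `tfds build` or `tfds.load()` in place of the original name.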
