dataset download error #1762

Open
Thaumaturge2020 opened this issue Jan 16, 2024 · 9 comments

@Thaumaturge2020

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master

Habitat-Sim: master

When I try to download habitat_test_scenes, I get this error:

stderr: 'fatal: unable to access 'https://huggingface.co/datasets/ai-habitat/habitat_test_scenes.git/': gnutls_handshake() failed: Error in the pull function.'

The command I used was:

python -m habitat_sim.utils.datasets_download --uids habitat_test_scenes --data-path data/

How can I solve this problem?

@aclegg3
Contributor

aclegg3 commented Jan 22, 2024

Try again, this is working for me. I haven't seen this error personally, but it may be a simple network outage at the time of download.

@21stYouth

I have the same error when downloading. Have you solved it?

@aclegg3
Contributor

aclegg3 commented Jan 25, 2024

Hey folks, looks like this could very well be a local system package or firewall issue: https://stackoverflow.com/questions/52379234/git-gnutls-handshake-failed-error-in-the-pull-function

@21stYouth

Unfortunately, I still have the problem.

Here is the error (when I run python -m habitat_sim.utils.datasets_download --uids habitat_test_scenes --data-path data/):

git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git fetch -v -- origin v1.6
stderr: 'fatal: unable to access 'https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git/': Failed to connect to huggingface.co port 443 after 133412 ms: Connection timed out'

I opened https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git/ in a browser and it showed a 404. Is that normal, or is it just me?

@xavierpuigf
Contributor

I am not sure what could be causing the timeout. As for the link, you can view the dataset on the website by removing the .git extension:

https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset

@Lr-2002

Lr-2002 commented Feb 7, 2024

Are you in mainland China? This is caused by the network; you need to use a proxy to solve it.
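For anyone trying the proxy route, here is a minimal sketch of running the downloader with proxy variables set. It assumes the git processes started by the downloader inherit the environment and that git's HTTPS transport honors the standard `https_proxy` variable. The proxy address in the example is a placeholder, and `proxied_env` and `download_command` are hypothetical helper names, not part of habitat-sim.

```python
import os
import subprocess
import sys


def proxied_env(proxy_url, base_env=None):
    """Return a copy of the environment with the usual proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = proxy_url
    return env


def download_command(uid, data_path):
    """Build the downloader invocation quoted earlier in this thread."""
    return [
        sys.executable, "-m", "habitat_sim.utils.datasets_download",
        "--uids", uid, "--data-path", data_path,
    ]


# Example (not executed here); replace the placeholder with your own proxy:
# subprocess.run(
#     download_command("habitat_test_scenes", "data/"),
#     env=proxied_env("http://127.0.0.1:7890"),
#     check=True,
# )
```

Exporting the same variables in the shell before running the command should have the same effect.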

@21stYouth

21stYouth commented Feb 10, 2024

Thanks everyone, I finally found a solution to these problems.
To recap the key point of my error:
When I run python -m habitat_sim.utils.datasets_download --uids habitat_test_scenes --data-path data/ or python examples/example.py, the program fails with

stderr: 'fatal: unable to access 'https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git/'

Here is my solution:

  1. Set up Git over SSH for Hugging Face. See https://huggingface.co/docs/hub/security-git-ssh for details.
  2. Find the git clone command in the error report, for example

git clone --depth 1 --branch v1.6 https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git /home/xxx/habitat-lab/data/versioned_data/replica_cad_dataset

  3. Change https://huggingface.co/ to git@hf.co:, so the command becomes

git clone --depth 1 --branch v1.6 git@hf.co:datasets/ai-habitat/ReplicaCAD_dataset.git /home/xxx/habitat-lab/data/versioned_data/replica_cad_dataset

  4. Rerun this command each time the error appears, until all the datasets are downloaded.
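The steps above could be scripted roughly as follows. This is only a sketch: it assumes an SSH key is already registered with Hugging Face, and `to_ssh_url` and `clone_with_retries` are hypothetical helper names, not part of habitat-sim or its downloader.

```python
import subprocess
import time

HTTPS_PREFIX = "https://huggingface.co/"
SSH_PREFIX = "git@hf.co:"


def to_ssh_url(url):
    """Rewrite a Hugging Face HTTPS remote to its SSH form; leave other URLs unchanged."""
    if url.startswith(HTTPS_PREFIX):
        return SSH_PREFIX + url[len(HTTPS_PREFIX):]
    return url


def clone_with_retries(url, dest, branch, attempts=5, wait_s=10.0, runner=subprocess.run):
    """Run a shallow `git clone` over SSH, retrying on failure. Returns True on success."""
    cmd = ["git", "clone", "--depth", "1", "--branch", branch, to_ssh_url(url), dest]
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed with exit code {result.returncode}")
        if attempt < attempts:
            time.sleep(wait_s)  # pause before the next try
    return False


# Example (not executed here), using the URL and branch from the error report;
# the destination path is whatever your own error report shows:
# clone_with_retries(
#     "https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git",
#     "data/versioned_data/replica_cad_dataset",
#     branch="v1.6",
# )
```

The `runner` parameter exists only so the retry logic can be exercised without network access. If the destination directory already exists and is not empty, `git clone` will refuse to run, so it may need to be removed first.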

@21stYouth

21stYouth commented Feb 10, 2024

By the way, the following error

subprocess.CalledProcessError: Command '['git', 'clone', '--depth', '1', '--branch', 'v1.6', 'https://huggingface.co/datasets/ai-habitat/ReplicaCAD_dataset.git', '/home/xxx/habitat-lab/data/versioned_data/replica_cad_dataset']' returned non-zero exit status 128.

can probably be solved the same way.

@aclegg3
Contributor

aclegg3 commented Feb 12, 2024

@21stYouth

HF has deprecated use of username/password for authentication. By using git@hf.co you select the SSH authentication channel.
Our downloader does this for authenticated datasets. I'm a bit surprised to see this is ever necessary for non-authenticated datasets.
