Corrupted files in fma_small #70

Open
JakubK opened this issue Mar 18, 2024 · 2 comments

JakubK commented Mar 18, 2024

I'm not sure if this is the right place to raise it, but I have run into problems with some samples while working with fma_small.

Reproduction:

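import librosa
from tqdm import tqdm

corrupted_indices = []
# Wrap the iterable itself so tqdm knows its length
for i, audio_id in enumerate(tqdm(train)):
    try:
        # Load audio file
        y, sr = librosa.load(get_audio_path(AUDIO_DIR, audio_id))
    except Exception as exc:
        # Record the position of every sample that fails to decode
        print("There was a problem with", audio_id, "->", type(exc).__name__)
        corrupted_indices.append(i)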

Here, the train variable holds the IDs of all fma_small samples labelled as "train".
For some samples, librosa.load fails:

y, sr = librosa.load(get_audio_path(AUDIO_DIR, 133297))

Produces:

LibsndfileError                           Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/librosa/core/audio.py in load(path, sr, mono, offset, duration, dtype, res_type)
    174         try:
--> 175             y, sr_native = __soundfile_load(path, offset, duration, dtype)
    176 

7 frames
LibsndfileError: Error opening 'fma_small/133/133297.mp3': File does not exist or is not a regular file (possibly a pipe?).

During handling of the above exception, another exception occurred:

NoBackendError                            Traceback (most recent call last)
<decorator-gen-119> in __audioread_load(path, offset, duration, dtype)

/usr/local/lib/python3.10/dist-packages/audioread/__init__.py in audio_open(path, backends)
    130 
    131     # All backends failed!
--> 132     raise NoBackendError()

NoBackendError:

When I check my Colab session, I can see that the mp3 file is actually present at the specified location.
The downloaded file is surprisingly small, and trying to play it crashes my audio player.
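
Since the failing file is unusually small, a size check can shortlist candidates without decoding any audio. This is only a minimal sketch: `find_small_files` is a hypothetical helper, and the 10,000-byte default threshold is an assumption that would need tuning against healthy clips.

```python
import os


def find_small_files(audio_dir, min_bytes=10_000):
    """Return sorted (path, size) pairs for .mp3 files smaller than min_bytes.

    min_bytes is an assumed cutoff, not a value taken from the dataset.
    """
    suspects = []
    for root, _dirs, files in os.walk(audio_dir):
        for name in files:
            if name.lower().endswith(".mp3"):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size < min_bytes:
                    suspects.append((path, size))
    return sorted(suspects)
```

Calling `find_small_files("fma_small")` would list the candidates. A small file is not necessarily corrupt, so the result is best treated as a shortlist to re-check with librosa.load.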

The problem does not occur for most of the files.
The test and validation subsets are clean.

Problematic IDs that I have spotted:

133297, 108925, 99134
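
One possible workaround is to drop these IDs before loading. A sketch, assuming train is an iterable of integer track IDs as in the reproduction; `KNOWN_BAD_IDS` and `drop_known_bad` are hypothetical names, and the set covers only the three IDs listed here.

```python
# IDs reported in this issue as failing to load; not an exhaustive list.
KNOWN_BAD_IDS = {133297, 108925, 99134}


def drop_known_bad(track_ids, bad_ids=KNOWN_BAD_IDS):
    """Return track_ids with the known-bad IDs removed, preserving order."""
    return [tid for tid in track_ids if tid not in bad_ids]


# Example with a made-up list of IDs
print(drop_known_bad([2, 133297, 5, 99134]))  # prints [2, 5]
```

Filtering by ID rather than by position keeps the result valid even if the order of train changes between runs.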

@allispaul

These are known issues, see here.
