Bug: Importer import_librivox.py can't render absolute path of WAV files in CSV #2349

Open
RobinE89 opened this issue Mar 3, 2023 · 2 comments
Labels
bug Something isn't working


RobinE89 commented Mar 3, 2023

Hey,

(I don't know whether this applies to all importers; I noticed it with the LibriVox importer.)
If you define the base path as, e.g., "own_project", the importer writes the file names like this:
own_project/LibriSpeech/ dev-clean-wav/7850/73752/0000/7850-73752-0000.wav
and places the CSV in the following folder:
own_project/LibriSpeech/

If you then want to train on that set and define the training path as follows:
--train_files own_project/LibriSpeech/librivox-train-clean-100.csv

...it fails because:
own_project/LibriSpeech/ own_project/LibriSpeech/ dev-clean-wav/7850/73752/0000/7850-73752-0000.wav not found.

"coqui_stt_training.train" prepends the directory of the CSV file to each file name.
That makes sense, but it means the path written by the importer is wrong.
It's a trifle, but I don't think this behaviour is intended.
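
The doubled prefix can be reproduced with a few lines of pathlib. This is only a sketch of the behaviour described above, assuming the training code joins the CSV's parent directory with each wav_filename entry; resolve_wav is a hypothetical helper, not a function from coqui_stt_training, and the paths are written without the stray space.

```python
from pathlib import PurePosixPath


def resolve_wav(csv_path: str, wav_filename: str) -> str:
    """Hypothetical helper: prefix a CSV entry with the CSV's own directory."""
    return str(PurePosixPath(csv_path).parent / wav_filename)


csv_path = "own_project/LibriSpeech/librivox-train-clean-100.csv"
wav_name = "dev-clean-wav/7850/73752/0000/7850-73752-0000.wav"

# Entry that already contains the CSV directory: the prefix ends up doubled.
doubled = resolve_wav(csv_path, "own_project/LibriSpeech/" + wav_name)

# Entry stored relative to the CSV directory: resolves to the real file.
ok = resolve_wav(csv_path, wav_name)

print(doubled)
print(ok)
```

Note that the entry in this report is not truly absolute (it does not start with "/"), which is why the join keeps both prefixes. With pathlib, joining onto a path that starts with "/" would discard the left-hand side instead.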

best regards

RobinE89 added the "bug" label on Mar 3, 2023
Collaborator

wasertech commented Mar 5, 2023

--relative is set to False by default.

STT/bin/import_librivox.py

Lines 193 to 197 in a694187

parser.add_argument(
    "--relative",
    action="store_true",
    help="whether to store relative paths in CSV",
)

So it is using the absolute path of the wav files in the csv.
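
As a quick check of that default, here is a minimal standalone sketch that reuses only the add_argument call quoted above. The bare ArgumentParser is an assumption for illustration; the real importer defines more options.

```python
import argparse

# Standalone parser with only the flag quoted above (assumed setup).
parser = argparse.ArgumentParser()
parser.add_argument(
    "--relative",
    action="store_true",
    help="whether to store relative paths in CSV",
)

# store_true flags default to False and take no value on the command line.
default_args = parser.parse_args([])
flag_args = parser.parse_args(["--relative"])

print(default_args.relative)
print(flag_args.relative)
```

Because the flag takes no value, it is enabled by passing --relative on its own rather than --relative True.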

STT/bin/import_librivox.py

Lines 126 to 144 in a694187

# Convert corresponding FLAC to a WAV
base = trans_filename.parent.joinpath(seqid)
flac_file = base.with_suffix(".flac")
wav_file = (
    dest_dir.joinpath(*seqid.split("-"))
    .joinpath(seqid)
    .with_suffix(".wav")
)
wav_file.parent.mkdir(parents=True, exist_ok=True)
if wav_file.exists():
    wav_filesizes.append(os.path.getsize(wav_file))
else:
    conversions.append((flac_file, wav_file, sample_rate))
if relative_to:
    wav_file = wav_file.relative_to(relative_to)
files.append(str(wav_file))
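
To make the effect of the relative_to branch concrete, the path construction can be replayed on pure paths, without touching the filesystem. This is a sketch under assumptions: the dest_dir and relative_to values are taken from the paths in the report, and treating relative_to as the CSV directory is a guess about how the importer is invoked.

```python
from pathlib import PurePosixPath

# Assumed values, borrowed from the paths in the report.
dest_dir = PurePosixPath("own_project/LibriSpeech/dev-clean-wav")
relative_to = PurePosixPath("own_project/LibriSpeech")  # assumed: the CSV directory
seqid = "7850-73752-0000"

# Same construction as the quoted snippet: one directory level per seqid part.
wav_file = (
    dest_dir.joinpath(*seqid.split("-"))
    .joinpath(seqid)
    .with_suffix(".wav")
)

# Without --relative the full path is written; with it, the prefix is stripped.
full_entry = str(wav_file)
relative_entry = str(wav_file.relative_to(relative_to))

print(full_entry)
print(relative_entry)
```

Only the relative entry can be safely prefixed with the CSV directory afterwards.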

STT/bin/import_librivox.py

Lines 152 to 155 in a694187

return pandas.DataFrame(
    data=zip(files, wav_filesizes, transcripts),
    columns=["wav_filename", "wav_filesize", "transcript"],
)

The real issue is with this line.

files.append(str(wav_file))

Since dest_dir is a Path object,

def _convert_audio_and_split_sentences(
    source_dir: Path, dest_dir: Path, sample_rate: int, relative_to: Optional[Path]
):

it should really be

files.append(wav_file.as_posix())

wasertech changed the title from "Bug: Importer - wrong path" to "Bug: Importer import_librivox.py can't render absolute path of WAV files in CSV" on Mar 5, 2023
wasertech added a commit that referenced this issue Mar 5, 2023
Collaborator

wasertech commented Mar 5, 2023

it should really be
files.append(wav_file.as_posix())

@RobinE89 can you make the changes locally and test if it works?

I'm starting to think that I've just broken compatibility with Windows without solving anything, because str(Path()) should work as expected on all platforms...

https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.as_posix
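
The difference between the two calls can be checked with pathlib's pure path classes, which emulate either platform regardless of where the code runs. This is a small sketch with an example file name taken from the report, not a test of the importer itself.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("dev-clean-wav", "7850", "73752", "0000", "7850-73752-0000.wav")

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

# On POSIX, str() and as_posix() give the same string.
print(str(posix_path))
print(posix_path.as_posix())

# On Windows, str() uses backslashes while as_posix() uses forward slashes.
print(str(windows_path))
print(windows_path.as_posix())
```

So on Linux and macOS the switch to as_posix() leaves the CSV contents unchanged and cannot resolve the reported problem by itself; only on Windows does it change the separators that get written.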

Anyway, you should probably pass --relative in all cases.
