Add arguments time_off and duration to transcriber #1533

Open
me-kell opened this issue Mar 6, 2024 · 2 comments

me-kell commented Mar 6, 2024

Currently the transcriber always processes the whole input file, from beginning to end.

It would be very useful to be able to pass a start time offset and/or a duration to the transcriber.

Here is a proposal for how to do it:

Add (ffmpeg's) arguments time_off and duration in python/vosk/transcriber/cli.py after line 46.

parser.add_argument("--time_off", "-ss", default=None, type=int, help="start time offset")
parser.add_argument("--duration", "-d", default=None, type=int, help="duration")
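
To check how these two options would parse, here is a minimal stand-alone sketch. It assumes a fresh parser rather than the one in cli.py, and `build_parser` is a hypothetical name used only for illustration:

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser; the real parser lives in cli.py.
    parser = argparse.ArgumentParser(description="sketch of the proposed options")
    parser.add_argument("--time_off", "-ss", default=None, type=int,
                        help="start time offset")
    parser.add_argument("--duration", "-d", default=None, type=int,
                        help="duration")
    return parser


# Both the short and the long spelling are accepted.
args = build_parser().parse_args(["-ss", "30", "--duration", "10"])
print(args.time_off, args.duration)
```

With `type=int` only whole seconds are accepted. ffmpeg itself also understands fractional seconds and `HH:MM:SS` strings, so `type=str` might be the more flexible choice.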

Pass the arguments time_off and duration to ffmpeg in function resample_ffmpeg in python/vosk/transcriber/transcriber.py (line 115):

        cmd = shlex.split(
            "ffmpeg -nostdin -loglevel quiet "
            "-i \'{}\' -ar {} -ac 1 {} {} -f s16le -".format(
                str(infile),
                SAMPLE_RATE,
                f'-ss {self.args.time_off}' if self.args.time_off is not None else '',  # add this
                f'-t {self.args.duration}' if self.args.duration is not None else '',   # and this
            ))

The function resample_ffmpeg_async could be adapted similarly.
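
The same command could also be built as an argument list, which avoids the quoting and empty-placeholder handling. This is only a sketch: `build_ffmpeg_cmd` is a hypothetical helper, and the `SAMPLE_RATE` value is an assumption made for the example:

```python
SAMPLE_RATE = 16000  # assumed value for this sketch


def build_ffmpeg_cmd(infile, time_off=None, duration=None):
    # Hypothetical helper: builds the argument list directly instead of
    # formatting a string and splitting it with shlex.
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "quiet", "-i", str(infile),
           "-ar", str(SAMPLE_RATE), "-ac", "1"]
    if time_off is not None:
        cmd += ["-ss", str(time_off)]   # start time offset
    if duration is not None:
        cmd += ["-t", str(duration)]    # duration of the excerpt
    cmd += ["-f", "s16le", "-"]
    return cmd


print(build_ffmpeg_cmd("interview.wav", time_off=30, duration=10))
```

Here `-ss` and `-t` follow `-i`, as in the proposal, so ffmpeg treats them as output options and decodes and discards audio up to the offset. Placing them before `-i` would make them input options, letting ffmpeg seek in the input instead.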

nshmyrev (Collaborator) commented Mar 6, 2024

Hi, thank you for the proposal! It looks nice, but what is the use case, please? I can't imagine why a user would need to start from a certain offset instead of just processing the whole file.

me-kell (Author) commented Mar 6, 2024

Some use cases:

  • You have a recording of an interview and a list of the start times of every question and answer. You may want to assign each transcribed part to its respective time point (question or answer).
  • You have a music radio program in which the presenter comments after every two or three songs. You may want to transcribe only the presenter, not the songs.
  • And last but not least: you have an audio file in which different speakers speak different languages. You may want to transcribe different parts of the audio with the model for the corresponding language.
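
For the interview case, the list of start times could be turned into (offset, duration) pairs to pass as time_off and duration. A small sketch, in which `segments` and the example times are hypothetical:

```python
def segments(start_times, total_length):
    # Pair each start time with the distance to the next start time;
    # the last segment runs to the end of the recording.
    starts = sorted(start_times)
    ends = starts[1:] + [total_length]
    return [(start, end - start) for start, end in zip(starts, ends)]


# Three parts starting at 0 s, 42 s and 95 s in a 180 s recording.
print(segments([0, 42, 95], 180))
# [(0, 42), (42, 53), (95, 85)]
```

Each pair would then correspond to one transcriber run with the matching time_off and duration.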
