Replies: 1 comment
-
That should be possible, but with a huge computational penalty: Whisper decodes one token after another, and suppression is applied while decoding each single token. My idea is to implement a function that keeps checking the transcript for the suppressed sequence and, if it is detected, rewinds the decoding to right before the sequence and resumes with a temporarily modified suppression list. Another, simpler solution is to filter these sequences out of the final token sequence, which IMO works better than the first one.
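A minimal sketch of the simpler option (filtering the final token sequence), assuming the decoded token ids are already available as a plain list of ints. The function name `remove_token_sequences` is hypothetical and not part of Whisper; turning the filtered ids back into text with the tokenizer is left out.

```python
from typing import List, Sequence


def remove_token_sequences(
    tokens: Sequence[int], banned: Sequence[Sequence[int]]
) -> List[int]:
    """Return `tokens` with every occurrence of any banned sequence removed."""
    # Try longer patterns first so a short pattern cannot pre-empt a longer one.
    patterns = sorted((tuple(b) for b in banned if b), key=len, reverse=True)
    out: List[int] = []
    i = 0
    while i < len(tokens):
        for p in patterns:
            if tuple(tokens[i:i + len(p)]) == p:
                i += len(p)  # skip the whole banned sequence
                break
        else:
            # No banned sequence starts here, so keep this token.
            out.append(tokens[i])
            i += 1
    return out


# Token ids taken from the question; the surrounding ids (50, 60) are made up.
filler = [11, 5616, 15869, 11]
print(remove_token_sequences([50, 11, 5616, 15869, 11, 60, 5616], [filler]))
```

Tokens that belong to a banned sequence but appear outside it (the trailing 5616 above) are kept, which is the behaviour the question asks for. Note that removing the whole sequence also drops both commas, so some punctuation may need to be restored afterwards.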
-
I see that it's possible to suppress single tokens using the 'suppress_tokens' option to whisper.transcribe(), as has been implemented in find_numeral_symbol_tokens().
Is it possible to define particular sequences of tokens to be suppressed? In other words, the tokens would be suppressed only when they appear in that exact order; a token from the sequence that appears outside of it would still be allowed.
Example: Let's suppress the speech mannerism ', you know,', a common filler phrase in spoken American English that generally provides no semantic content and is typically not transcribed in human transcription (unless specifically requested).
In the above example, the sequence of tokens to suppress would be
[',','you','know',','] => [11, 5616, 15869, 11]
Obviously I would not want to suppress those tokens individually, only when they appear together as the 'filler phrase'. This could also be used for suppressing vulgarities that are composed of words that are individually innocuous.
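To make the "only when in order" requirement concrete, here is a rough sketch of a decode-time check: given the tokens generated so far, it returns the token ids that would complete a banned sequence if emitted next. The helper `tokens_to_ban_next` is hypothetical and not an option of whisper.transcribe(); wiring it into the decoding loop is not shown and would presumably require modifying the decoder.

```python
from typing import Iterable, Sequence, Set


def tokens_to_ban_next(
    generated: Sequence[int], banned: Iterable[Sequence[int]]
) -> Set[int]:
    """Token ids that would complete a banned sequence if emitted next."""
    ban: Set[int] = set()
    n = len(generated)
    for seq in banned:
        seq = list(seq)
        if not seq:
            continue
        prefix, last = seq[:-1], seq[-1]
        # Ban the last token only if the output currently ends with the prefix.
        # An empty prefix (single-token sequence) always matches.
        if len(prefix) <= n and list(generated[n - len(prefix):]) == prefix:
            ban.add(last)
    return ban


filler = [11, 5616, 15869, 11]  # token ids from the example above
print(tokens_to_ban_next([7, 11, 5616, 15869], [filler]))  # comma would complete the phrase
print(tokens_to_ban_next([7, 5616, 15869], [filler]))      # no leading comma, nothing banned
```

This only blocks the final token of the sequence, so ', you know' could still appear with a different continuation; removing the whole phrase would need either rewinding the decoder or filtering the finished transcript.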