Fix the decoding issues #1768

bobqianic · 2024-01-14T15:15:26Z

revert change

Patch

bobqianic · 2024-02-06T15:07:14Z

I think that to completely avoid hallucination, the best approach is similar to using DTW to calculate token timestamps. By comparing these with cross-attention weights, we can definitely identify anomalies if there are any hallucinations.
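As a rough illustration of the DTW idea mentioned above, the sketch below computes the accumulated cost of the cheapest monotonic alignment between tokens and audio frames. The function name `dtw_cost`, the cost-matrix layout, and the suggestion that the cost could come from negated cross-attention weights are assumptions for illustration only; this is not code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// cost[i][j]: cost of aligning token i with audio frame j
// (assumed here to be something like a negated cross-attention weight).
// Returns the total cost of the cheapest monotonic alignment path from
// (0, 0) to (n_tokens - 1, n_frames - 1).
static float dtw_cost(const std::vector<std::vector<float>> & cost) {
    assert(!cost.empty() && !cost[0].empty());
    const size_t n = cost.size();
    const size_t m = cost[0].size();
    const float inf = std::numeric_limits<float>::infinity();

    // acc[i][j]: cheapest cost of aligning the first i tokens with the first j frames.
    std::vector<std::vector<float>> acc(n + 1, std::vector<float>(m + 1, inf));
    acc[0][0] = 0.0f;

    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            const float best = std::min({acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]});
            acc[i][j] = cost[i - 1][j - 1] + best;
        }
    }
    return acc[n][m];
}
```

An unusually high alignment cost for a segment could then be treated as a hint that the text does not match the audio, although where to put such a threshold is an open question here.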

felrock

looks good to me, but I think @ggerganov needs to approve it

ggerganov · 2024-02-08T15:24:15Z

Did you run some tests?

bobqianic · 2024-02-08T16:12:37Z

> Did you run some tests?

I've done some initial testing, and the results are promising. However, I need a bit more time to conduct a comprehensive analysis. You can already notice the difference by testing a few audio files. Currently, I'm downloading the Common Voice Corpus 15.0, which is over 100GB, so completing the testing will take a little while.

Someone sent me a test file via Discord. Running large-v2 on master generates a lot of duplicated content, while this PR does much better. The file is copyrighted, so I cannot make it public, but you can ask him for it privately.

#1724 (comment)

jettoblack · 2024-02-08T17:24:43Z

@bobqianic I'm very appreciative of this work and very excited to see this branch implemented, but I'm getting some bad results, with weird non-speech tokens at the beginning of many files. This problem does not happen on the master branch.

Example 1:

wav file: https://www.dropbox.com/scl/fi/bdz7lx4khunq3kiauyus8/shermer.wav?rlkey=hzy02rkewjb4pwoamp9whch4b&dl=0

Command:

./main -m ggml-largev2.bin -f shermer.wav

Output of master branch @ `434b8f3` (current):

[00:00:00.000 --> 00:00:09.000] [music]
[00:00:09.000 --> 00:00:12.000] [applause]
[00:00:12.000 --> 00:00:14.000] Hey, I am Michael Shermer, the director of the Skeptic Society,
...

Output of this PR @ `c0277e3`:

[00:00:00.000 --> 00:00:07.000] Transcriber's Name Reviewer's Name
[00:00:12.340 --> 00:00:14.300] I am Michael Shermer, the director of the Skeptic Society,
...

Example 2 with translate fr to en:

wav file: https://www.dropbox.com/scl/fi/1go0yxkr10vwhfyxs76vz/french.wav?rlkey=312gc5qmw3r31ovh003410hyb&dl=0

Command:

./main -m ggml-largev2.bin -f french.wav -l fr -tr

Output of master branch @ `434b8f3` (current):

[00:00:00.000 --> 00:00:04.000] (Music)
[00:00:04.000 --> 00:00:07.000] (Applause)
[00:00:07.000 --> 00:00:20.000] I am a champion of France.
...

Output of this PR @ `c0277e3`:

[00:00:00.000 --> 00:00:17.000] Translation & subtitling by Quentin Dewaghe Traduction & sous-titrage par Quentin Dewaghe q.dewaghe.com
[00:00:17.000 --> 00:00:20.000] I'm a champion of France.
...

Any idea why these non-speech tokens like "Transcriber's Name Reviewer's Name" are being output as speech at the beginning? Thanks again.

bobqianic · 2024-02-08T17:48:38Z

> Any idea why these non-speech tokens like "Transcriber's Name Reviewer's Name" are being output as speech at the beginning? Thanks again.

Thank you for letting me know. It seems the primary issue stems from my having suppressed non-speech tokens, which has resulted in symbols like ( and [ having a zero probability of appearing. While this approach enhances the overall quality, it clearly didn't account for situations like yours, which I hadn't anticipated. As mentioned, I'll conduct further tests and explore ways to address this issue.

Heuristic

bobqianic · 2024-02-09T18:15:23Z

@jettoblack I've added a heuristic for detecting repetitive hallucinations, which you can disable via parameters if you prefer. Additionally, I've removed the tokens ( ) [ and ] from the list of tokens to be suppressed, so they will remain unaffected even when suppression mode is enabled.

Output of this PR @ `476dff4`:

[00:00:00.000 --> 00:00:17.000] [Music]
[00:00:17.000 --> 00:00:20.000] I am a champion of France.
...
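The suppression described above amounts to masking logits before sampling. A minimal sketch, assuming hypothetical token ids and a hypothetical helper name `suppress_tokens` (this is not the PR's code and does not use whisper.cpp's real token tables):

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Set the logits of suppressed tokens to -infinity so they can never be
// sampled, while leaving tokens on the keep-list (for example the ids for
// "(", ")", "[" and "]") untouched. All token ids here are hypothetical.
static void suppress_tokens(std::vector<float> & logits,
                            const std::set<int> & suppressed,
                            const std::set<int> & keep) {
    for (int id : suppressed) {
        assert(id >= 0 && id < (int) logits.size());
        if (keep.count(id) == 0) {
            logits[id] = -INFINITY;
        }
    }
}
```

Keeping the bracket tokens out of the mask is what lets annotations such as [Music] appear again even with suppression enabled.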

jettoblack · 2024-02-10T03:42:23Z

@bobqianic The repetition heuristic seems to be working well so far. I'm seeing fewer hallucinations on silent intervals. I looked at the code and this is unrelated to the non-speech token changes, right?

I'm not so sure about the non-speech token changes. With your latest commit I see fewer cases of the problem I mentioned above, but it's still happening a lot. One example I got just now in the sg1.wav file I sent you previously on Discord:

A hallucination like ♪♪ or repeated text is far less objectionable than someone else's copyright notice or translator's notes which is what I'm getting a lot of.

This change also removes many useful tokens from the output, like quotation marks and music notes. Using the -nsnst option restores these tokens but that causes this issue to be much worse, and I've caught a lot more cases of it occurring in many files, including in the middle of files not just the beginning. If these were the only two options I'd leave suppression enabled, but master branch includes these useful tokens without this hallucination problem.

It might be helpful to compare the output of a branch with the other fixes of this PR excluding the non-speech token changes, or at least have a way to turn those completely off and go back to master branch behavior.

bobqianic · 2024-02-10T13:29:45Z

> I looked at the code and this is unrelated to the non-speech token changes, right?

Yes. In situations where the model exhibits hallucinations with high confidence (avg_log_probs), this non-speech token approach will not be effective. The heuristic repetition check that I've implemented serves as a workaround for the compression ratio check. Implementing compression in C++ can be challenging without using third-party libraries. In the official implementation by OpenAI, both the compression ratio and non-speech tokens anti-hallucination mechanisms are utilized.
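To make the idea of a repetition check concrete, here is one possible dependency-free sketch. It flags a token sequence whose tail is the same short block repeated several times. The function name `has_repeated_tail` and its parameters are hypothetical, and the actual heuristic in this PR may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the token sequence ends with the same block of
// `period` tokens repeated at least `min_repeats` times in a row,
// for any period in [1, max_period].
static bool has_repeated_tail(const std::vector<int> & tokens,
                              size_t max_period,
                              size_t min_repeats) {
    assert(max_period >= 1 && min_repeats >= 2);
    const size_t n = tokens.size();
    for (size_t period = 1; period <= max_period; ++period) {
        const size_t span = period * min_repeats;
        if (span > n) {
            break; // longer periods need even more tokens
        }
        bool repeated = true;
        // Every token in the tail must equal the token one period earlier.
        for (size_t i = n - span + period; i < n; ++i) {
            if (tokens[i] != tokens[i - period]) {
                repeated = false;
                break;
            }
        }
        if (repeated) {
            return true;
        }
    }
    return false;
}
```

Unlike a compression ratio check, this only catches exact repetition at the end of a segment, but it needs no compression library.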

[00:57:11.700 --> 00:57:14.700] (c) 2014 University of Georgia College of Agricultural and Environmental Sciences UGA Extension Office of Communications and Creative Services

Which branch are you using? I can't find the hallucinations you mentioned.

large-v2

jettoblack · 2024-02-12T19:18:12Z

> Which branch are you using? I can't find the hallucinations you mentioned.

I was using this PR @ 476dff4, unless I did something wrong, but this was on a Mac using the Metal GPU backend, so that could make a difference. I'll retest on CPU and CUDA shortly and let you know.

ukolovda · 2024-02-16T20:48:48Z

Hi!

@bobqianic The new version is very robust!

On my test files, the main branch emits 10 hallucinations across 26 WAV files (model ggml-large-v2, Russian language).
With this PR there are only 2 hallucinations. That is a very good result!

However, examples/server doesn't work at all, in both the CPU and CUDA versions. It returns empty text without any errors.
I tried patching it to add the parameters (heuristic and others), but it didn't help. With --print-progress it prints the progress, but not the resulting text.

It also gives an error on one specific file:
500 Internal Server Error map::at

What do you think we can do to fix it?

Run server command:

/usr/src/whisper.cpp-bobqianic/server -m ../../models/ggml-large-v2.bin -l ru --print-progress --print-realtime -nt -nf

whisper_init_from_file_with_params_no_state: loading model from '../../models/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load:    CUDA0 total size =  3094.49 MB (3 buffers)
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   33.91 MB
whisper_init_state: compute buffer (encode) =  233.50 MB
whisper_init_state: compute buffer (cross)  =   10.15 MB
whisper_init_state: compute buffer (decode) =  108.99 MB

whisper server listening at http://127.0.0.1:8080

Received request: 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-14.wav
Successfully loaded 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-14.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

operator(): processing '0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-14.wav' (168960 samples, 10.6 sec), 4 threads, 1 processors, lang = ru, task = transcribe, timestamps = 0 ...

Running whisper.cpp inference on 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-14.wav
Received request: 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-15.wav
Successfully loaded 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-15.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

operator(): processing '0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-15.wav' (235200 samples, 14.7 sec), 4 threads, 1 processors, lang = ru, task = transcribe, timestamps = 0 ...

Running whisper.cpp inference on 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-15.wav

whisper_print_progress_callback: progress = 204%
Received request: 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-16.wav
Successfully loaded 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-16.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

operator(): processing '0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-16.wav' (512000 samples, 32.0 sec), 4 threads, 1 processors, lang = ru, task = transcribe, timestamps = 0 ...

Running whisper.cpp inference on 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-16.wav

whisper_print_progress_callback: progress =  93%

whisper_print_progress_callback: progress = 187%
Received request: 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-18.wav
Successfully loaded 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-18.wav

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

operator(): processing '0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-18.wav' (115520 samples, 7.2 sec), 4 threads, 1 processors, lang = ru, task = transcribe, timestamps = 0 ...

Running whisper.cpp inference on 0f3657ce-6352-4cbb-a88f-b39dc6a37a34-1-18.wav

whisper_print_progress_callback: progress = 416%
...

Send file command:

curl localhost:8080/inference -H "Content-Type: multipart/form-data" -F file="@${filename}"

git diff result:

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index cf0157d..5030e87 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -64,6 +64,7 @@ struct whisper_params {
     float word_thold      =  0.01f;
     float entropy_thold   =  2.40f;
     float logprob_thold   = -1.00f;
+    float no_speech_thold =  0.60f;
     float temperature     =  0.00f;
     float temperature_inc =  0.20f;
 
@@ -78,6 +79,8 @@ struct whisper_params {
     bool print_realtime  = false;
     bool print_progress  = false;
     bool no_timestamps   = false;
+    bool suppress_nst    = true;  // suppress non speech tokens
+    bool heuristic       = true;
     bool use_gpu         = true;
 
     std::string language        = "en";
@@ -183,7 +186,10 @@ bool whisper_params_parse(int argc, char ** argv, whisper_params & params, serve
         else if (arg == "-wt"   || arg == "--word-thold")      { params.word_thold      = std::stof(argv[++i]); }
         else if (arg == "-et"   || arg == "--entropy-thold")   { params.entropy_thold   = std::stof(argv[++i]); }
         else if (arg == "-lpt"  || arg == "--logprob-thold")   { params.logprob_thold   = std::stof(argv[++i]); }
+        else if (arg == "-nst"  || arg == "--nospeech-thold")  { params.no_speech_thold = std::stof(argv[++i]); }
         // else if (arg == "-su"   || arg == "--speed-up")        { params.speed_up        = true; }
+        else if (arg == "-nsnst"|| arg == "--no-suppress-nst") { params.suppress_nst    = false; }
+        else if (arg == "-nh"   || arg == "--no-heuristic")    { params.heuristic       = false; }
         else if (arg == "-tr"   || arg == "--translate")       { params.translate       = true; }
         else if (arg == "-di"   || arg == "--diarize")         { params.diarize         = true; }
         else if (arg == "-tdrz" || arg == "--tinydiarize")     { params.tinydiarize     = true; }
@@ -726,6 +732,7 @@ int main(int argc, char ** argv) {
             wparams.max_len          = params.max_len == 0 ? 60 : params.max_len;
 
             wparams.speed_up         = params.speed_up;
+            wparams.heuristic        = params.heuristic;
 
             wparams.tdrz_enable      = params.tinydiarize; // [TDRZ]
 
@@ -738,8 +745,11 @@ int main(int argc, char ** argv) {
             wparams.temperature_inc  = params.temperature_inc;
             wparams.entropy_thold    = params.entropy_thold;
             wparams.logprob_thold    = params.logprob_thold;
+            wparams.no_speech_thold  = params.no_speech_thold;
 
             wparams.no_timestamps    = params.no_timestamps;
+            wparams.suppress_non_speech_tokens = params.suppress_nst;
+
             wparams.token_timestamps = !params.no_timestamps && params.response_format == vjson_format;
 
             whisper_print_user_data user_data = { &params, &pcmf32s, 0 };

Thank you!

felrock · 2024-02-17T16:04:42Z

Hello @ukolovda, I took a look at this yesterday evening. What's missing in server.cpp is what you mentioned:

heuristic
suppress_nst
no_speech_thold

I got output in the terminal by circumventing the print_realtime flag (instead of using a callback segment). So the model does in fact generate the output string, but for some unknown reason whisper_full_n_segments(ctx) returns 0. I'll try to look into this a bit more tomorrow.

ukolovda · 2024-02-19T12:44:00Z

> I got output in the terminal by circumventing the print_realtime flag (instead of using a callback segment). So the model does in fact generate the output string, but for some unknown reason whisper_full_n_segments(ctx) returns 0.

Hello, @felrock !

Thank you!

ukolovda · 2024-02-20T10:10:11Z

I've added an issue about a zero-filled WAV:
#1881

ukolovda · 2024-02-20T13:22:09Z

The file from #1881 (zero-filled WAV) produces a hallucination in this version too.

$ ../whisper.cpp-bobqianic/main -m ./models/ggml-large-v3.bin -l ru --threads 8 -mc 0 samples/zeroes.wav
whisper_init_from_file_with_params_no_state: loading model from './models/ggml-large-v3.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070, compute capability 8.9, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load:    CUDA0 total size =  3094,86 MB (3 buffers)
whisper_model_load: model size    = 3094,36 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  220,20 MB
whisper_init_state: kv cross size =  245,76 MB
whisper_init_state: compute buffer (conv)   =   35,50 MB
whisper_init_state: compute buffer (encode) =  233,50 MB
whisper_init_state: compute buffer (cross)  =   10,15 MB
whisper_init_state: compute buffer (decode) =  108,99 MB

system_info: n_threads = 8 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

run: processing 'samples/zeroes.wav' (19200 samples, 1,2 sec), 8 threads, 1 processors, 5 beams + best of 5, lang = ru, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:29.980]   Продолжение следует...


whisper_print_timings:     load time =   781,61 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     4,81 ms
whisper_print_timings:   sample time =    28,10 ms /    79 runs (    0,36 ms per run)
whisper_print_timings:   encode time =   162,31 ms /     1 runs (  162,31 ms per run)
whisper_print_timings:   decode time =     0,00 ms /     1 runs (    0,00 ms per run)
whisper_print_timings:   batchd time =   482,89 ms /    77 runs (    6,27 ms per run)
whisper_print_timings:   prompt time =     0,00 ms /     1 runs (    0,00 ms per run)
whisper_print_timings:    total time =  1502,74 ms
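The input in this run contains no signal at all, so one conceivable guard is to measure the level of the PCM buffer before decoding. The sketch below is only an illustration of that idea: the helper names `rms_level` and `is_silent` and the default threshold are hypothetical and are not part of this PR or of whisper.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root-mean-square level of a mono float PCM buffer (samples in [-1, 1]).
static float rms_level(const std::vector<float> & pcm) {
    if (pcm.empty()) {
        return 0.0f;
    }
    double sum = 0.0;
    for (float s : pcm) {
        sum += (double) s * s;
    }
    return (float) std::sqrt(sum / pcm.size());
}

// Hypothetical pre-check: treat the buffer as silent when its RMS level is
// below `threshold`, so the caller can skip decoding instead of letting the
// model invent text.
static bool is_silent(const std::vector<float> & pcm, float threshold = 1e-4f) {
    assert(threshold >= 0.0f);
    return rms_level(pcm) < threshold;
}
```

A check like this only covers digital silence or very quiet input; background noise without speech would still need a VAD or the no-speech probability.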

linmi · 2024-02-21T08:43:26Z

-output-json-full has problems with the output format.

Language: Chinese

thewh1teagle · 2024-03-31T22:38:34Z

What's the status of this PR? Is it safe to use?
I'm experiencing decoding issues:
thewh1teagle/vibe#34

jwijffels · 2024-04-05T07:48:45Z

I'm thinking about including this pull request in the R wrapper at audio.whisper. There, the current approach to handling some of the hallucinations is to use the R packages audio.vadwebrtc or audio.vadsilero to detect silences or general non-voiced signals, and then either:

instead of looping over different files in the main loop, loop over the detected non-silence sections in the audio;
or create a new audio file with only the voiced audio and recompute the timestamps later on by adding back what was left out.

I haven't looked at this pull request in great detail (I only skimmed the logic that changed in main.cpp and whisper.cpp). Would it make sense to incorporate it in audio.whisper already, or are a lot of changes still to be expected? Or is this pull request going to be split into a BPE change (#1854) and a change to how non-speech is handled?
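The second option above, recomputing timestamps after transcribing voiced-only audio, can be sketched as a simple offset mapping. The struct `voiced_segment` and the function `to_original_ms` are hypothetical names used for illustration; they are not taken from audio.whisper or whisper.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A voiced region of the original audio, in milliseconds: [start_ms, end_ms).
struct voiced_segment {
    int64_t start_ms;
    int64_t end_ms;
};

// Map a timestamp measured in the concatenated voiced-only audio back to the
// original timeline by adding back the silence that was cut out before it.
// Segments must be sorted and non-overlapping.
static int64_t to_original_ms(const std::vector<voiced_segment> & segments, int64_t t_ms) {
    assert(!segments.empty() && t_ms >= 0);
    int64_t consumed = 0; // voiced audio covered by earlier segments
    for (const auto & seg : segments) {
        const int64_t len = seg.end_ms - seg.start_ms;
        if (t_ms < consumed + len) {
            return seg.start_ms + (t_ms - consumed);
        }
        consumed += len;
    }
    return segments.back().end_ms; // past the end: clamp to the end of the last voiced region
}
```

For example, with voiced regions at 1-2 s and 5-6 s, a timestamp of 1.5 s in the shortened audio falls 0.5 s into the second region and maps back to 5.5 s.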

ronyfadel · 2024-04-30T17:12:55Z

@bobqianic are you pursuing this at the moment?

bobqianic · 2024-04-30T17:45:39Z

> @bobqianic are you pursuing this at the moment?

No, at least not in May. I'm really tied up with a lot of things this month.

bygreencn · 2024-05-14T12:35:18Z

> I'm thinking about including this pull request in the R wrapper at audio.whisper. There, the current approach to handling some of the hallucinations is to use the R packages audio.vadwebrtc or audio.vadsilero to detect silences or general non-voiced signals, and then either:
>
> instead of looping over different files in the main loop, loop over the detected non-silence sections in the audio;
> or create a new audio file with only the voiced audio and recompute the timestamps later on by adding back what was left out.
>
> I haven't looked at this pull request in great detail (I only skimmed the logic that changed in main.cpp and whisper.cpp). Would it make sense to incorporate it in audio.whisper already, or are a lot of changes still to be expected? Or is this pull request going to be split into a BPE change (#1854) and a change to how non-speech is handled?

The best way to include Silero voice activity detection in whisper.cpp is to add the third-party onnxruntime 1.12.1 DLL and then call the Silero ONNX model. My branch has added it. Even with VAD, hallucinations on silent intervals still happen.

IntendedConsequence · 2024-05-21T07:25:27Z

> The best way to include Silero voice activity detection in whisper.cpp is to add the third-party onnxruntime 1.12.1 DLL and then call the Silero ONNX model. My branch has added it. Even with VAD, hallucinations on silent intervals still happen.

I recommend considering a previous Silero VAD version, namely v3.1. The current version v4 (at the moment of writing) often hallucinates speech on lengthy chunks of silent or near-silent audio segments.
snakers4/silero-vad#369
snakers4/silero-vad#396

But you have to add a heavyweight dependency like onnxruntime just to run a 750KB model. The smallest size I could possibly reduce onnxruntime.dll to was about 2.2MB, which is still 3x the size of silero weights, and requires a lengthy custom build of onnxruntime from source with reduced operator set configs and other size reduction options. And prebuilt redistributables are easily 5-9 MB or more.

I have a working Silero v3.1 implementation in pure C, but as much as I would like to suggest it as an option, the code is quite bad, I wrote it as a personal project for learning low level neural nets.

Add files via upload

71a65e7

This was linked to issues Jan 14, 2024

Invalid encoding #1761

Open

Unicode Error for Hindi transcription #1700

Open

bobqianic added the research🔬 label Jan 14, 2024

bobqianic mentioned this pull request Jan 14, 2024

examples: Fix the encoding issues on Windows #1313

Closed

4 tasks

Add files via upload

8301f88

bobqianic added 16 commits January 15, 2024 19:38

Add files via upload

1226204

revert change

c53c33b

Delete server directory

dfef69e

Merge pull request #1 from bobqianic/bobqianic-patch-1

7499e3c

revert change

Add files via upload

6648641

Add files via upload

9d0ebd1

Add files via upload

c8528a7

Merge pull request #2 from bobqianic/patch

7047d32

Patch

Add files via upload

96a9349

Fix ruby and go bindings

4b3a211

Add files via upload

3818acb

Add files via upload

b5c4d5c

Revert some changes

80589d2

Revert some changes

271c321

Merge branch 'ggerganov:master' into fix-decoding

5ea1d91

Remove hallucination by using token_nosp

41df3f0

bobqianic mentioned this pull request Jan 16, 2024

Hallucination on silence #1724

Open

edit some comments

2676819

bobqianic added the decoding Decoding related issues label Jan 17, 2024

revert logsumexp implementation

c0277e3

felrock reviewed Feb 6, 2024

bobqianic added 5 commits February 9, 2024 17:50

Add heuristic mode

b6d89b0

Bug Fix

3512527

Add heuristic mode

e091189

Bug Fix 2

de4f87f

Merge pull request #8 from bobqianic/heuristic

476dff4

Heuristic

jwijffels mentioned this pull request Mar 25, 2024

Notes on repetitions bnosac/audio.whisper#38

Open

tamo mentioned this pull request May 28, 2024

JSON Output Contains Garbled Characters for Chinese Audio Transcription #2180

Open
