
excessive memory usage and crashes with long duration files #70

Open
realies opened this issue Jun 3, 2022 · 5 comments


realies commented Jun 3, 2022

Trying to process files that are around 1 hour long causes inaSpeechSegmenter to use about 4.5GB of system memory. If that much can't be allocated, it crashes with exit code 137 and this output:

2022-06-03 22:16:07.947395: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2022-06-03 22:16:07.947445: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (1fcd6b8c8160): /proc/driver/nvidia/version does not exist
2022-06-03 22:16:07.947619: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

This happens when inaSpeechSegmenter is used with:

    docker run --rm \
    -v /samples:/inaSpeechSegmenter/samples \
    inafoss/inaspeechsegmenter \
    ina_speech_segmenter.py -g false -i samples/1h_file.wav -o samples

Limiting the Docker container to a fixed amount of memory (e.g. --memory=1g) makes it crash with the same error message as above. Can the memory footprint be controlled in any way?

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): latest official inafoss/inaspeechsegmenter Docker image
  • TensorFlow version: x
  • Python version: x
  • Running on GPU or CPU: CPU
  • output of command nvidia-smi (if GPU is used)
  • CUDA/cuDNN version (if GPU is used): x
  • Using Docker: yes

Expected Behavior

Use as much memory as is available, and slow down the process if everything can't be allocated.

Current Behavior

Crash


DavidDoukhan commented Jun 5, 2022

Dear @realies,

Thank you for your message.

First of all, according to the error message you provided, it seems your problem is not only related to the amount of RAM, but also to the inability of the Docker image to initialize CUDA. This may happen when the NVIDIA drivers are not compatible with the TensorFlow version used.

Could you provide the output of the nvidia-smi command, both within inaSpeechSegmenter's container and outside the container?

Could you also provide the whole output of your docker command, both for a 1-hour-long file and for a shorter file?

As you noticed, the current design of inaSpeechSegmenter requires a large amount of RAM, which is dependent on the duration of the file being processed.

Up to now, these requirements haven't been a problem for my use cases, as long as the hardware has enough RAM: from my point of view, 4.5 GB is not a large amount of RAM on laptops or GPU servers.

I guess a RAM-friendly implementation of inaSpeechSegmenter could be done using Keras data generator structures. This would require a substantial refactoring of the whole code base, an effort I estimate at between 1 day (optimistic, senior developer) and 1 week (pessimistic, junior developer).
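
For illustration, a minimal sketch of what such a generator could look like; the feature-extraction helper, the feature shapes, and the window count below are placeholders, not inaSpeechSegmenter's actual code:

import numpy as np
import tensorflow as tf

def extract_features_for_window(media_path, window_index):
    # placeholder: a real implementation would decode only the audio needed
    # for one analysis window and compute its mel features
    return np.zeros((68, 21), dtype=np.float32)  # shape is illustrative only

class WindowedFeatureSequence(tf.keras.utils.Sequence):
    """Yields feature batches lazily so the whole file never sits in RAM."""
    def __init__(self, media_path, n_windows, batch_size=32):
        self.media_path = media_path
        self.n_windows = n_windows      # total number of analysis windows
        self.batch_size = batch_size

    def __len__(self):
        # number of batches in one prediction pass
        return int(np.ceil(self.n_windows / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_windows)
        # only this batch's features are held in memory at once
        return np.stack([extract_features_for_window(self.media_path, i)
                         for i in range(start, stop)])

# usage: model.predict(WindowedFeatureSequence("samples/1h_file.wav", n_windows))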

While this would be useful, I currently don't have the time to make this improvement.

If this feature is necessary for your use case, and you're willing to contribute to the improvement of inaSpeechSegmenter's code base, please let me know and we can plan a meeting to discuss these issues.

Kind regards,


realies commented Jun 6, 2022

Thanks for the prompt reply, @DavidDoukhan!

I think the error message is unrelated to the issue and is expected, as this Docker container runs without a GPU, and the machine it runs on does not have one.

> Could you also provide the whole output of your docker command, both for a 1-hour-long file and for a shorter file?

Here's the output of segmenting a 1-minute file vs a 1-hour file when the container is limited to 1GB of system memory:

$ docker run --rm --memory=1g \
    -v $(pwd):/inaSpeechSegmenter/samples \
    inafoss/inaspeechsegmenter \
    ina_speech_segmenter.py -g false -i samples/1m_file.wav -o samples
2022-06-06 14:28:23.065325: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2022-06-06 14:28:23.065403: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (eda64b61feb1): /proc/driver/nvidia/version does not exist
2022-06-06 14:28:23.065786: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
batch_processing 1 files
1/1 [('samples/1m_file.csv', 0, 'ok')]
$ echo $?
0
$ docker run --rm --memory=1g \
    -v $(pwd):/inaSpeechSegmenter/samples \
    inafoss/inaspeechsegmenter \
    ina_speech_segmenter.py -g false -i samples/1h_file.wav -o samples
2022-06-06 14:32:10.537857: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2022-06-06 14:32:10.537929: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (2b6e398934bd): /proc/driver/nvidia/version does not exist
2022-06-06 14:32:10.538459: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
$ echo $?
137

> Up to now, these requirements haven't been a problem for my use cases, as long as the hardware has enough RAM: from my point of view, 4.5 GB is not a large amount of RAM on laptops or GPU servers.

I'm trying to use the library on a non-GPU system, and such systems usually come with less RAM than GPU systems. I think lowering the memory footprint of inaSpeechSegmenter would be a good optimisation.

> I guess a RAM-friendly implementation of inaSpeechSegmenter could be done using Keras data generator structures. This would require a substantial refactoring of the whole code base, an effort I estimate at between 1 day (optimistic, senior developer) and 1 week (pessimistic, junior developer).
>
> While this would be useful, I currently don't have the time to make this improvement.
>
> If this feature is necessary for your use case, and you're willing to contribute to the improvement of inaSpeechSegmenter's code base, please let me know and we can plan a meeting to discuss these issues.

I appreciate the suggestion to use Keras data generator structures. I have very little experience writing neural networks outside of MATLAB and was wondering whether you might be interested in doing this refactoring for a fee, or in helping guide me through the high-level changes needed to make this work (might require extra noob patience).


realies commented Jun 6, 2022

Update: using 1h_file.wav as the input file made inaSpeechSegmenter peak at around 7GB at the beginning before stabilising at around 4.5GB for most of its run.

@lovegaoshi

One quick and dirty solution is to probe the length of your media file with ffprobe (which is in the ffmpeg package), then process it in shorter chunks by running the segmenter piece by piece with the start_sec and stop_sec arguments of Segmenter.call, and finally join the result arrays together. Make sure to import the gc module and do a garbage collection after each segmenter run.

From my experience, 10-minute chunks are perfectly fine for a 1GB RAM, 5GB swap Docker container; the swap size is very much exaggerated because I really don't want to get a signal 9. Segmentation quality is presumably somewhat worse, but that's not critical to my use scenario. See my Segmenter.call wrapper at
https://github.com/lovegaoshi/ipynb/blob/inaseg-cloud/inaseg.py#L48

The same idea applies to limited-VRAM scenarios, although I have to warn about processing speed on limited hardware: an Oracle 1-vCPU, 1GB RAM instance runs at 230ms/step, while a GTX 1070 runs at 2ms/step by comparison.
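
For reference, a rough sketch of the chunked approach described above; it assumes ffprobe is on PATH and uses the start_sec/stop_sec arguments mentioned earlier, while the helper names, 600-second chunk size, and result-joining logic are only illustrative:

import gc
import subprocess
from inaSpeechSegmenter import Segmenter

def media_duration(path):
    # ffprobe prints the container duration in seconds
    out = subprocess.check_output(
        ['ffprobe', '-v', 'error', '-show_entries', 'format=duration',
         '-of', 'default=noprint_wrappers=1:nokey=1', path], text=True)
    return float(out.strip())

def segment_in_chunks(path, chunk_sec=600):
    seg = Segmenter()
    duration = media_duration(path)
    result = []
    start = 0.0
    while start < duration:
        stop = min(start + chunk_sec, duration)
        # whether returned boundaries are absolute or chunk-relative may depend
        # on the inaSpeechSegmenter version; offset them by `start` if needed
        result.extend(seg(path, start_sec=start, stop_sec=stop))
        gc.collect()  # release per-chunk buffers before the next pass
        start = stop
    return result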


realies commented Jan 9, 2023

Using a rolling window for analysis instead of buffering the whole file into RAM/VRAM would be much better than workarounds that make the library less accurate and more complicated to use.
