ResNet Out Of Memory issue #1

maxkazmsft · 2019-04-24T15:33:05Z

When I run this code:

resnetscore=resnet.evaluate(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],patch_size,3) for x in test_samples]), keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9))
print(acc_assess(resnetscore,lossfkt,metrica))

I get an out-of-memory error on a machine with 112 GB of RAM:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-57-79d8c18bca15> in <module>
----> 1 resnetscore=resnet.evaluate(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],patch_size,3) for x in test_samples]), keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9))
      2 print(acc_assess(resnetscore,lossfkt,metrica))

MemoryError:

Has anyone else had this problem?
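
One likely contributor is that the list comprehension builds every test patch in memory before `evaluate` starts. A minimal sketch of yielding fixed-size batches instead, in plain Python. `iter_batches` and `fake_patch` are hypothetical names; `fake_patch` only stands in for the `patch_extractor2D(...)` call above.

```python
def iter_batches(sample_ids, make_patch, batch_size=32):
    """Yield lists of at most batch_size patches, built lazily."""
    batch = []
    for x in sample_ids:
        batch.append(make_patch(x))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


# Stand-in extractor for illustration only (assumption, not the repo's function).
def fake_patch(x):
    return [[x] * 4 for _ in range(4)]


sizes = [len(b) for b in iter_batches(range(10), fake_patch, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

With Keras, each batch could be stacked into an array together with its labels and scored with `test_on_batch`, then averaged. That wiring is an assumption and has not been tested against this repo.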

JesperDramsch · 2020-08-21T09:22:26Z

What's the VRAM of your graphics card?

artizzq · 2021-01-18T15:13:18Z

Hello. This issue still persists. I have 6 GB VRAM and 16 GB RAM with 40 GB paging file. Windows 10 and Python are 64-bit.

artizzq · 2021-01-19T05:53:40Z

I think I solved the issue.
First, I increased my paging file size in Windows settings. Then I changed 'patch_size' in the code to
patch_size = 16
and changed this line
vanillascore=model_vanilla.evaluate(np.expand_dims(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],64) for x in test_samples]), axis=3),keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9), verbose=0)
to
vanillascore=model_vanilla.evaluate(np.expand_dims(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],16) for x in test_samples]), axis=3),keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9), verbose=0)
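
For a sense of why the smaller patch helps: the array passed to `evaluate` holds one patch per test sample, so its size grows with the square of the patch size. A rough estimate, assuming a dense float64 array and a hypothetical sample count (neither is stated in this thread). `patch_array_bytes` is an illustrative helper, not part of the repo.

```python
def patch_array_bytes(n_samples, patch_size, channels=1, itemsize=8):
    """Bytes for a dense (n_samples, patch_size, patch_size, channels) array.

    itemsize=8 assumes float64; use 4 for float32.
    """
    return n_samples * patch_size * patch_size * channels * itemsize


n_samples = 100_000  # hypothetical; the real number of test samples is not given
for size in (64, 16):
    gib = patch_array_bytes(n_samples, size) / 2**30
    print(f"patch_size={size}: {gib:.2f} GiB")
```

Going from 64 to 16 cuts the array to 1/16 of its size, whatever the sample count is.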

The network started to train and everything went well until I hit another issue. After executing the lines below, execution seems to get stuck in an infinite loop.

for space in tqdm(range(y_max),desc='Space'):
    for depth in tqdm(range(t_max),leave=False, desc='Time'):
        predx[depth,space] = np.argmax(model_vanilla.predict(np.expand_dims(np.expand_dims(patch_extractor2D(xline_data,space,depth,patch_size), axis=0), axis=3)))

This is the output I get; the process keeps running and is not interrupted.

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
2021-01-19 10:47:15.171533: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-19 10:47:15.172115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-01-19 10:47:15.172384: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-19 10:47:15.172559: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:15.172697: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:15.172836: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-19 10:47:15.172969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-19 10:47:15.173106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-19 10:47:15.173243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-19 10:47:15.173383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:15.173550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-19 10:47:15.174986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-01-19 10:47:15.175233: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-19 10:47:15.175359: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:15.175483: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:15.175634: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-19 10:47:15.176146: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-19 10:47:15.176488: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-19 10:47:15.176805: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-19 10:47:15.177081: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:15.177419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-19 10:47:15.177576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-19 10:47:15.177998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-01-19 10:47:15.178187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-01-19 10:47:15.178512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4720 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-19 10:47:15.178848: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Space:   0%|          | 0/651 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
2021-01-19 10:47:15.544862: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-01-19 10:47:15.754134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:16.932268: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:16.945345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:20.012456: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-01-19 10:47:20.050926: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]

As you can see, the progress line never advances; it is just printed over and over again.

Maybe that's the way it should be? I don't know.
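
Going by the bar totals in the log, the loop calls `predict` once per pixel, which is 651 × 462 = 300,762 separate calls, so it may simply be very slow rather than stuck. A sketch of sending one whole column of patches per call instead. `predict_column`, `fake_extract`, and `fake_predict_batch` are hypothetical names; `fake_predict_batch` stands in for `model_vanilla.predict` on a stacked array.

```python
def predict_column(extract, predict_batch, space, t_max):
    """Return one predicted class index per depth for a single column.

    extract(space, depth) -> patch
    predict_batch(list_of_patches) -> one list of class scores per patch
    """
    patches = [extract(space, depth) for depth in range(t_max)]
    scores = predict_batch(patches)  # one call instead of t_max calls
    # argmax over the class scores of each patch
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


# Stand-ins for illustration only (assumptions, not the repo's functions).
call_log = []


def fake_extract(space, depth):
    return (space, depth)


def fake_predict_batch(patches):
    call_log.append(len(patches))
    # 3 fake class scores per patch; class (depth % 3) gets the top score
    return [[1.0 if c == depth % 3 else 0.0 for c in range(3)]
            for _, depth in patches]


column = predict_column(fake_extract, fake_predict_batch, space=0, t_max=6)
print(column, len(call_log))  # [0, 1, 2, 0, 1, 2] 1
```

Applied to the loop above, this would mean 651 calls of 462 patches each rather than 300,762 single-patch calls.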

artizzq · 2021-01-20T04:41:57Z

I've just opened a new issue for this, so further discussion will probably move there.
