ResNet Out Of Memory issue #1

Open
maxkazmsft opened this issue Apr 24, 2019 · 4 comments


@maxkazmsft

When running this code:

resnetscore=resnet.evaluate(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],patch_size,3) for x in test_samples]), keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9))
print(acc_assess(resnetscore,lossfkt,metrica))

I get an out-of-memory (OOM) error on a machine with 112 GB of RAM:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-57-79d8c18bca15> in <module>
----> 1 resnetscore=resnet.evaluate(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],patch_size,3) for x in test_samples]), keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9))
      2 print(acc_assess(resnetscore,lossfkt,metrica))

MemoryError: 

Has anyone else had this problem?
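
One possible workaround, sketched under assumptions: the traceback shows a Python MemoryError, which points at host RAM, most likely while `np.array([...])` builds every test patch at once. Scoring in chunks keeps only one batch of patches in memory at a time. The names `iter_batches`, `evaluate_in_batches`, `extract_patch` and `score_batch` are hypothetical and not part of the notebook. With Keras, `score_batch` could be `resnet.test_on_batch` and `extract_patch` a small wrapper around `patch_extractor2D`; the stand-ins below only demonstrate the chunking.

```python
import numpy as np

def iter_batches(indices, extract_patch, batch_size=256):
    """Yield (chunk_of_indices, patches) so only one batch is held in memory."""
    indices = np.asarray(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield chunk, np.stack([extract_patch(i) for i in chunk])

def evaluate_in_batches(score_batch, indices, extract_patch, targets, batch_size=256):
    """Return the sample-weighted mean of the per-batch scores."""
    total, count = 0.0, 0
    for chunk, patches in iter_batches(indices, extract_patch, batch_size):
        # score_batch returns one scalar or a list of scalars (loss, metrics, ...)
        scores = np.atleast_1d(score_batch(patches, targets[chunk])).astype(float)
        total = total + scores * len(chunk)
        count += len(chunk)
    return total / count

# Stand-ins for illustration only (assumptions, not the notebook's functions).
fake_extract = lambda i: np.full((4, 4, 3), float(i))
fake_score = lambda x, y: [x.mean(), y.mean()]
fake_targets = np.arange(10, dtype=float)

result = evaluate_in_batches(fake_score, np.arange(10), fake_extract,
                             fake_targets, batch_size=4)
print(result)
```

Weighting each batch by its size matters because the last batch is usually smaller than the others; a plain average of batch scores would overweight it.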

@JesperDramsch
Owner

What's the VRAM of your graphics card?

@artizzq

artizzq commented Jan 18, 2021

Hello. This issue still persists. I have 6 GB of VRAM and 16 GB of RAM with a 40 GB paging file. Windows 10 and Python are both 64-bit.

@artizzq

artizzq commented Jan 19, 2021

I think I solved the issue.
First, I increased the paging file size in Windows settings.
Next, I changed 'patch_size' in the code to
patch_size = 16
Then I changed this line
vanillascore=model_vanilla.evaluate(np.expand_dims(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],64) for x in test_samples]), axis=3),keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9), verbose=0)
to
vanillascore=model_vanilla.evaluate(np.expand_dims(np.array([patch_extractor2D(labeled_data,labels["Xline"][x],labels["Time"][x],16) for x in test_samples]), axis=3),keras.utils.to_categorical(labels["Class"][test_samples], num_classes=9), verbose=0)
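
As a rough check of why the smaller patch helps: the patch array grows with the square of the patch size, so going from 64 to 16 cuts it by a factor of 16. The sketch below assumes float64 patches with a single channel and a made-up sample count (`n` is hypothetical; the real number of test samples is not stated in this thread). `patch_array_bytes` is an illustrative helper, not a function from the notebook.

```python
import numpy as np

def patch_array_bytes(n_samples, patch_size, channels=1, dtype=np.float64):
    """Bytes needed for an array of shape (n_samples, patch_size, patch_size, channels)."""
    return n_samples * patch_size * patch_size * channels * np.dtype(dtype).itemsize

n = 100_000  # hypothetical number of test samples
for size in (64, 16):
    gib = patch_array_bytes(n, size) / 2**30
    print(f"patch_size={size}: {gib:.2f} GiB")
```

Note that the model itself has to be built and trained for 16x16 inputs for this change to be valid; the estimate only covers the input array, not the intermediate list of patches or the network's activations.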

The network started to train and everything went well until I hit another issue. After executing the lines below, execution seems to get stuck in an infinite loop.

for space in tqdm(range(y_max),desc='Space'):
    for depth in tqdm(range(t_max),leave=False, desc='Time'):
        predx[depth,space] = np.argmax(model_vanilla.predict(np.expand_dims(np.expand_dims(patch_extractor2D(xline_data,space,depth,patch_size), axis=0), axis=3)))

This is what I get in the output; the process keeps running and never finishes on its own.

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
2021-01-19 10:47:15.171533: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-19 10:47:15.172115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-01-19 10:47:15.172384: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-19 10:47:15.172559: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:15.172697: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:15.172836: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-19 10:47:15.172969: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-19 10:47:15.173106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-19 10:47:15.173243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-19 10:47:15.173383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:15.173550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-19 10:47:15.174986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-01-19 10:47:15.175233: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-01-19 10:47:15.175359: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:15.175483: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:15.175634: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-19 10:47:15.176146: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-19 10:47:15.176488: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-19 10:47:15.176805: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-01-19 10:47:15.177081: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:15.177419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-01-19 10:47:15.177576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-19 10:47:15.177998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-01-19 10:47:15.178187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-01-19 10:47:15.178512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4720 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-01-19 10:47:15.178848: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Space:   0%|          | 0/651 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
2021-01-19 10:47:15.544862: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-01-19 10:47:15.754134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-19 10:47:16.932268: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-19 10:47:16.945345: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-19 10:47:20.012456: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-01-19 10:47:20.050926: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]
Time:   0%|          | 0/462 [00:00<?, ?it/s]

As you can see, this line never changes its state; it is just printed over and over again.

Maybe that's the way it should be? I don't know.
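
For what it's worth, the two loops are finite: with the totals shown in the log (651 and 462) they make 651 * 462 = 300,762 separate `predict` calls, each on a single patch, so the run may simply be very slow rather than stuck. The repeated "Time" lines could be the inner progress bar being redrawn on a new line instead of in place, which some consoles do; that part is a guess. A minimal sketch of one way to cut the call count is to predict a whole column of patches per call. `predict_section`, `predict_fn` and `extract_patch` are hypothetical names; with Keras, `predict_fn` could wrap `model_vanilla.predict` and `extract_patch` could wrap `patch_extractor2D`.

```python
import numpy as np

def predict_section(predict_fn, extract_patch, y_max, t_max):
    """Classify every (depth, space) position with one predict call per column."""
    predx = np.zeros((t_max, y_max), dtype=np.int64)
    for space in range(y_max):
        # Stack all patches of this column: shape (t_max, patch, patch)
        batch = np.stack([extract_patch(space, depth) for depth in range(t_max)])
        batch = batch[..., np.newaxis]  # add channel axis: (t_max, patch, patch, 1)
        predx[:, space] = np.argmax(predict_fn(batch), axis=1)
    return predx

# Stand-ins for illustration only (assumptions, not the notebook's functions).
def fake_extract(space, depth):
    return np.full((2, 2), float((space + depth) % 3))

def fake_predict(batch):
    classes = batch[:, 0, 0, 0].astype(int)
    return np.eye(3)[classes]  # one row of class scores per patch

section = predict_section(fake_predict, fake_extract, y_max=4, t_max=5)
print(section.shape)
```

This reduces the number of calls from `y_max * t_max` to `y_max`, at the cost of holding `t_max` patches in memory at once.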

@artizzq

artizzq commented Jan 20, 2021

I've just opened a new issue for this, so further discussion should probably move there.
