Cannot reproduce results of Uncertainty_Demo_MNIST.ipynb #6

Open
snavavf opened this issue May 5, 2018 · 4 comments

snavavf commented May 5, 2018

Hi, thanks for sharing this great implementation on GitHub! Nice work.

I ran your notebook Uncertainty_Demo_MNIST.ipynb.
However, I cannot reproduce the results shown in the notebook output: the losses I get are all nan.

Could you suggest why?

The output I got from the second cell (Train the neural network on MNIST training set):

Number of Training Data: 54000, Number of Validation Data: 6000
====Message from Normalizer====
You selected mode: 255
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
Sorry but there is a known issue of the loss not handling loss correctly. I will fix it in May-- Henry 19 April 2018
Epoch 1/5
 - 163s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0980 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0991
Epoch 2/5
 - 159s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0987 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1047

Epoch 00002: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 3/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.1001 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0971

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 4/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0967 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1008

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 5/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0998 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1003

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
Completed Training, 794.97s in total

Thanks!

henrysky self-assigned this May 5, 2018
henrysky added the bug label May 5, 2018
henrysky added a commit that referenced this issue May 5, 2018
henrysky added a commit that referenced this issue May 5, 2018
henrysky (Owner) commented May 5, 2018

Hi, the main issue is that astroNN's built-in data normalizer ignored mode=255 because of the faulty commit f8fb024, so the normalizer did nothing to normalize the MNIST images and the gradients blew up. I am still on holiday and will get back to research work on the coming Monday, so the bug will probably be fully patched next week.
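
For context, a rough sketch of what mode=255 is supposed to do (illustrative only, not the exact astroNN normalizer):

```python
import numpy as np

def normalize_mode_255(images):
    # Scale 8-bit pixel values from [0, 255] down to [0, 1]; the faulty
    # commit effectively skipped this step, so raw [0, 255] pixels reached
    # the network and the loss blew up to nan.
    return images.astype(np.float32) / 255.0
```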

But I have updated some code in the latest commit, and a few workarounds are needed in your Jupyter notebook, as I do not want to modify the notebook yet (put together in the sketch after this list):

  1. Add the line net.mc_num = 25 after net = MNIST_BCNN(); because of a performance issue, running fewer Monte Carlo passes is a workaround.
  2. Change pred, pred_std = net.test(x_test[test_idx]) to pred, pred_std = net.test_old(x_test[test_idx]), because test() now refers to the new fast MC inference on GPU, which turns out not to handle the classification task correctly; the old test() has been renamed to test_old().
  3. Change pred_rot, pred_rot_std = net.test(test_rot) to pred_rot, pred_rot_std = net.test_old(test_rot) for the same reason.
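
Put together, the patched cells would look roughly like this (a sketch that assumes the notebook's own variables x_test, test_idx and test_rot; the import path may differ between versions):

```python
# Sketch only: x_test, test_idx and test_rot come from the notebook itself
from astroNN.models import MNIST_BCNN  # import path assumed; check the notebook

net = MNIST_BCNN()
net.mc_num = 25  # workaround 1: fewer Monte Carlo forward passes

# ... train exactly as in the notebook's second cell ...

# workarounds 2 and 3: use the renamed, slower-but-correct inference method
pred, pred_std = net.test_old(x_test[test_idx])
pred_rot, pred_rot_std = net.test_old(test_rot)
```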

This issue will remain open until it is fully resolved.

To-do list for me:

  • Add test cases to prevent similar issues (check for NaN especially; see the sketch after this list). Done!!
  • The losses currently have a performance issue (painfully slow even on a GPU; some operation(s) are probably being run on the CPU for some reason). 50% done!!
  • The new accelerated test() for BNNs does not handle the classification task correctly (and add a test case!!!). Done!!
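
The kind of NaN check meant in the first item could look like this (a hypothetical sketch, not the actual astroNN test suite):

```python
import numpy as np

def assert_history_finite(history):
    # `history` is assumed to be a dict mapping metric names to lists of
    # per-epoch values, like Keras' History.history.
    for name, values in history.items():
        assert not np.any(np.isnan(values)), f"nan detected in metric '{name}'"
```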

henrysky (Owner) commented

It should now be fully resolved; no modification of Uncertainty_Demo_MNIST.ipynb is needed.

snavavf (Author) commented May 17, 2018

Thanks for the quick update!
Now I get reasonable losses from the second cell. Great.

However, in the third cell (Test the neural network on random MNIST images),
the total uncertainties (entropy) I get are all 1.0.

As shown in the following screenshot:
https://i.imgur.com/VaVfdsb.jpg
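
For reference, predictive entropy for an MC-dropout classifier is usually computed like this (a textbook sketch, not astroNN's actual code), which is why values pinned at exactly 1.0 look suspicious:

```python
import numpy as np

def predictive_entropy(mc_softmax):
    """mc_softmax: array of shape (n_mc_samples, n_classes)."""
    mean_probs = mc_softmax.mean(axis=0)   # average the MC softmax samples
    eps = 1e-12                            # guard against log(0)
    return -np.sum(mean_probs * np.log(mean_probs + eps))

# With 10 MNIST classes the maximum entropy is log(10) ≈ 2.303 nats, reached
# only by a perfectly uniform prediction, so a constant 1.0 for every image
# suggests a bug rather than genuine uncertainty.
```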

Could you suggest why?
Thanks!

henrysky (Owner) commented

I acknowledge the issue.

My apologies; I only use regression in my research, so classification-related features are not tested regularly. The current continuous-integration test cases only make sure things run without error, not that they produce reasonable results. I am looking into it.
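
A "reasonable result" check of the kind described could look like this (a hypothetical sketch, not the actual CI suite): beyond merely running, assert the classifier clearly beats chance.

```python
import numpy as np

def assert_better_than_chance(pred_labels, true_labels, n_classes=10):
    accuracy = np.mean(np.asarray(pred_labels) == np.asarray(true_labels))
    # Random guessing scores ~1/n_classes, so demand a comfortable margin.
    assert accuracy > 2.0 / n_classes, f"accuracy {accuracy:.3f} is near chance"
```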
