Cannot reproduce results of Uncertainty_Demo_MNIST.ipynb #6

Open
snavavf opened this issue May 5, 2018 · 4 comments

snavavf commented May 5, 2018

Hi, thanks for sharing this great implementation on GitHub! Nice work.

I ran your notebook Uncertainty_Demo_MNIST.ipynb.
However, I cannot reproduce the results shown in the notebook output: the losses I get are all nan.

Could you suggest why?

The output I got from the second cell (Train the neural network on MNIST training set):

Number of Training Data: 54000, Number of Validation Data: 6000
====Message from Normalizer====
You selected mode: 255
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
====Message from Normalizer====
You selected mode: 0
Featurewise Center: False
Datawise Center: False
Featurewise std Center: False
Datawise std Center: False
====Message ends====
Sorry but there is a known issue of the loss not handling loss correctly. I will fix it in May-- Henry 19 April 2018
Epoch 1/5
 - 163s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0980 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0991
Epoch 2/5
 - 159s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0987 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1047

Epoch 00002: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 3/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.1001 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.0971

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 4/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0967 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1008

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 5/5
 - 157s - loss: nan - output_loss: nan - variance_output_loss: nan - output_categorical_accuracy: 0.0998 - val_loss: nan - val_output_loss: nan - val_variance_output_loss: nan - val_output_categorical_accuracy: 0.1003

Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0003124999930150807.
Completed Training, 794.97s in total

Thanks!

henrysky self-assigned this May 5, 2018
henrysky added the bug label May 5, 2018
henrysky added a commit that referenced this issue May 5, 2018
henrysky added a commit that referenced this issue May 5, 2018
henrysky (Owner) commented May 5, 2018

Hi, the main issue is that astroNN's built-in data normalizer ignored mode=255 because of the faulty commit f8fb024, so the normalizer did nothing to normalize the MNIST images and the gradients blew up. I am still on holiday and will get back to research work on the coming Monday, so the bug will probably be fully patched next week.
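
For context, a rough sketch of what mode=255 is supposed to do (illustrative only, not the exact astroNN normalizer):

```python
import numpy as np

def normalize_mode_255(images):
    # Scale 8-bit pixel values from [0, 255] down to [0, 1]; the faulty
    # commit effectively skipped this step, so raw [0, 255] pixels reached
    # the network and the loss blew up to nan.
    return images.astype(np.float32) / 255.0
```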

But I have updated some code in the latest commit, and a few workarounds are needed in your Jupyter notebook, as I do not want to modify the notebook yet (put together in the sketch after this list):

  1. Add the line net.mc_num = 25 after net = MNIST_BCNN(); because of a performance issue, running fewer Monte Carlo passes is a workaround.
  2. Change pred, pred_std = net.test(x_test[test_idx]) to pred, pred_std = net.test_old(x_test[test_idx]), because test() now refers to the new fast MC inference on GPU, which turns out not to handle the classification task correctly; the old test() has been renamed to test_old().
  3. Change pred_rot, pred_rot_std = net.test(test_rot) to pred_rot, pred_rot_std = net.test_old(test_rot) for the same reason.
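
Put together, the patched cells would look roughly like this (a sketch that assumes the notebook's own variables x_test, test_idx and test_rot; the import path may differ between versions):

```python
# Sketch only: x_test, test_idx and test_rot come from the notebook itself
from astroNN.models import MNIST_BCNN  # import path assumed; check the notebook

net = MNIST_BCNN()
net.mc_num = 25  # workaround 1: fewer Monte Carlo forward passes

# ... train exactly as in the notebook's second cell ...

# workarounds 2 and 3: use the renamed, slower-but-correct inference method
pred, pred_std = net.test_old(x_test[test_idx])
pred_rot, pred_rot_std = net.test_old(test_rot)
```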

This issue will remain open until it is fully resolved.

To-do list for me:

  • Add test cases to prevent similar issues (check for NaN especially; see the sketch after this list). Done!!
  • The losses currently have a performance issue (painfully slow even on a GPU; some operation(s) are probably being run on the CPU for some reason). 50% done!!
  • The new accelerated test() for BNNs does not handle the classification task correctly (and add a test case!!!). Done!!
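
The kind of NaN check meant in the first item could look like this (a hypothetical sketch, not the actual astroNN test suite):

```python
import numpy as np

def assert_history_finite(history):
    # `history` is assumed to be a dict mapping metric names to lists of
    # per-epoch values, like Keras' History.history.
    for name, values in history.items():
        assert not np.any(np.isnan(values)), f"nan detected in metric '{name}'"
```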

henrysky (Owner) commented

It should now be fully resolved; no modification of Uncertainty_Demo_MNIST.ipynb is needed.

snavavf (Author) commented May 17, 2018

Thanks for the quick update!
Now I get reasonable losses from the second cell. Great.

However, in the third cell (Test the neural network on random MNIST images),
the total uncertainties (entropy) I get are all 1.0.

As shown in the following screenshot:
https://i.imgur.com/VaVfdsb.jpg
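
For reference, predictive entropy for an MC-dropout classifier is usually computed like this (a textbook sketch, not astroNN's actual code), which is why values pinned at exactly 1.0 look suspicious:

```python
import numpy as np

def predictive_entropy(mc_softmax):
    """mc_softmax: array of shape (n_mc_samples, n_classes)."""
    mean_probs = mc_softmax.mean(axis=0)   # average the MC softmax samples
    eps = 1e-12                            # guard against log(0)
    return -np.sum(mean_probs * np.log(mean_probs + eps))

# With 10 MNIST classes the maximum entropy is log(10) ≈ 2.303 nats, reached
# only by a perfectly uniform prediction, so a constant 1.0 for every image
# suggests a bug rather than genuine uncertainty.
```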

Could you suggest why?
Thanks!

henrysky (Owner) commented

I acknowledge the issue.

My apologies; I only use regression in my research, so classification-related features are not tested regularly. The current continuous-integration test cases only make sure things run without error, not that they produce reasonable results. I am looking into it.
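
A "reasonable result" check of the kind described could look like this (a hypothetical sketch, not the actual CI suite): beyond merely running, assert the classifier clearly beats chance.

```python
import numpy as np

def assert_better_than_chance(pred_labels, true_labels, n_classes=10):
    accuracy = np.mean(np.asarray(pred_labels) == np.asarray(true_labels))
    # Random guessing scores ~1/n_classes, so demand a comfortable margin.
    assert accuracy > 2.0 / n_classes, f"accuracy {accuracy:.3f} is near chance"
```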
