Some network architectures broken in 0.28 with NaNs in the calculations #1620
It looks like you might have to actually play out the moves in order to trigger the bad behaviour, i.e. if you simply start from this position its output looks reasonable, but if you run the engine while playing out the moves, things get dodgy.
Example repro steps (tested in dx12, edit the path to your copy of the weights)...
Output:
Of course h7h5 is terrible, but also, what are these NaNs doing here?
Hmm, with CUDA by contrast I don't get NaNs in the output, and the results seem marginally better, though still weird compared to 0.27. Edit: actually, I think cuda-fp16 shows broken results while cuda (fp32) seems maybe OK.
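A general sketch (not lc0's actual code) of why an fp16 backend can produce NaNs while the fp32 path stays clean: float16's largest finite value is about 65504, so an intermediate activation or partial sum just above that overflows to infinity, and operations on overflowed values (e.g. subtracting two infinities inside a normalization step) yield NaN, which then propagates through everything downstream.

```python
import math
import numpy as np

# float16 saturates to +inf just above 65504, its largest finite value
big = np.float16(70000.0)
assert np.isinf(big)

# fp32 represents the same magnitude with no trouble
assert np.isfinite(np.float32(70000.0))

# inf - inf (e.g. two overflowed partial sums being combined) is NaN...
nan_val = big - big
assert math.isnan(float(nan_val))

# ...and NaN poisons every subsequent calculation it touches
assert math.isnan(float(nan_val * np.float16(0.5)))
```

This is consistent with the observation above: the same network runs fine in fp32 but breaks in fp16 once some intermediate value exceeds the half-precision range.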
Comparing
Is this still an issue with rc2?
Still an issue?
Uh, I just downloaded a recent appveyor build for dx12, and there is definitely some weirdness still happening with tinygyal, for instance:
With lines like:
This was without the
Is the workaround effective?
For dx12, yes, I think the workaround works. For CUDA, I think everything is fine now even without it, as far as I can see.
This is the Tiny Gyal 8 network in dx12 0.28-rc1
Even though the net is weak, it's not this weak...