Some network architectures broken in 0.28 with NaNs in the calculations #1620

Open
rooklift opened this issue Aug 5, 2021 · 10 comments

Comments

@rooklift
Contributor

rooklift commented Aug 5, 2021

This is the Tiny Gyal 8 network in dx12 0.28-rc1

image

Even though the net is weak, it's not this weak...

@rooklift
Contributor Author

rooklift commented Aug 5, 2021

It looks like you have to actually play out the moves to trigger the bad behaviour: if you simply start from this position, the output looks reasonable, but if you run the engine while playing through the moves, things get dodgy.

@rooklift
Contributor Author

rooklift commented Aug 5, 2021

Example repro steps (tested in dx12; edit the path to point to your copy of the weights):

uci
setoption name WeightsFile value C:\Users\Owner\Documents\Misc\Chess\Lc0_Networks\tinygyal-8.pb.gz
setoption name VerboseMoveStats value true
ucinewgame
position startpos
go nodes 1000000

    <wait for bestmove>

position startpos moves e2e4
go nodes 1000000
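
For anyone who wants to script the steps above rather than type them, here is a minimal Python sketch. The function names `build_repro_commands` and `run_batches` are hypothetical helpers, not part of lc0, and the engine path and weights path are whatever applies on your machine. It assumes only that the engine talks plain UCI over stdin/stdout and prints a `bestmove` line when each search finishes.

```python
import subprocess


def build_repro_commands(weights_path, nodes=1000000):
    """Return the UCI commands from the repro steps as two batches.

    Each batch ends with a `go` command, so the caller should wait for
    `bestmove` before sending the next batch.
    """
    first = [
        "uci",
        f"setoption name WeightsFile value {weights_path}",
        "setoption name VerboseMoveStats value true",
        "ucinewgame",
        "position startpos",
        f"go nodes {nodes}",
    ]
    second = [
        "position startpos moves e2e4",
        f"go nodes {nodes}",
    ]
    return [first, second]


def run_batches(engine_path, batches):
    """Send each batch to the engine and collect its output lines.

    Reading stops after the `bestmove` line of every batch. This is a
    sketch: it does no error handling beyond cleaning up the process.
    """
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    output = []
    try:
        for batch in batches:
            for cmd in batch:
                proc.stdin.write(cmd + "\n")
            proc.stdin.flush()
            for line in proc.stdout:
                output.append(line.rstrip("\n"))
                if line.startswith("bestmove"):
                    break
        proc.stdin.write("quit\n")
        proc.stdin.flush()
    finally:
        # Give the engine a moment to exit, then force it if needed.
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
    return output
```

Calling `run_batches("lc0.exe", build_repro_commands(r"path\to\tinygyal-8.pb.gz"))` would then return all engine output, including the verbose move stats, for both searches.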

Output:

info string g7g5  (378 ) N:     291 (+ 0) (P:  0.71%) (WL: -0.25873) (D: 0.000) (M:  3.8) (Q: -0.25873) (U: 0.26795) (S:  0.00922) (V:  -.----)
info string b7b5  (234 ) N:     382 (+ 0) (P:  0.39%) (WL: -0.11978) (D: 0.000) (M:  4.4) (Q: -0.11978) (U: 0.11256) (S: -0.00722) (V:  -.----)
info string f7f6  (346 ) N:     404 (+ 0) (P:  0.72%) (WL: -0.19686) (D: 0.000) (M:  4.1) (Q: -0.19686) (U: 0.19810) (S:  0.00125) (V:  -.----)
info string g8h6  (161 ) N:     567 (+ 0) (P:  0.71%) (WL: -0.14161) (D: 0.000) (M:  4.1) (Q: -0.14161) (U: 0.13831) (S: -0.00330) (V:  -.----)
info string b8a6  (34  ) N:     591 (+ 0) (P:  0.73%) (WL: -0.14055) (D: 0.000) (M:  4.7) (Q: -0.14055) (U: 0.13631) (S: -0.00424) (V:  -.----)
info string e7e5  (322 ) N:     648 (+ 0) (P: 20.42%) (WL:      nan) (D: 0.000) (M:  5.1) (Q:      nan) (U: 3.49159) (S:      nan) (V:  -.----)
info string a7a5  (207 ) N:     864 (+ 0) (P:  0.65%) (WL: -0.09471) (D: 0.000) (M:  5.2) (Q: -0.09471) (U: 0.08390) (S: -0.01082) (V:  -.----)
info string f7f5  (351 ) N:    1000 (+ 0) (P:  0.57%) (WL: -0.07996) (D: 0.000) (M:  5.1) (Q: -0.07996) (U: 0.06328) (S: -0.01668) (V:  -.----)
info string h7h6  (400 ) N:    1762 (+ 0) (P:  2.34%) (WL: -0.14931) (D: 0.000) (M:  5.2) (Q: -0.14931) (U: 0.14703) (S: -0.00228) (V:  -.----)
info string b7b6  (230 ) N:    2549 (+ 0) (P:  0.98%) (WL: -0.06257) (D: 0.000) (M:  5.3) (Q: -0.06257) (U: 0.04273) (S: -0.01984) (V:  -.----)
info string b8c6  (36  ) N:    2636 (+ 0) (P:  3.58%) (WL: -0.15253) (D: 0.000) (M:  6.3) (Q: -0.15253) (U: 0.15050) (S: -0.00203) (V:  -.----)
info string a7a6  (204 ) N:    3325 (+ 0) (P:  3.28%) (WL: -0.11613) (D: 0.000) (M:  5.4) (Q: -0.11613) (U: 0.10955) (S: -0.00658) (V:  -.----)
info string g7g6  (374 ) N:    3685 (+ 0) (P:  1.31%) (WL: -0.06062) (D: 0.000) (M:  5.9) (Q: -0.06062) (U: 0.03928) (S: -0.02133) (V:  -.----)
info string g8f6  (159 ) N:    4990 (+ 0) (P:  1.13%) (WL: -0.05032) (D: 0.000) (M:  6.8) (Q: -0.05032) (U: 0.02517) (S: -0.02515) (V:  -.----)
info string c7c6  (259 ) N:    6088 (+ 0) (P:  5.68%) (WL: -0.11082) (D: 0.000) (M:  7.2) (Q: -0.11082) (U: 0.10355) (S: -0.00727) (V:  -.----)
info string d7d6  (288 ) N:    7882 (+ 0) (P:  7.59%) (WL: -0.11371) (D: 0.000) (M:  6.2) (Q: -0.11371) (U: 0.10687) (S: -0.00683) (V:  -.----)
info string e7e6  (317 ) N:   12690 (+ 0) (P: 21.54%) (WL: -0.18663) (D: 0.000) (M:  7.3) (Q: -0.18663) (U: 0.18832) (S:  0.00169) (V:  -.----)
info string d7d5  (293 ) N:   90376 (+ 0) (P: 10.35%) (WL: -0.04425) (D: 0.001) (M:  8.3) (Q: -0.04425) (U: 0.01270) (S: -0.03155) (V:  -.----)
info string c7c5  (264 ) N:  170214 (+ 0) (P: 16.54%) (WL: -0.04273) (D: 0.000) (M:  9.2) (Q: -0.04273) (U: 0.01078) (S: -0.03195) (V:  -.----)
info string h7h5  (403 ) N:  392984 (+205) (P:  0.78%) (WL: -0.01160) (D: 0.001) (M:  9.2) (Q: -0.01160) (U: 0.00022) (S: -0.01138) (V:  -.----)
info string node  (  20) N:  703929 (+205) (P: 99.99%) (WL:     -nan) (D: 0.001) (M:  9.9) (Q:     -nan) (V:  -.----)
bestmove h7h5 ponder f1d3
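
The NaN entries are easy to miss in a long dump like this one. Below is a small Python sketch that picks them out of verbose move stats lines. The helper name `nan_fields` is hypothetical, and the parsing assumes only the `(NAME: value)` layout visible in the output above; other lc0 versions may format these lines differently.

```python
import re

# Matches "(NAME: value)" groups such as "(WL:      nan)" or "(P: 20.42%)".
FIELD_RE = re.compile(r"\((\w+):\s*([^)]*?)\s*\)")


def nan_fields(line):
    """Return (label, [field names whose value is nan or -nan]).

    `label` is the move (or "node") that follows "info string".
    Returns None for lines that are not "info string" lines.
    """
    prefix = "info string "
    if not line.startswith(prefix):
        return None
    label = line[len(prefix):].split()[0]
    bad = [
        name
        for name, value in FIELD_RE.findall(line)
        if value.lower().lstrip("+-") == "nan"
    ]
    return label, bad
```

Applied to the output above, this flags WL, Q and S for e7e5, and WL and Q for the root `node` line.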

Of course h7h5 is a terrible move. Also, what are these NaNs doing here?

@rooklift rooklift changed the title from "Some network architectures broken in 0.28?" to "Some network architectures broken in 0.28 with NaNs in the calculations" on Aug 5, 2021
@rooklift
Contributor Author

rooklift commented Aug 5, 2021

Hmm, with CUDA, by contrast, I don't get NaNs in the output, and the results seem marginally better but still weird compared to 0.27.

Edit: Actually, I think cuda-fp16 shows broken results, while cuda (fp32) seems possibly OK.

@rooklift
Contributor Author

rooklift commented Aug 5, 2021

Comparing cuda with cuda-fp16 on startpos (again, still Tiny Gyal 8):

cuda:        e4    P = 42.54%
cuda-fp16:   e4    P = 9.92%
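
A comparison like the one above can be automated once the priors from two backends have been collected. The following Python sketch is one way to do it. The helper `prior_gaps` and its 5-point threshold are assumptions made for illustration, and the two dictionaries contain only the e4 figures quoted in this comment.

```python
def prior_gaps(run_a, run_b, threshold=5.0):
    """Return {move: (prior_a, prior_b)} for large disagreements.

    `run_a` and `run_b` map a move to its policy prior in percent.
    A move is reported when the two priors differ by more than
    `threshold` percentage points. Moves missing from either run
    are ignored.
    """
    gaps = {}
    for move in sorted(set(run_a) & set(run_b)):
        if abs(run_a[move] - run_b[move]) > threshold:
            gaps[move] = (run_a[move], run_b[move])
    return gaps


# Figures quoted above for startpos with Tiny Gyal 8.
cuda = {"e4": 42.54}
cuda_fp16 = {"e4": 9.92}
print(prior_gaps(cuda, cuda_fp16))
```

Small differences between fp32 and fp16 backends are to be expected, so the threshold would need tuning in practice; a gap as large as the one here stands out at almost any setting.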

@borg323
Member

borg323 commented Aug 24, 2021

Is this still an issue with rc2?

@rooklift
Contributor Author

Yeah, for example I got this with 0.28-rc2 cuda:

image

@mooskagh
Member

Still an issue?

@rooklift
Contributor Author

rooklift commented Oct 28, 2021

I just downloaded a recent AppVeyor build for dx12, and there is definitely still some weirdness with tinygyal. For instance:

image

With lines like:

info depth 10 seldepth 20 time 2366 nodes 429846 score cp -2147483648 wdl 0 1000 0 nps 74184 tbhits 0 multipv 2 pv d1h5 f8b4 b1c3 d8h4 f1b5 h4g3 h2g3 c7c6 b5a4 g8e7 h5e5 b4c3 e5c3 b7b5 g1f3 b5a4 b2b3 d7d5 b3a4

info string d1h5  (93  ) N:   75512 (+ 0) (P:  0.56%) (WL:      nan) (D: 0.000) (M: 10.1) (Q:      nan) (U: 0.00055) (S:      nan) (V:  -.----)

This was without the --backend-opts=enable-gemm-metacommand=false workaround though.
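
A side note on the `score cp -2147483648` in the info line above: that value is exactly -2^31, the smallest 32-bit signed integer. A plausible reading (an assumption, not something confirmed in this thread) is that it is a by-product of the NaN evaluation reported for the same move. Either way, it makes a convenient marker to scan logs for. The Python sketch below does that; `has_sentinel_score` is a hypothetical helper name.

```python
import re

INT32_MIN = -2**31  # -2147483648, the smallest 32-bit signed integer

# Captures the centipawn value from a UCI "info ... score cp N ..." line.
SCORE_RE = re.compile(r"\bscore cp (-?\d+)\b")


def has_sentinel_score(info_line):
    """Return True if the line reports a cp score equal to INT32_MIN."""
    match = SCORE_RE.search(info_line)
    return match is not None and int(match.group(1)) == INT32_MIN
```

Lines without a `score cp` field, such as mate scores or `info string` lines, simply return False.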

@borg323
Member

borg323 commented Oct 28, 2021

Is the workaround effective?

@rooklift
Contributor Author

For dx12, yes, I think the workaround works.

For CUDA, I think everything is fine now even without it? As far as I can see.
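
For reference, here is a short Python sketch of launching the engine with the dx12 workaround discussed above applied on the command line. The helper `lc0_argv` is hypothetical. The `--backend-opts=enable-gemm-metacommand=false` argument is the one quoted in this thread; passing `--weights` and `--backend` alongside it is an assumption about how your build is normally started.

```python
def lc0_argv(engine_path, weights_path, use_workaround=True):
    """Build an argument list for starting lc0 on the dx12 backend.

    When `use_workaround` is True, the GEMM metacommand is disabled
    via --backend-opts, as suggested in this thread.
    """
    argv = [engine_path, f"--weights={weights_path}", "--backend=dx12"]
    if use_workaround:
        argv.append("--backend-opts=enable-gemm-metacommand=false")
    return argv
```

The resulting list can be passed straight to `subprocess.Popen`, which makes it easy to run the same position with and without the workaround and compare the output.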
