Add policy mix option to value-only mode. #2004

Draft: wants to merge 1 commit into master

Tilps commented Mar 29, 2024

Bit of a hack. If we decide this is useful, I should also add it to value_tournament and clean it up so it does literally nothing when the mix is 0.

Running a tune with the T82 network, it looks like the optimum is a very low policy temperature and a moderate policy mix.
The tune probably doesn't cover the right range for PST, but the current minimum (after 6000 games) is 0.325 (policy mix) and 0.26 (policy temperature). The estimated Elo gain over pure value mode is ~40 Elo.
As the policy temperature goes down, so does the optimal policy mix. At policy temperature 1 the optimal policy mix seemed closer to 0.5 (which was also probably >30 Elo).
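The PR text does not spell out how the mix is computed. As a rough illustration only, here is a minimal Python sketch *assuming* the policy mix is a linear blend of each move's value with its temperature-scaled policy prior. The function names (`policy_softmax`, `mixed_scores`, `choose_move`) and the blend formula are hypothetical, not lc0's actual code or API.

```python
import math
from typing import Sequence


def policy_softmax(logits: Sequence[float], temperature: float) -> list[float]:
    """Softmax over raw policy logits; lower temperature gives a sharper prior."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_scores(values: Sequence[float], logits: Sequence[float],
                 policy_mix: float, policy_temperature: float) -> list[float]:
    """ASSUMED blend: (1 - mix) * value + mix * policy prior, per move."""
    if policy_mix == 0:
        # Pure value-only mode: skip the policy computation entirely.
        return list(values)
    priors = policy_softmax(logits, policy_temperature)
    return [(1.0 - policy_mix) * v + policy_mix * p
            for v, p in zip(values, priors)]


def choose_move(moves: Sequence[str], values: Sequence[float],
                logits: Sequence[float], policy_mix: float,
                policy_temperature: float) -> str:
    """Pick the move with the highest (hypothetically) mixed score."""
    scores = mixed_scores(values, logits, policy_mix, policy_temperature)
    best = max(range(len(moves)), key=lambda i: scores[i])
    return moves[best]


if __name__ == "__main__":
    moves = ["e4", "d4"]
    values = [0.10, 0.05]  # value of each move for the side to move
    logits = [0.0, 3.0]    # raw policy logits; the policy strongly prefers d4
    print(choose_move(moves, values, logits, 0.0, 1.0))  # pure value mode
    print(choose_move(moves, values, logits, 0.5, 1.0))  # policy pulls toward d4
```

In this sketch a mix of 0 short-circuits to pure value selection, in the spirit of the "does literally nothing when the mix is 0" cleanup mentioned above, and lowering the temperature concentrates the prior on the policy's top move. Note that values and priors live on different scales here ([-1, 1] vs. [0, 1]), which a real implementation would need to handle deliberately.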

Tilps commented Mar 29, 2024

New optimum at 9000 games: {'PolicyMix': 0.14689769678592882, 'PolicyTemperature': 0.2}

Tilps commented Mar 29, 2024

15000 games: {'PolicyMix': 0.4831467071587293, 'PolicyTemperature': 0.2} (Elo estimate is 49)

Seems like I'll need to do another run expanding the policy temperature range even further...

Tilps commented Mar 29, 2024

18000 games: {'PolicyMix': 0.5291014358428309, 'PolicyTemperature': 0.2} (Elo estimate 47)

Going to restart overnight with double the number of rounds.

Tilps commented Mar 29, 2024

Restarting with a wider policy temperature range (and removing the clearly bad negative policy mix values), the tune oddly went looking for a minimum at a completely different policy temperature:
{'PolicyMix': 0.4585496867191581, 'PolicyTemperature': 1.222335248497085} (Elo estimate of 45 after 36000 tuning games)

So it seems the theory that the optimal policy mix decreases as the temperature decreases is invalid... Maybe policy temperature just has a relatively small effect.

(There was an early minimum at temperature 0.05 with a PolicyMix of 0.35, but it never stabilized there, and it seems rare after the first 18000 tuning games.)
