I came across the Atari-5 paper, which convinced me that Atari-5 (Battle Zone, Double Dunk, Name This Game, Phoenix, Q*Bert) is a better subset of ALE games in terms of benchmarking efficiency than our current selection, which is approximately DQN-7 (Beam Rider, Breakout, Enduro, Pong, Q*Bert, Seaquest, S. Invaders). We could keep Pong as a sanity check, so my proposal becomes Atari-5 + Pong.
TL;DR of the paper: by evaluating an algorithm on Atari-5, you get a fairly close estimate of its performance on the full set of 57 ALE games.
I haven't tried it with Tianshou myself, so I'm not sure whether there will be any unforeseen issues. Just wanted to throw it out there and get your feedback.
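For concreteness, here is a minimal sketch of what the proposed benchmark set might look like as a list of ALE environment IDs. The exact ID strings and version suffixes (`NoFrameskip-v4`) are assumptions on my part and should be checked against the environments actually registered in the installed ALE/Gymnasium version:

```python
# Sketch of the proposed benchmark set (Atari-5 + Pong).
# Environment ID spellings are assumed, not verified against a registry.
ATARI_5 = [
    "BattleZoneNoFrameskip-v4",
    "DoubleDunkNoFrameskip-v4",
    "NameThisGameNoFrameskip-v4",
    "PhoenixNoFrameskip-v4",
    "QbertNoFrameskip-v4",
]

# Keep Pong as a cheap sanity check, per the proposal above.
BENCHMARK_TASKS = ATARI_5 + ["PongNoFrameskip-v4"]

for task in BENCHMARK_TASKS:
    print(task)
```

A benchmark script could then simply iterate over `BENCHMARK_TASKS` instead of the current DQN-7-style list.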
I think it's a very good idea, thanks for bringing it up! One of the important tasks in the next two months will be establishing reliable regression tests for all algorithms and running them on each release. Testing on Atari-5 will likely be incorporated into that.