I came across the Atari-5 paper, which convinced me that Atari-5 (Battle Zone, Double Dunk, Name This Game, Phoenix, Q*Bert) is a better subset of ALE games in terms of benchmarking efficiency than our current selection, which is approximately DQN-7 (Beam Rider, Breakout, Enduro, Pong, Q*Bert, Seaquest, S. Invaders). We could keep Pong as a sanity check, so my proposal becomes Atari-5 + Pong.
TL;DR of the paper: by evaluating an algorithm on Atari-5, you get a fairly close estimate of its performance on the full set of 57 ALE games.
I haven't tried it with Tianshou myself, so I'm not sure whether there will be any unforeseen issues. Just wanted to throw it out there and get your feedback.
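For concreteness, here is a minimal sketch of what the proposed benchmark set might look like as a list of ALE environment IDs. The exact ID strings and version suffixes (`NoFrameskip-v4`) are assumptions on my part and should be checked against the environments actually registered in the installed ALE/Gymnasium version:

```python
# Sketch of the proposed benchmark set (Atari-5 + Pong).
# Environment ID spellings are assumed, not verified against a registry.
ATARI_5 = [
    "BattleZoneNoFrameskip-v4",
    "DoubleDunkNoFrameskip-v4",
    "NameThisGameNoFrameskip-v4",
    "PhoenixNoFrameskip-v4",
    "QbertNoFrameskip-v4",
]

# Keep Pong as a cheap sanity check, per the proposal above.
BENCHMARK_TASKS = ATARI_5 + ["PongNoFrameskip-v4"]

for task in BENCHMARK_TASKS:
    print(task)
```

A benchmark script could then simply iterate over `BENCHMARK_TASKS` instead of the current DQN-7-style list.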
I think it's a very good idea, thanks for bringing it up! One of the important tasks in the next two months will be establishing reliable regression tests for all algorithms and running them on each release. Testing on Atari-5 will likely be incorporated into that.