Use Atari-5 for future benchmarking of discrete RL #1110

Open
4 of 9 tasks
nuance1979 opened this issue Apr 12, 2024 · 1 comment
Labels
build/test discussion Discussion of a typical issue

Comments

@nuance1979
Collaborator

nuance1979 commented Apr 12, 2024

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

I came across the Atari-5 paper, which convinced me that Atari-5 (Battle Zone, Double Dunk, Name This Game, Phoenix, Q*Bert) is a more efficient subset of ALE games for benchmarking than our current selection, which is approximately DQN-7 (Beam Rider, Breakout, Enduro, Pong, Q*Bert, Seaquest, Space Invaders). We can keep Pong as a sanity check, so my proposal is Atari-5 + Pong.

TL;DR of the paper: by evaluating an algorithm on Atari-5, you get a fairly close estimate of its performance on the full set of 57 ALE games.

I haven't tried it with Tianshou myself, so I'm not sure whether there will be any unforeseen issues. I just want to put it out there and get your feedback.
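For concreteness, the proposed selection could be written down roughly as follows. This is only a sketch and has not been run against Tianshou: the helper names (`proposed_games`, `to_env_id`) are hypothetical, and the `NoFrameskip-v4` suffix is an assumption about which ALE environment IDs the benchmark scripts would use.

```python
# Game sets named in the proposal, written as ALE-style game names.
ATARI_5 = ["BattleZone", "DoubleDunk", "NameThisGame", "Phoenix", "Qbert"]
DQN_7 = ["BeamRider", "Breakout", "Enduro", "Pong", "Qbert", "Seaquest", "SpaceInvaders"]


def proposed_games(sanity_check="Pong"):
    """Return Atari-5 plus one sanity-check game, order preserved, no duplicates."""
    games = list(ATARI_5)
    if sanity_check not in games:
        games.append(sanity_check)
    return games


def to_env_id(game, suffix="NoFrameskip-v4"):
    # Assumption: environments are addressed by "<Game>NoFrameskip-v4" IDs.
    return f"{game}{suffix}"


# Environment IDs for the proposed benchmark set (Atari-5 + Pong).
PROPOSED_TASKS = [to_env_id(g) for g in proposed_games()]

# Games in the proposal that are not in the current DQN-7-like selection.
NEW_GAMES = [g for g in proposed_games() if g not in DQN_7]

if __name__ == "__main__":
    print(PROPOSED_TASKS)
    print(NEW_GAMES)
```

Under these assumptions the switch shrinks the set from seven games to six, and four of the six (everything except Q*Bert and Pong) would be new to the current selection.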

@MischaPanch
Collaborator

I think it's a very good idea, thanks for bringing it up! One of the important tasks in the next two months will be establishing reliable regression tests for all algos and running them on each release. Testing on Atari-5 will likely be incorporated into that.

MischaPanch added the build/test and discussion (Discussion of a typical issue) labels Apr 15, 2024