[RLlib] Upgrade to gymnasium 1.0.0a2 and ale_py 0.9.0. #45328
base: master
@@ -3,7 +3,6 @@
 # Environment adapters.
 # ---------------------
 # Atari
-gymnasium==0.28.1
Since gymnasium is already part of the main Ray requirements.txt file, we won't need this here anymore.
cc: @pseudo-rnd-thoughts @jkterry1
…ade_gymnasium_to_1_0_0a1
Signed-off-by: Sven Mika <sven@anyscale.io>
@@ -249,6 +249,8 @@ def _sample_timesteps(
 observation=obs[env_index],
 infos=infos[env_index],
 )
+self._was_terminated = [False for _ in range(self.num_envs)]
This is the completely new auto-reset logic of gymnasium 1.0. A sub-env only gets reset upon the next(!) step call, which returns a fake reward of 0.0, term/trunc guaranteed False, and the reset obs/infos as its obs/infos.
This is actually good for us: we should always do the env-to-module connector pass (even after the last timestep, with the terminal obs in the Episodes list) so that users who write to the episode get a chance to also alter the final obs.
LGTM.
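To make the next-step auto-reset described above concrete, here is a minimal sketch using a toy stand-in rather than gymnasium itself. `ToyCounterEnv` and `ToyNextStepVectorEnv` are hypothetical names invented for illustration, infos are omitted, and the code only mimics the behavior stated in the comment (reset obs, reward 0.0, term/trunc False on the step after an episode ends); it is not gymnasium's implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyCounterEnv:
    """Hypothetical single env: obs counts up from 0 and terminates at `horizon`."""

    horizon: int = 3
    t: int = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False


class ToyNextStepVectorEnv:
    """Toy vector env that resets a finished sub-env on the NEXT step call."""

    def __init__(self, num_envs, horizon=3):
        self.envs = [ToyCounterEnv(horizon) for _ in range(num_envs)]
        self._needs_reset = [False] * num_envs

    def reset(self):
        self._needs_reset = [False] * len(self.envs)
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for i, (env, action) in enumerate(zip(self.envs, actions)):
            if self._needs_reset[i]:
                # The action is ignored: this call only delivers the reset obs,
                # with a dummy reward of 0.0 and terminated/truncated False.
                results.append((env.reset(), 0.0, False, False))
                self._needs_reset[i] = False
            else:
                obs, reward, terminated, truncated = env.step(action)
                # The terminal obs is returned as the regular obs of this step.
                self._needs_reset[i] = terminated or truncated
                results.append((obs, reward, terminated, truncated))
        return results


venv = ToyNextStepVectorEnv(num_envs=1, horizon=2)
first_obs = venv.reset()
# Each entry is (obs, reward, terminated, truncated) for sub-env 0.
trace = [venv.step([0])[0] for _ in range(4)]
```

In this toy run, the second step carries the terminal obs with `terminated=True`, and only the third step returns the reset obs together with the dummy reward.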
@@ -88,7 +88,7 @@ def __init__(self, config: AlgorithmConfig, **kwargs):
 # actually hold the spaces for a single env, but for boxes the
 # shape is (1, 1) which brings a problem with the action dists.
 # shape=(1,) is expected.
-module_spec.action_space = self.env.envs[0].action_space
+module_spec.action_space = self.env.single_action_space
Sweet. This is now gone.
eps += 1

episodes[env_index].add_env_step(
infos[env_index].pop("final_observation"),
Okay, i.e. with gymnasium>=1.0.0 the final_observation is gone and a regular observation will be returned instead?
Correct, the final observation is returned in the actual obs. The reset obs you only get on the next(!) call to step, together with a dummy reward of 0.0.
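On the consumer side, the exchange above implies that a sampling loop no longer pops `final_observation` from the infos; it records the terminal obs from the regular return value and treats the following step as the start of a new episode. The sketch below shows one way such bookkeeping could look for a single sub-env. The function `split_into_episodes`, the dict layout, and the `was_done` flag are hypothetical and are not RLlib's actual code; the flag only plays a role loosely similar to the per-env flag list added in the diff.

```python
def split_into_episodes(first_obs, steps):
    """Group one sub-env's step stream into per-episode obs/reward lists.

    `steps` yields (obs, reward, terminated, truncated) tuples that follow the
    next-step auto-reset convention: the step after a terminal step carries
    only the reset obs and a dummy reward.
    """
    episodes = []
    current = {"obs": [first_obs], "rewards": []}
    was_done = False
    for obs, reward, terminated, truncated in steps:
        if was_done:
            # Close the finished episode. This step is not a real transition:
            # its obs is the reset obs and its dummy reward is dropped.
            episodes.append(current)
            current = {"obs": [obs], "rewards": []}
            was_done = False
            continue
        # Regular transition; on the last timestep `obs` is the terminal obs.
        current["obs"].append(obs)
        current["rewards"].append(reward)
        was_done = terminated or truncated
    episodes.append(current)
    return episodes


# Hand-written stream for one sub-env: the episode ends on the second step,
# and the third step delivers the reset obs with reward 0.0.
stream = [
    (1, 1.0, False, False),
    (2, 1.0, True, False),
    (0, 0.0, False, False),
    (1, 1.0, False, False),
]
episodes = split_into_episodes(first_obs=0, steps=stream)
```

Here the first episode ends with the terminal obs `2` in its obs list, and the dummy reward of the reset step never enters either episode's rewards.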
…to upgrade_gymnasium_to_1_0_0a1 # Conflicts: # rllib/env/single_agent_env_runner.py
…ade_gymnasium_to_1_0_0a1
Hi, we have just released Gymnasium alpha 2. Would you be able to test with gymnasium>=1.0.0a1? This would help check compatibility.
…ade_gymnasium_to_1_0_0a1 Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # rllib/env/single_agent_env_runner.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Hey @pseudo-rnd-thoughts, yes, we are in the process of wrapping this up. Thanks so much! Now that Atari is supported, I don't see any issues holding us back from supporting 1.0.0a2 in RLlib's new stack. We'll let you know if we still find any issues with the API. Very exciting! :)
Amazing, that is very exciting to hear.
Upgrade RLlib to gymnasium 1.0.0a2.
Reason: