Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Upgrade to gymnasium 1.0.0a2 and ale_py 0.9.0. #45328

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented May 14, 2024

Upgrade RLlib to gymnasium 1.0.0a2.

Reason:

  • We require some bug fixes in gymnasium that only exist in 1.0.0a1/2 (not in 0.29.1) that allow us to make use of their vectorized sync and async environments in RLlib's new EnvRunners.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@@ -3,7 +3,6 @@
# Environment adapters.
# ---------------------
# Atari
gymnasium==0.28.1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since gymnasium is already part of the main Ray requirements.txt file, we won't need this here anymore.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977
Copy link
Contributor Author

sven1977 commented May 14, 2024

cc: @pseudo-rnd-thoughts @jkterry1
Congrats on gymnasium 1.0!! This is super exciting. :)

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
@@ -249,6 +249,8 @@ def _sample_timesteps(
observation=obs[env_index],
infos=infos[env_index],
)
self._was_terminated = [False for _ in range(self.num_envs)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely new auto-reset logic of gymnasium 1.0. The sub-env only gets reset'd upon the next(!) step call (with a fake reward of 0.0 and term/trunc=guaranteed False; and the obs/infos being the reset-obs/infos).
This is actually good for us as we should always do the env-to-module connector pass (even after the last timestep with the terminal obs in the Episodes list) to make sure the user - in case they are writing to the episode - gets a chance to also alter the final obs.

Copy link
Collaborator

@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -88,7 +88,7 @@ def __init__(self, config: AlgorithmConfig, **kwargs):
# actually hold the spaces for a single env, but for boxes the
# shape is (1, 1) which brings a problem with the action dists.
# shape=(1,) is expected.
module_spec.action_space = self.env.envs[0].action_space
module_spec.action_space = self.env.single_action_space
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet. This is now gone.

eps += 1

episodes[env_index].add_env_step(
infos[env_index].pop("final_observation"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, i.e. with gymnasium>=1.0.0 the final_observation is gone and instead a regular observartion will be returned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the final observation is returned in the actual obs. The reset obs, you only get on the next(!) call to step, together with a dummy reward of 0.0.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
…to upgrade_gymnasium_to_1_0_0a1

# Conflicts:
#	rllib/env/single_agent_env_runner.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Copy link

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we have just released Gymnasium alpha 2, would you be able to test with gymnasium>=1.0.0a1? This would help check compatibility

…ade_gymnasium_to_1_0_0a1

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	rllib/env/single_agent_env_runner.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 changed the title [RLlib] Upgrade to gymnasium 1.0.0a1. [RLlib] Upgrade to gymnasium 1.0.0a2 and ale_py 0.9.0. May 31, 2024
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977
Copy link
Contributor Author

Hey @pseudo-rnd-thoughts , yes, we are in the process of wrapping this up. Thanks so much! Now that Atari is supported, I don't see any issues anymore holding us back to support 1.0.0a2 in RLlib's new stack. We'll let you know, if we still find any issues with the API. Very exciting! :)

@pseudo-rnd-thoughts
Copy link

Hey @pseudo-rnd-thoughts , yes, we are in the process of wrapping this up. Thanks so much! Now that Atari is supported, I don't see any issues anymore holding us back to support 1.0.0a2 in RLlib's new stack. We'll let you know, if we still find any issues with the API. Very exciting! :)

Amazing, that is very exciting to hear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants