Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

Merged
merged 14 commits into from
May 29, 2024

Conversation

amacati
Copy link
Contributor

@amacati amacati commented Feb 10, 2024

Description

The gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give the exact same results. While the documentation recommends that users should seed reset only once, it does not forbid seeding multiple times.

FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.

The inconsistencies arise because the internal Mujoco state is not restored properly when _reset_sim() is called. Specifically, the position and quaternions of the mocap bodies are currently not being reset. Furthermore, the Mujoco integrator uses warmstarts and caches the last controls in mjData. In the current implementation, these are also not reset. Only if these four mjData fields are properly restored to their initial states, env.reset(seed=seed) yields reproducible results.

Fixes #207

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Kallinteris-Andreas
Copy link
Collaborator

Can you try replacing the entire body of def _reset_sim(self): with a single mj_resetData

        mujoco.mj_resetData(self.model, self.data)

- Fix remaining mujoco envs
- Fix mujoco_py envs
- Simplify reset
@amacati
Copy link
Contributor Author

amacati commented Feb 11, 2024

Yes, that's a more principled solution. I had to add it to the Fetch environments separately because they redefine RobotEnv._reset_sim(). Also, I added the mujoco_py equivalent to the deprecated environments. This passes all tests for me now, both for mujoco environments and legacy mujoco_py envs.

@amacati
Copy link
Contributor Author

amacati commented Feb 12, 2024

I have resolved your comments and removed the modifications to the mujoco-py environments. Since they are expected to fail the new tests I have limited those to the mujoco environments and documented the reason in the test description.

Copy link
Collaborator

@Kallinteris-Andreas Kallinteris-Andreas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will require an environment version bump since

i will work on the gymnasium end to improve the check_env (i will be a bit busy with the 1.0 release)

self.data.act[:] = None

# Reset buffers for joint states, warm-start, control buffers etc.
mujoco.mj_resetData(self.model, self.data)
mujoco.mj_forward(self.model, self.data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mj_forward should also be removed (but kept for fetch_env because it changes the qpos)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests still succeed even if it is removed for fetch_env, although Mujoco should not reflect the changes in qpos in the positions of the links. Do you think it's worth adding tests that catch this or is that overkill?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure that removing mj_forward after moving the position of an object (qpos), will not result in bugs, so it is better to just keep it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it in. Let me know if that addresses all your remarks for the PR

@Kallinteris-Andreas
Copy link
Collaborator

  1. your test_same_env_determinism_rollout is effectively a reimplementation of gymnasium.utils.env_match.check_environments_match, I do not believe it has any additional utility, if you agree remove it.

You can use this test instead, to validate the change, which is better because it actually asserts that reset properly resets the environment state (note: check_mujoco_reset_state will be importable with from gymnasium.envs.mujoco.utils import check_mujoco_reset_state)

import mujoco
import numpy as np

import gymnasium

def get_state(
    env: gymnasium.envs.mujoco.MujocoEnv,
    state_type: mujoco.mjtState = mujoco.mjtState.mjSTATE_PHYSICS,
):
    """Gets the state of `env`.
    Arguments:
        env: Environment whose state to copy, `env.model` & `env.data` must be accessible.
        state_type: see the [documentation of mjtState](https://mujoco.readthedocs.io/en/stable/APIreference/APItypes.html#mjtstate) most users can use the default for training purposes or `mujoco.mjtState.mjSTATE_INTEGRATION` for validation purposes.
    """
    assert mujoco.__version__ >= "2.3.6", "Feature requires `mujuco>=2.3.6`"

    state = np.empty(mujoco.mj_stateSize(env.unwrapped.model, state_type))
    mujoco.mj_getState(env.unwrapped.model, env.unwrapped.data, state, state_type)
    return state


def check_mujoco_reset_state(env: gymnasium.envs.mujoco.MujocoEnv, seed=1234):
    """Asserts that `env.reset` properly resets the state (not affected by previous steps), assuming `check_reset_seed` has passed."""
    env.action_space.seed(seed)
    action = env.action_space.sample()

    env.reset(seed=seed)
    first_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)
    env.step(action)

    env.reset(seed=seed)
    second_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)

    assert np.all(first_reset_state == second_reset_state), "reset is not deterministic"
  1. This will require version bumping for all the affected environments (all based on robot_env), this take 2 steps
    2.1 update the changelog of each of those environments
    2.2 update the version number in https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/main/gymnasium_robotics/__init__.py
  2. I will likely not be merging this before gymnasium==1.0.0a2

@amacati
Copy link
Contributor Author

amacati commented Feb 21, 2024

I don't think that's correct. The test you mentioned steps through two environments and checks if the information from two distinct environments match at each step. This cannot detect the failure case of this PR since the deviation only occurs after an environment has been reset and stepped through for several times. If we wanted to catch the failure case with the same function, we'd have to run a rollout of one of the two environments before starting the check_environments_match test.

The current test runs a rollout, resets the same env object with the same seed, runs a second rollout, and then tests if the outputs of both rollouts match each other. The key difference is that it uses a single environment that steps multiple times before being reset and thus shows if effects like warming up etc are correctly accounted for.

I do think however that the combination of your check_mujoco_reset_state() test function and the gymnasium check_environments_match test cover it as well.

@amacati
Copy link
Contributor Author

amacati commented Feb 26, 2024

When bumping the environment version number, do we replace all the v2s, should we issue a deprecation error or is that done automatically? And do the required changes to the documentation also go into this PR?

@Kallinteris-Andreas
Copy link
Collaborator

  1. yes bump all changed environments
  2. deprecation warning are handled by gymnasium.make
  3. yes, documentation should be changed in this PR

@amacati
Copy link
Contributor Author

amacati commented Feb 27, 2024

Hopefully the last question: Some envs are inheriting from MujocoRobotEnv, but the behavior seems to be unchanged. E.g. HandManipulateBlockFull-v1 passes the new tests also without any fixes. The environments with changed behavior are: FetchSlide-v2, FetchPickAndPlace-v2, FetchReach-v2, FetchPush-v2, FetchSlideDense-v2, FetchPickAndPlaceDense-v2, FetchReachDense-v2, FetchPushDense-v2, HandReach-v1, HandReachDense-v1. Therefore, all other variants of the hand tasks should be unchanged even with the new fixes, and should not be bumped. Is that okay with you?

@Kallinteris-Andreas
Copy link
Collaborator

Kallinteris-Andreas commented Feb 27, 2024

Since shadow envs are not be affected by the change we do not have to worry about them

we should verify that they remain unchanged though

@Kallinteris-Andreas
Copy link
Collaborator

Kallinteris-Andreas commented Mar 12, 2024

@rodrigodelazcano
What is the policy for keeping old verisons of environments?
In this case the change is practically small. And should not cause problems with comparing training performance, reusing policies and offline reinforcement learning with the data set generated from a previous environment version.

Update: I just checked. There are no datasets Minari for these environments).

@amacati
Copy link
Contributor Author

amacati commented May 5, 2024

@Kallinteris-Andreas Any news on when this will be merged?

@Kallinteris-Andreas
Copy link
Collaborator

@amacati waiting for gymnasium==1.0.0a2 which should be out soon(ish)

@Kallinteris-Andreas Kallinteris-Andreas changed the title Fix reproducibility issues in Fetch environments Fetch-v3 environments (Fix reproducibility issues) May 29, 2024
@Kallinteris-Andreas Kallinteris-Andreas changed the title Fetch-v3 environments (Fix reproducibility issues) Fetch environments new version (Fix reproducibility issues) May 29, 2024
@Kallinteris-Andreas Kallinteris-Andreas changed the title Fetch environments new version (Fix reproducibility issues) add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) May 29, 2024
@Kallinteris-Andreas Kallinteris-Andreas changed the title add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) May 29, 2024
@Kallinteris-Andreas
Copy link
Collaborator

Note: I changed the changelog in the documentation to be shorter

@Kallinteris-Andreas Kallinteris-Andreas merged commit e525caa into Farama-Foundation:main May 29, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug Report] FetchPickAndPlace-v2 does not yield reproducible results
2 participants