Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch: don't just set 0 when elements have None entries #1088

Open
MischaPanch opened this issue Apr 3, 2024 · 8 comments
Open

Batch: don't just set 0 when elements have None entries #1088

MischaPanch opened this issue Apr 3, 2024 · 8 comments
Assignees
Labels
Batch and Buffer Improvements in internal data structures, temporary label bug Something isn't working

Comments

@MischaPanch
Copy link
Collaborator

This is essentially a bug due to the current implementation of __setitem__:

image

Setting 0 when something is off is wrong. This can manifest itself in things like info of reset, where certain values might be unknown (like actions or env_num). Then None is erroneously replaced by 0. There are multiple tests covering this erroneous behavior, e.g.,
line 112 in
image

One should instead at least turn this into NaN entries

@MischaPanch MischaPanch added bug Something isn't working Batch and Buffer Improvements in internal data structures, temporary label labels Apr 3, 2024
@MischaPanch MischaPanch added this to To do in Overall Tianshou Status via automation Apr 3, 2024
@maxhuettenrauch
Copy link
Collaborator

As far as I can tell, we reach this point in two cases:

  1. Adding an item to the buffer where the new item has an inconsistent structure compared to the already existing structure and
  2. updating an existing item with an item that has an inconsistent structure compared to the already existing structure.

I think the main question here is if these operations should be allowed in the first place. In the RL context this probably only happens in case the info dict changes its structure between steps (as happens in MoveToRightEnv in the tests). One could argue, that the env is ill-defined in this case and instead of setting arbitrary default values, the env should be fixed?

@maxhuettenrauch
Copy link
Collaborator

Regarding your suggestion to set values to NaN instead, assigning np.nan will fail on arrays whose dtype is not float (as is the case in the above mentioned test).

@MischaPanch
Copy link
Collaborator Author

@maxhuettenrauch unfortunately, in the RL context this is bound to happen because some things might not be known at reset that are known at step, and thus will contain None entries. The prime example is the action, which is not available at reset, but some entrances from the info might also be missing

@MischaPanch
Copy link
Collaborator Author

This issue is strongly related to #1087

@maxhuettenrauch
Copy link
Collaborator

But when would you add obs directly after reset to the buffer (where a Batch object is created) and only after that retrieve an action, call step, and append the rest to this entry?

@MischaPanch
Copy link
Collaborator Author

I thought it might happen in the collectors, but maybe I'm mistaken.

For sure I've seen this happen with info objects somewhere in collector tests - though there it is a bad implementation of the env.

Unfortunately Gymnasium doesn't force any interface on the info dicts which are the main drivers of this problem.We can stop supporting such cases, or ask the use to specify what should happen then explicitly.

I am all for restricting the number of supported operations to decrease complexity and probability of errors :)

@maxhuettenrauch
Copy link
Collaborator

Yes, definitely happening in the collector tests, due to said bad env design. I'm gonna check some examples with standard mujoco envs.

@maxhuettenrauch
Copy link
Collaborator

Related: Farama-Foundation/Gymnasium#540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Batch and Buffer Improvements in internal data structures, temporary label bug Something isn't working
Development

No branches or pull requests

2 participants