
Adjust locations of setting the policy in train/eval mode #1122

Open · maxhuettenrauch opened this issue Apr 24, 2024 · 1 comment
Labels: bug (Something isn't working), refactoring (No change to functionality)

maxhuettenrauch (Collaborator) commented Apr 24, 2024:
Currently, tianshou sets the policy's train/eval mode in the trainer and in the test_episode function. The corresponding `training` attribute is then used to decide whether a stochastic policy should be evaluated deterministically when `policy.deterministic_eval` is True. This, however, is a misuse: the `training` attribute primarily affects torch modules such as dropout and batch norm. It should always be False during data collection and only be True inside `policy.learn`.
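
For illustration, here is a minimal sketch of the pattern being criticized; the class, layer sizes, and method body are made up rather than tianshou's actual code, but they show how the action-sampling decision ends up coupled to torch's training flag:

```python
import torch
from torch.distributions import Normal


class StochasticPolicy(torch.nn.Module):
    """Hypothetical stochastic policy, only for illustrating the misuse."""

    def __init__(self, deterministic_eval: bool = True):
        super().__init__()
        self.deterministic_eval = deterministic_eval
        self.mu = torch.nn.Linear(4, 2)
        self.log_sigma = torch.nn.Parameter(torch.zeros(2))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        dist = Normal(self.mu(obs), self.log_sigma.exp())
        # Misuse: `self.training` is meant to control dropout/batchnorm,
        # but here it also decides whether the action is deterministic.
        if self.deterministic_eval and not self.training:
            return dist.mean
        return dist.sample()
```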

opcode81 (Collaborator) commented May 3, 2024:

Max and I have implemented the following solution in #1123:

  • We introduced a new flag `is_within_training_step`, which is enabled by the training algorithm whenever it is within a training step, where a training step encompasses training data collection and policy updates. Algorithms now use this flag, rather than the (previously abused) torch training flag, to decide whether their `deterministic_eval` setting should apply.
  • The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this never made sense). The respective calls have been removed.
  • The policy should, in fact, always be in evaluation mode when performing data collection, as there is no reason to ever have gradient accumulation enabled for any type of rollout. We therefore explicitly set the policy to evaluation mode in `Collector.collect`; see the sketch below.
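
A rough sketch of the resulting decision logic (only `is_within_training_step` and `deterministic_eval` come from the actual change; the helper itself is illustrative, not tianshou's API):

```python
def should_act_deterministically(policy) -> bool:
    # Illustrative helper: act deterministically only when the user requested it
    # and we are outside a training step, instead of keying the decision off
    # torch's `training` flag.
    return policy.deterministic_eval and not policy.is_within_training_step
```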

MischaPanch added a commit that referenced this issue May 6, 2024
Addresses #1122:
* We introduced a new flag `is_within_training_step` which is enabled by
the training algorithm when within a training step, where a training
step encompasses training data collection and policy updates. This flag
is now used by algorithms to decide whether their `deterministic_eval`
setting should indeed apply instead of the torch training flag (which
was abused!).
* The policy's training/eval mode (which should control torch-level
learning only) no longer needs to be set in user code in order to
control collector behaviour (this didn't make sense!). The respective
calls have been removed.
* The policy should, in fact, always be in evaluation mode when performing
data collection, as there is no reason to ever have gradient
accumulation enabled for any type of rollout. We thus specifically set
the policy to evaluation mode in Collector.collect. Further, it never
makes sense to compute gradients during collection, so the possibility
to pass `no_grad=False` was removed.

Further changes:
- Base class for collectors: `BaseCollector`
- New util context managers `in_eval_mode` and `in_train_mode` for torch
modules.
- `reset` of `Collectors` now returns `obs` and `info`.
- `no_grad` is no longer accepted as a kwarg of `collect`
- Removed deprecations of `0.5.1` (will likely not affect anyone) and
the unused `warnings` module.
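
The utility context managers mentioned above could look roughly as follows (a sketch of the stated intent, not necessarily tianshou's exact implementation):

```python
from contextlib import contextmanager

import torch


@contextmanager
def in_eval_mode(module: torch.nn.Module):
    """Temporarily switch a torch module to eval mode, restoring its previous mode."""
    was_training = module.training
    module.eval()
    try:
        yield module
    finally:
        module.train(was_training)


@contextmanager
def in_train_mode(module: torch.nn.Module):
    """Temporarily switch a torch module to train mode, restoring its previous mode."""
    was_training = module.training
    module.train()
    try:
        yield module
    finally:
        module.train(was_training)
```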