Currently, tianshou sets the policy's mode in the trainer and `test_episode` function. The corresponding `training` attribute is then used to determine whether a stochastic policy should be evaluated deterministically, given that `policy.deterministic_eval` is `True`. This, however, is a misuse, as the `training` attribute primarily influences modules like dropout and batchnorm. It should always be `False` during data collection and only be `True` inside `policy.learn`.
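To make the distinction concrete, here is a minimal pure-Python mock (hypothetical, not tianshou or torch code) of the `training`-flag semantics: the flag gates stochastic layers such as dropout, which is why it must not double as a "deterministic evaluation" switch for the policy.

```python
import random

class MockDropout:
    """Pure-Python stand-in for torch.nn.Dropout, illustrating what the
    `training` attribute is actually for (hypothetical example)."""

    def __init__(self, p: float = 0.5):
        self.p = p
        self.training = True  # same default as torch modules

    def forward(self, xs: list[float]) -> list[float]:
        if not self.training:
            # eval mode: dropout is the identity function
            return list(xs)
        # train mode: zero each element with probability p, rescale survivors
        return [0.0 if random.random() < self.p else x / (1 - self.p) for x in xs]

drop = MockDropout(p=0.5)
drop.training = False  # eval mode, as it should be during data collection
assert drop.forward([1.0, 2.0]) == [1.0, 2.0]
```

Because `training` changes the forward pass itself, toggling it merely to switch between stochastic and deterministic action selection has unintended side effects on such layers.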
Max and I have implemented the following solution in #1123:
We introduced a new flag `is_within_training_step`, which is enabled by the training algorithm when within a training step, where a training step encompasses training data collection and policy updates. This flag is now used by algorithms to decide whether their `deterministic_eval` setting should indeed apply, instead of the torch training flag (which was abused!).
The policy's training/eval mode (which should control torch-level learning only) no longer needs to be set in user code in order to control collector behaviour (this didn't make sense!). The respective calls have been removed.
The policy should, in fact, always be in evaluation mode during data collection, as there is no reason to ever have gradient accumulation enabled for any type of rollout. We thus explicitly set the policy to evaluation mode in `Collector.collect`.
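The before/after decision logic can be sketched as follows. This is a simplified illustration, not the actual tianshou class: `is_within_training_step` is the flag from #1123, while the class and method names are invented for the example.

```python
class StochasticPolicy:
    """Illustrative sketch of the deterministic-eval decision (hypothetical)."""

    def __init__(self, deterministic_eval: bool = True):
        self.deterministic_eval = deterministic_eval
        self.training = False                  # torch flag: dropout/batchnorm only
        self.is_within_training_step = False   # new flag, set by the trainer

    def _use_deterministic_action_old(self) -> bool:
        # before: abused the torch training flag
        return self.deterministic_eval and not self.training

    def _use_deterministic_action_new(self) -> bool:
        # after: relies on the dedicated flag, so the torch flag stays free
        # to control dropout/batchnorm and can remain False (eval mode)
        # throughout all data collection
        return self.deterministic_eval and not self.is_within_training_step

policy = StochasticPolicy(deterministic_eval=True)
policy.is_within_training_step = True    # training rollout: sample stochastically
assert policy._use_deterministic_action_new() is False
policy.is_within_training_step = False   # test rollout: act deterministically
assert policy._use_deterministic_action_new() is True
```

Note that in the new logic the policy can stay in eval mode (`training = False`) at all times during collection without this forcing deterministic actions during training rollouts.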
Addresses #1122:
* We introduced a new flag `is_within_training_step` which is enabled by
the training algorithm when within a training step, where a training
step encompasses training data collection and policy updates. This flag
is now used by algorithms to decide whether their `deterministic_eval`
setting should indeed apply instead of the torch training flag (which
was abused!).
* The policy's training/eval mode (which should control torch-level
learning only) no longer needs to be set in user code in order to
control collector behaviour (this didn't make sense!). The respective
calls have been removed.
* The policy should, in fact, always be in evaluation mode when applying
data collection, as there is no reason to ever have gradient
accumulation enabled for any type of rollout. We thus specifically set
the policy to evaluation mode in `Collector.collect`. Further, it never
makes sense to compute gradients during collection, so the possibility
to pass `no_grad=False` was removed.
Further changes:
- Base class for collectors: `BaseCollector`
- New util context managers `in_eval_mode` and `in_train_mode` for torch
modules.
- `reset` of `Collectors` now returns `obs` and `info`.
- `no_grad` no longer accepted as kwarg of `collect`
- Removed deprecations of `0.5.1` (will likely not affect anyone) and
the unused `warnings` module.
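A plausible implementation of such a mode-switching context manager looks roughly like the following (a sketch only; the actual tianshou utilities may differ in detail). The key property is that the previous training/eval state is restored on exit, even if an exception is raised. The `StubModule` class is a hypothetical torch-like stand-in so the example runs without torch installed.

```python
from contextlib import contextmanager

@contextmanager
def in_eval_mode(module):
    """Temporarily switch a torch-style module to eval mode,
    restoring its previous training/eval state on exit (sketch)."""
    was_training = module.training
    module.train(False)
    try:
        yield module
    finally:
        module.train(was_training)

# Minimal torch-like stub (hypothetical) so the sketch is self-contained.
class StubModule:
    def __init__(self):
        self.training = True

    def train(self, mode: bool = True):
        self.training = mode
        return self

net = StubModule()
with in_eval_mode(net):
    assert net.training is False   # e.g. dropout disabled here
assert net.training is True        # previous mode restored afterwards
```

An `in_train_mode` counterpart is the mirror image, calling `module.train(True)` inside the `with` block. Using `try`/`finally` (rather than restoring the flag manually at each call site) is what makes the restoration exception-safe.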