
Clearer separation between the trainer and the algorithm and refactoring of policy classes #1034

Open · 3 of 9 tasks
maxhuettenrauch opened this issue on Jan 24, 2024 · 1 comment

Labels: blocked (Can't be worked on for now) · breaking changes (Changes in public interfaces. Includes small changes or changes in keys) · major (Large changes that cannot or should not be broken down into smaller ones) · refactoring (No change to functionality) · tentative (Up to discussion, may be dismissed)

Milestone: Release 1.0.0

@maxhuettenrauch (Collaborator)

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

While Tianshou's idea of encapsulating everything learning-related into a single method of a base policy class is a nice way to keep algorithms more or less self-contained, it has led to a cluttered trainer and the need for kwargs during the update call to remain functional (see #949). Additionally, while Tianshou uses RL jargon such as policies, its policy classes are not really doing what policies do in RL terms, i.e., mapping from states to actions. Rather, Tianshou interweaves the RL policy with parts of RL algorithms. Also, the trainer implements some parts that may be algorithm-specific, such as sampling transitions and passing them to Tianshou policies for updates.

In order to create a cleaner framework, I think the following changes would improve things:

  • Separate the trainer from the algorithm logic
    As already mentioned, the trainer currently holds the logic for training loops and logging, and additionally contains some algorithm logic. Half of the if statements and asserts in the __next__ function are only necessary because the trainer doesn't really know what type of RL it's doing. Instead, I think it makes much more sense to move the algorithm logic into new algorithm classes and let the trainer be solely responsible for running the algorithm, logging, and checkpointing. There would also be only one trainer instead of the OnPolicyTrainer, etc. Eventually, the trainer may even serve as the factory for the high-level API.

  • Separate algorithms from policies
    While this adds an additional layer of abstraction/complexity, things will eventually become cleaner. Right now, loads of keyword arguments can be given to the trainer, and they might not even be compatible. A BaseAlgorithm class would allow for more explicit arguments for a given algorithm and would additionally allow for unconventional algorithms such as ARS. It would also let the algorithm be independent of code regarding the type of policy (continuous/discrete). The policy class would then be merely the function that maps observations to actions. A rough sketch of how both separations could fit together is given after this list.
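
To make the proposal more concrete, here is a minimal sketch of how the pieces could fit together. All names and interfaces (Policy, BaseAlgorithm, Trainer, and the collector/logger objects) are hypothetical and only illustrate the intended division of responsibilities, not Tianshou's actual API:

    # Hypothetical sketch of the proposed trainer / algorithm / policy split.
    from abc import ABC, abstractmethod
    from typing import Any

    import numpy as np


    class Policy(ABC):
        """Pure mapping from observations to actions (the RL notion of a policy)."""

        @abstractmethod
        def compute_action(self, obs: np.ndarray) -> np.ndarray:
            ...


    class BaseAlgorithm(ABC):
        """Owns a policy and knows how to improve it from collected data.

        Algorithm-specific concerns (on-/off-policy sampling, replay handling,
        return estimation, ...) live here instead of in the trainer.
        """

        def __init__(self, policy: Policy) -> None:
            self.policy = policy

        @abstractmethod
        def update(self, buffer: Any) -> dict:
            """Sample from the buffer as needed and return training statistics."""


    class Trainer:
        """Single trainer: runs the loop, logs, and checkpoints; contains no RL logic."""

        def __init__(self, algorithm: BaseAlgorithm, collector: Any, logger: Any) -> None:
            self.algorithm = algorithm
            self.collector = collector
            self.logger = logger

        def run(self, num_epochs: int, steps_per_epoch: int) -> None:
            for epoch in range(num_epochs):
                # environment interaction fills the buffer
                self.collector.collect(n_step=steps_per_epoch)
                # everything algorithm-specific happens behind update()
                stats = self.algorithm.update(self.collector.buffer)
                self.logger.log(epoch=epoch, **stats)
                # checkpointing would also live here, independent of the algorithm

With such a split, the training loop stays the same for on-policy and off-policy algorithms alike, and the if statements that currently branch on the type of RL disappear from the trainer.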

@opcode81 added the refactoring, tentative, and breaking changes labels on Jan 24, 2024
@MischaPanch added the major and blocked labels on Jan 25, 2024
@MischaPanch added this to "To do" in Overall Tianshou Status via automation on Jan 25, 2024
@MischaPanch added this to the Release 1.0.0 milestone on Jan 25, 2024
@MischaPanch (Collaborator)

I'm pinning this issue. It will not happen in the next release, but we will definitely look into it as soon as possible.

Right now it's blocked, as is any algorithm extension, until we have an evaluation protocol and reliable regression tests (including some kind of reporting to a publicly accessible page), i.e. by issues in the area of #936, #978, and similar ones.

@MischaPanch pinned this issue on Jan 25, 2024