Adding Hyperparameter Optimisation (HPO) #978

Open · 2 of 8 tasks
bordeauxred opened this issue Oct 25, 2023 · 2 comments
Labels: algorithm enhancement (Not quite a new algorithm, but an enhancement to algo. functionality), major (Large changes that cannot or should not be broken down into smaller ones)
Milestone: Release 1.0.0

bordeauxred (Contributor) commented Oct 25, 2023

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gymnasium as gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

A common task when using deep RL is tuning hyperparameters. While a lucky hand or a grid search is always possible, more structured approaches are desirable and computationally preferable. The recent paper Hyperparameters in Reinforcement Learning and How To Tune Them proposes an evaluation protocol for HPO in deep RL.

The results of RL experiments often depend greatly on the selected seeds, with high variance between seeds. The paper therefore proposes the following evaluation procedure: define and report disjoint sets of training and test seeds; each run (of plain RL or HPO+RL) is performed on the set of training seeds and evaluated on the held-out set of test seeds.
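
As a rough sketch (not Tianshou API; `train_agent` and `evaluate` are hypothetical callables standing in for whatever the experiment provides), the protocol could look like this:

```python
# Sketch of the disjoint train/test seed protocol from the paper.
from statistics import mean, stdev

TRAIN_SEEDS = list(range(10))        # seeds used for training / HPO
TEST_SEEDS = list(range(100, 110))   # held-out seeds, disjoint from the above


def run_protocol(train_agent, evaluate):
    """Train one policy per training seed, then score every policy on every
    held-out test seed; report mean and spread to expose seed sensitivity."""
    policies = [train_agent(seed) for seed in TRAIN_SEEDS]
    scores = [evaluate(policy, seed) for policy in policies for seed in TEST_SEEDS]
    return mean(scores), stdev(scores)
```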

A possible implementation strategy is to use hydra for configuring the search spaces (on top of the high-level interfaces, #970). This allows combining a) the optuna hydra sweepers with b) the HPO sweepers from the aforementioned paper. We will contact the authors about integrating the sweepers from their repo, which contains sweepers for the following (a configuration sketch follows the list):

Differential Evolution Hyperband
Standard Population Based Training (with warmstarting option)
Population Based Bandits (with Mix/Multi versions and warmstarting option)
Bayesian-Generational Population Based Training
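
A minimal sketch of what this could look like, assuming the hydra-optuna-sweeper plugin is installed and a hypothetical config `conf/ppo_tuning.yaml` selects the sweeper via an `override hydra/sweeper: optuna` entry in its defaults list; `run_experiment` is a placeholder, not Tianshou API:

```python
import hydra
from omegaconf import DictConfig


def run_experiment(lr: float, gamma: float) -> float:
    """Placeholder: build and train a Tianshou agent on the training seeds,
    returning the metric the sweeper should optimize."""
    return 0.0


@hydra.main(config_path="conf", config_name="ppo_tuning", version_base=None)
def main(cfg: DictConfig) -> float:
    # The sweeper samples cfg.lr / cfg.gamma and optimizes the returned value.
    return run_experiment(lr=cfg.lr, gamma=cfg.gamma)


if __name__ == "__main__":
    main()
```

A sweep would then be launched over the command line, e.g. `python tune.py --multirun lr='interval(1e-5, 1e-2)' gamma='interval(0.9, 0.999)'`.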

@MischaPanch

MischaPanch added the algorithm enhancement label Oct 25, 2023
MischaPanch added this to To do in Overall Tianshou Status via automation Oct 25, 2023
MischaPanch added this to the Release 1.0.0 milestone Oct 25, 2023
MischaPanch added the major label Oct 25, 2023
MischaPanch (Collaborator) commented

@Trinkle23897 we plan to address this after the high-level interfaces from @opcode81 are merged. If you have any other proposals, we'd be happy to hear them!

Existing HPO approaches include:

  1. the stable-baselines zoo, which is based on pure optuna (not through hydra sweepers) and has a sophisticated module for experiments (see the sketch after this list)

  2. NNI: @bordeauxred and I actually tried it and liked it, but the project appears to be dead, or at least stale. It's a shame... There are quite a few bugs and documentation issues in the current version, and if development has indeed come to a halt, it would be better not to rely on it.
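
For comparison, a bare-bones pure-optuna loop as in 1. looks roughly like this; `train_and_evaluate` is a hypothetical placeholder for a Tianshou training run:

```python
import optuna


def train_and_evaluate(lr: float, gamma: float) -> float:
    """Placeholder: train on the training seeds and return the mean return."""
    return 0.0


def objective(trial: optuna.Trial) -> float:
    # Optuna samples the search space directly, without a config layer.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)
    return train_and_evaluate(lr=lr, gamma=gamma)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```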

Generally, from a quick look, hydra sweepers seem like an attractive option because they can be implemented on top of other HPO engines. For optuna there is already some support, and should NNI be resurrected, a new hydra sweeper could probably be built on top of it, if ever needed.

MischaPanch (Collaborator) commented

We will do this in (at least) two stages. The first will be a proper train/test evaluation protocol for a single parameter configuration; @bordeauxred is on it.
