[WIP] new feature: runner to launch over multiple virtual environments #377

Open
wants to merge 18 commits into base: main

Conversation

@TimotheeMathieu (Collaborator) commented Oct 22, 2023

In this PR I add code meant to simplify running experiments over multiple virtual environments. The main tool for this is nox, which I wrap using decorators: you just need to wrap a function with the @with_venv decorator to execute it in a separate virtual environment. With some work, this could become an alternative to rlberry.experiments. It is also nice for deployment purposes, because all the libraries are installed automatically: you only need Python and rlberry installed to run the script, and the other libraries are installed as needed.

Example code:

from rlberry.manager import with_venv, run_xp

@with_venv(import_libs=["numpy", "mushroom_rl"])
def run_mushroom():
    """
    Simple script to solve a simple chain with Q-Learning.

    """
    import numpy as np

    from mushroom_rl.algorithms.value import QLearning
    from mushroom_rl.core import Core, Logger
    from mushroom_rl.environments import generate_simple_chain
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter

    from mushroom_rl.utils.dataset import compute_J

    np.random.seed()

    logger = Logger(QLearning.__name__, results_dir=None)
    logger.strong_line()
    logger.info('Experiment Algorithm: ' + QLearning.__name__)

    # MDP
    mdp = generate_simple_chain(state_n=5, goal_states=[2], prob=.8, rew=1,
                                gamma=.9)

    # Policy
    epsilon = Parameter(value=.15)
    pi = EpsGreedy(epsilon=epsilon)

    # Agent
    learning_rate = Parameter(value=.2)
    algorithm_params = dict(learning_rate=learning_rate)
    agent = QLearning(mdp.info, pi, **algorithm_params)

    # Core
    core = Core(agent, mdp)

    # Initial policy Evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J start: {J}')

    # Train
    core.learn(n_steps=10000, n_steps_per_fit=1)

    # Final Policy Evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J final: {J}')


@with_venv(import_libs=["stable-baselines3"], python_ver="3.9")
def run_sb():
    import gymnasium as gym

    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)
        
if __name__ == "__main__":
    run_xp()

The first time this is run, the virtual environments are created in the directory containing the script; subsequent calls reuse them. The initial environment only needs rlberry installed, not stable-baselines3 or mushroom-rl. For now, I use this to run code from these libraries without trying to interface further with rlberry. The virtual environments can contain different libraries and can use different Python executables (as in the example).
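As a side note, the with_venv signature discussed in the review below also exposes a 'requirements' parameter next to 'import_libs'. Assuming it accepts a pip-style requirements file (its exact semantics are not documented in this PR, so this is only a sketch), a pinned variant of the stable-baselines3 example would presumably look like:

# Hypothetical variant relying on the requirements parameter of with_venv;
# whether it expects a file path or a list of pins is an assumption.
@with_venv(requirements="requirements_sb.txt", python_ver="3.9")
def run_sb_pinned():
    from stable_baselines3 import A2C  # installed from the requirements file
    ...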

The output of the example above is then:

OpenGL.platform.ctypesloader > Loaded libGL.so => libGL.so.1 <CDLL 'libGL.so.1', handle 55f02bae78c0 at 0x7f6343005610>
OpenGL.acceleratesupport > No OpenGL_accelerate module loaded: No module named 'OpenGL_accelerate'
OpenGL.platform.ctypesloader > Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 55f02bafacb0 at 0x7f6340fa9f10>
numexpr.utils > Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils > NumExpr defaulting to 8 threads.
nox > Running session run_mushroom
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_mushroom.
nox > python -m pip install numpy
nox > python -m pip install mushroom_rl
nox > python /tmp/tmp2f2yhau5/run_mushroom.py
22/10/2023 14:23:01 [INFO] ###################################################################################################
22/10/2023 14:23:01 [INFO] Experiment Algorithm: QLearning
22/10/2023 14:23:01 [INFO] J start: 1.4276799047757556                                                    
22/10/2023 14:23:02 [INFO] J final: 3.044715108158618                                                     
nox > Session run_mushroom was successful.
nox > Running session run_sb
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_sb.
nox > python -m pip install stable-baselines3
nox > python /tmp/tmp2f2yhau5/run_sb.py
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 24.3     |
|    ep_rew_mean        | 24.3     |
| time/                 |          |
|    fps                | 372      |
|    iterations         | 100      |
|    time_elapsed       | 1        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -0.667   |
|    explained_variance | -0.114   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 1.5      |
|    value_loss         | 7.06     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 33.3     |
|    ep_rew_mean        | 33.3     |
| time/                 |          |
|    fps                | 465      |
|    iterations         | 200      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -0.631   |
|    explained_variance | 0.0161   |
|    learning_rate      | 0.0007   |
|    n_updates          | 199      |
|    policy_loss        | 1.41     |
|    value_loss         | 8.66     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 40.3     |
|    ep_rew_mean        | 40.3     |
| time/                 |          |
|    fps                | 507      |
|    iterations         | 300      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1500     |
| train/                |          |
|    entropy_loss       | -0.611   |
|    explained_variance | -0.0168  |
|    learning_rate      | 0.0007   |
|    n_updates          | 299      |
|    policy_loss        | -0.437   |
|    value_loss         | 101      |
------------------------------------
[100.]
nox > Session run_sb was successful.
nox > Ran multiple sessions:
nox > * run_mushroom: success
nox > * run_sb: success

@mmcenta: this is what I had in mind when I said that rlberry could handle virtual environments.

This is a proof of concept: it works, but it is still very preliminary.

@TimotheeMathieu (Collaborator, Author) commented Nov 16, 2023

Now included: a decorator for running things in a guix container (guix is a very powerful package manager).
This has the advantage of dumping a channel file (think of it as a commit for guix) that can be reused later to reconstruct the container, giving almost perfect reproducibility (in particular, guix takes care of keeping the same C libraries, such as CUDA and the other torch backends).

Example:

from rlberry.manager import with_guix, run_guix_xp

@with_guix(import_libs=["stable-baselines3"])
def run_sb():
    import gymnasium as gym

    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_guix_xp(keep_build_dir=True)

@TimotheeMathieu TimotheeMathieu changed the title [WIP] new feature: runner over multiple virtual environments [WIP] new feature: runner to launch over multiple virtual environments or multiple guix containers Nov 27, 2023
@TimotheeMathieu (Collaborator, Author) commented Jun 5, 2024

For now, let us take things one at a time: I removed everything guix-related, and this PR only covers venvs. I will do guix in a separate PR.

@TimotheeMathieu TimotheeMathieu changed the title [WIP] new feature: runner to launch over multiple virtual environments or multiple guix containers [WIP] new feature: runner to launch over multiple virtual environments Jun 6, 2024
github-actions bot commented Jun 6, 2024

Hello,
The build of the doc failed. Look up the reason here:
https://github.com/rlberry-py/rlberry/actions/workflows/preview.yml

github-actions bot commented Jun 7, 2024

Hello,
The build of the doc succeeded. The documentation preview is available here:
https://rlberry-py.github.io/rlberry/preview_pr

return filename


def with_venv(import_libs=None, requirements=None, python_ver=None, verbose=False):
Review comment from a collaborator:

Can you clarify in the doc how 'import_libs' and 'requirements' work (is only one meant to be used, or does one take priority over the other)?
In addition, maybe add an 'if' to enforce that only one of the two parameters is used.
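For reference, a minimal sketch of the suggested guard, assuming both parameters stay optional keyword arguments as in the signature quoted above:

def with_venv(import_libs=None, requirements=None, python_ver=None, verbose=False):
    # Suggested check: reject ambiguous calls where both dependency
    # specifications are passed at the same time.
    if import_libs is not None and requirements is not None:
        raise ValueError(
            "Pass either import_libs or requirements to with_venv, not both."
        )
    ...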

for RL experimentation with several separated environments.

"""

Review comment from a collaborator:

Maybe explain how 'run_venv_xp' is linked to 'run_sb' and 'run_mushroom':

  • explain that the scripts are generated when the decorator is applied, and that the functions 'run_sb' / 'run_mushroom' are not called directly;
  • explain 'run_venv_xp' further: it runs the scripts from a folder to which the content of 'run_sb' / 'run_mushroom' has been written (perhaps add the folder_name parameter in the example to make it more readable? see the sketch below).
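A possible illustration of that second point, assuming a hypothetical folder_name keyword on the runner (the actual parameter name in this PR may differ, and 'run_xp' / 'run_venv_xp' appear to refer to the same runner in this thread):

if __name__ == "__main__":
    # The decorated functions are never called directly here: with_venv has
    # already dumped their bodies as standalone scripts, and the runner asks
    # nox to execute each script inside its own virtual environment.
    run_xp(folder_name="rlberry_venvs")  # folder_name is hypothetical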
