[WIP] new feature: runner to launch over multiple virtual environments #377

Open
wants to merge 18 commits into base: main

Conversation

@TimotheeMathieu (Collaborator) commented Oct 22, 2023

In this PR I add code meant to simplify running experiments over multiple virtual environments. The main tool for this is nox, which I wrap using decorators: you just need to wrap a function with the @with_venv decorator to execute it in a separate virtual environment. With some work, this could become an alternative to rlberry.experiments. It is also nice for deployment purposes, because all the libraries are installed automatically: you only need Python and rlberry installed to run the script, and the other libraries are installed as needed.

Example code:

from rlberry.manager import with_venv, run_xp

@with_venv(import_libs=["numpy", "mushroom_rl"])
def run_mushroom():
    """
    Simple script to solve a simple chain with Q-Learning.

    """
    import numpy as np

    from mushroom_rl.algorithms.value import QLearning
    from mushroom_rl.core import Core, Logger
    from mushroom_rl.environments import generate_simple_chain
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter

    from mushroom_rl.utils.dataset import compute_J

    np.random.seed()

    logger = Logger(QLearning.__name__, results_dir=None)
    logger.strong_line()
    logger.info('Experiment Algorithm: ' + QLearning.__name__)

    # MDP
    mdp = generate_simple_chain(state_n=5, goal_states=[2], prob=.8, rew=1,
                                gamma=.9)

    # Policy
    epsilon = Parameter(value=.15)
    pi = EpsGreedy(epsilon=epsilon)

    # Agent
    learning_rate = Parameter(value=.2)
    algorithm_params = dict(learning_rate=learning_rate)
    agent = QLearning(mdp.info, pi, **algorithm_params)

    # Core
    core = Core(agent, mdp)

    # Initial policy Evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J start: {J}')

    # Train
    core.learn(n_steps=10000, n_steps_per_fit=1)

    # Final Policy Evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J final: {J}')


@with_venv(import_libs=["stable-baselines3"], python_ver="3.9")
def run_sb():
    import gymnasium as gym

    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)
        
if __name__ == "__main__":
    run_xp()

The first time this is run, the virtual environments are created in the directory containing the script; subsequent calls reuse them. The initial environment only needs rlberry installed, not stable-baselines3 or mushroom-rl. For now, I use this to run code from these libraries without trying to interface further with rlberry. The virtual environments can contain different libraries and can use different Python executables (as in the example).
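As a side note, the with_venv signature discussed in the review below also exposes a 'requirements' parameter next to 'import_libs'. Assuming it accepts a pip-style requirements file (its exact semantics are not documented in this PR, so this is only a sketch), a pinned variant of the stable-baselines3 example would presumably look like:

# Hypothetical variant relying on the requirements parameter of with_venv;
# whether it expects a file path or a list of pins is an assumption.
@with_venv(requirements="requirements_sb.txt", python_ver="3.9")
def run_sb_pinned():
    from stable_baselines3 import A2C  # installed from the requirements file
    ...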

The output of the example above is then:

OpenGL.platform.ctypesloader > Loaded libGL.so => libGL.so.1 <CDLL 'libGL.so.1', handle 55f02bae78c0 at 0x7f6343005610>
OpenGL.acceleratesupport > No OpenGL_accelerate module loaded: No module named 'OpenGL_accelerate'
OpenGL.platform.ctypesloader > Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 55f02bafacb0 at 0x7f6340fa9f10>
numexpr.utils > Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils > NumExpr defaulting to 8 threads.
nox > Running session run_mushroom
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_mushroom.
nox > python -m pip install numpy
nox > python -m pip install mushroom_rl
nox > python /tmp/tmp2f2yhau5/run_mushroom.py
22/10/2023 14:23:01 [INFO] ###################################################################################################
22/10/2023 14:23:01 [INFO] Experiment Algorithm: QLearning
22/10/2023 14:23:01 [INFO] J start: 1.4276799047757556                                                    
22/10/2023 14:23:02 [INFO] J final: 3.044715108158618                                                     
nox > Session run_mushroom was successful.
nox > Running session run_sb
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_sb.
nox > python -m pip install stable-baselines3
nox > python /tmp/tmp2f2yhau5/run_sb.py
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 24.3     |
|    ep_rew_mean        | 24.3     |
| time/                 |          |
|    fps                | 372      |
|    iterations         | 100      |
|    time_elapsed       | 1        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -0.667   |
|    explained_variance | -0.114   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 1.5      |
|    value_loss         | 7.06     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 33.3     |
|    ep_rew_mean        | 33.3     |
| time/                 |          |
|    fps                | 465      |
|    iterations         | 200      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -0.631   |
|    explained_variance | 0.0161   |
|    learning_rate      | 0.0007   |
|    n_updates          | 199      |
|    policy_loss        | 1.41     |
|    value_loss         | 8.66     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 40.3     |
|    ep_rew_mean        | 40.3     |
| time/                 |          |
|    fps                | 507      |
|    iterations         | 300      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1500     |
| train/                |          |
|    entropy_loss       | -0.611   |
|    explained_variance | -0.0168  |
|    learning_rate      | 0.0007   |
|    n_updates          | 299      |
|    policy_loss        | -0.437   |
|    value_loss         | 101      |
------------------------------------
[100.]
nox > Session run_sb was successful.
nox > Ran multiple sessions:
nox > * run_mushroom: success
nox > * run_sb: success

@mmcenta: this is what I had in mind when I said that rlberry could handle virtual environments.

This is a proof of concept: it works, but it is still very preliminary.

@TimotheeMathieu (Collaborator, Author) commented Nov 16, 2023

Now included: a decorator for running things in a guix container (guix is a very powerful package manager).
This has the advantage of dumping a channel file (think of it as a commit for guix) that can be reused later to reconstruct the container, giving almost perfect reproducibility (in particular, guix takes care of keeping the same C libraries, such as CUDA and the other torch backends).

Example:

from rlberry.manager import with_guix, run_guix_xp

@with_guix(import_libs=["stable-baselines3"])
def run_sb():
    import gymnasium as gym

    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_guix_xp(keep_build_dir=True)

@TimotheeMathieu TimotheeMathieu changed the title [WIP] new feature: runner over multiple virtual environments [WIP] new feature: runner to launch over multiple virtual environments or multiple guix containers Nov 27, 2023
@TimotheeMathieu (Collaborator, Author) commented Jun 5, 2024

For now, let us take things one at a time: I removed everything guix-related, and this PR only covers venvs. I will do guix in a separate PR.

@TimotheeMathieu TimotheeMathieu changed the title [WIP] new feature: runner to launch over multiple virtual environments or multiple guix containers [WIP] new feature: runner to launch over multiple virtual environments Jun 6, 2024
github-actions bot commented Jun 6, 2024

Hello,
The build of the doc failed. Look up the reason here:
https://github.com/rlberry-py/rlberry/actions/workflows/preview.yml

github-actions bot commented Jun 7, 2024

Hello,
The build of the doc succeeded. The documentation preview is available here:
https://rlberry-py.github.io/rlberry/preview_pr

return filename


def with_venv(import_libs=None, requirements=None, python_ver=None, verbose=False):
Review comment from a collaborator:

Can you clarify in the doc how 'import_libs' and 'requirements' work (is only one meant to be used, or does one take priority over the other)?
In addition, maybe add an 'if' to enforce that only one of the two parameters is used.
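For reference, a minimal sketch of the suggested guard, assuming both parameters stay optional keyword arguments as in the signature quoted above:

def with_venv(import_libs=None, requirements=None, python_ver=None, verbose=False):
    # Suggested check: reject ambiguous calls where both dependency
    # specifications are passed at the same time.
    if import_libs is not None and requirements is not None:
        raise ValueError(
            "Pass either import_libs or requirements to with_venv, not both."
        )
    ...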

for RL experimentation with several separated environments.

"""

Review comment from a collaborator:

Maybe explain how 'run_venv_xp' is linked to 'run_sb' and 'run_mushroom':

  • explain that the scripts are generated when the decorator is applied, and that the functions 'run_sb' / 'run_mushroom' are not called directly;
  • explain 'run_venv_xp' further: it runs the scripts from a folder to which the content of 'run_sb' / 'run_mushroom' has been written (perhaps add the folder_name parameter in the example to make it more readable? see the sketch below).
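A possible illustration of that second point, assuming a hypothetical folder_name keyword on the runner (the actual parameter name in this PR may differ, and 'run_xp' / 'run_venv_xp' appear to refer to the same runner in this thread):

if __name__ == "__main__":
    # The decorated functions are never called directly here: with_venv has
    # already dumped their bodies as standalone scripts, and the runner asks
    # nox to execute each script inside its own virtual environment.
    run_xp(folder_name="rlberry_venvs")  # folder_name is hypothetical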
