
How can I start a distributed parallel environment in the process of training? #72

Open
Daviddeer2 opened this issue Feb 22, 2023 · 1 comment


@Daviddeer2

Hi there, the readme says that distributed parallel sampling can be implemented, but this feature doesn't seem to appear in the examples, for instance td3_script.py.
In issue #24 you said: "You can start the Server Manager once and then call env.make() multiple times, with the algorithm that we are using right now we have multiple workers running in parallel and each worker is calling env.make() and the Server Manager spawns a new instance of the env." Do you mean that only certain algorithms with multiple workers, like D4PG, can sample in parallel? This is confusing, because in OpenAI Gym parallel environments can be achieved with a VecEnv based on Python multiprocessing, so any RL algorithm can use it.
What do I have to do to start parallel environments with robo-gym? Are there any examples or documents to reference?
I would really appreciate it if someone could help me out. Thanks in advance.

@jr-b-reiterer

Hey @Daviddeer2, sorry for the long delay.

D4PG is one option that we have been using internally.

With stable-baselines3 it is also possible to simply wrap multiple robo-gym envs in a SubprocVecEnv, as in the following snippet - please take it just as a quick example, not a recommendation.

Notes:

  • You could of course use different IPs for the environments, corresponding to server managers on different machines. With a single IP you get multiple robot servers running in parallel, all created by the same server manager.
  • Because parallel steps are synchronized, it is not impossible (though perhaps insignificant) that bottlenecks degrade the results by reducing the step rate while the simulations keep running at their own pace. Handling the individual environments in separate workers can be more precise and efficient. It also lets you react to termination or truncation individually, or collect step results for asynchronous bulk processing; a rough sketch of that pattern follows after the snippet.
from multiprocessing import freeze_support

import gym
import robo_gym  # registers the robo-gym environments with gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == '__main__':
    freeze_support()

    num_envs = 2
    env_ids = ['NoObstacleNavigationMir100Sim-v0'] * num_envs
    # IP of the machine running the robo-gym server manager; it spawns one
    # robot server (simulation) per env.make() call.
    target_machine_ip = "127.0.0.1"
    # Bind env_id as a default argument so each factory keeps its own value
    # instead of all closures sharing the last loop value.
    envs = SubprocVecEnv(
        [lambda env_id=env_id: gym.make(env_id, ip=target_machine_ip, gui=True)
         for env_id in env_ids])

    model = PPO('MlpPolicy', envs, verbose=1)
    model.learn(total_timesteps=1000)
    model.save("PPO_mir_from_parallel")

    envs.close()
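
Regarding the last note, here is a minimal sketch of the separate-workers pattern (not from the robo-gym docs; the worker function name, episode count, IP list and random-action policy are placeholders to adapt to your setup). Each process creates its own robo-gym env, optionally against a server manager on a different machine, steps it at its own pace and handles termination individually.

from multiprocessing import Process, freeze_support

import gym
import robo_gym  # registers the robo-gym environments with gym


def collect(env_id, ip, episodes=2):
    # Each worker talks to the server manager at `ip`, which spawns a dedicated
    # robot server for this env.make() call.
    env = gym.make(env_id, ip=ip)
    for _ in range(episodes):
        env.reset()
        done = False
        while not done:
            # Placeholder policy: replace the random action with your agent and
            # hand the transition to whatever asynchronous pipeline you use.
            obs, reward, done, info = env.step(env.action_space.sample())
    env.close()


if __name__ == '__main__':
    freeze_support()

    env_id = 'NoObstacleNavigationMir100Sim-v0'
    # One entry per worker; use different IPs to reach server managers
    # on different machines.
    ips = ["127.0.0.1", "127.0.0.1"]

    workers = [Process(target=collect, args=(env_id, ip)) for ip in ips]
    for w in workers:
        w.start()
    for w in workers:
        w.join()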
