TRPO "underflow encountered in multiply" #59

Open
jarlva opened this issue Dec 20, 2019 · 2 comments
Labels
custom gym env (Issue related to Custom Gym Env)

Comments

@jarlva

jarlva commented Dec 20, 2019

While training with TRPO, the run fails after a random interval (anywhere from 15 sec to 1 min) with the following traceback:
Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

Using the latest stable-baselines version, 2.9.0, with Python 3.7.5.

@araffin added the "more information needed" label (Please fill the issue template) Dec 20, 2019
@araffin
Owner

araffin commented Dec 20, 2019

Hello,
Please fill the issue template completely.

@jarlva
Author

jarlva commented Dec 21, 2019

I am training a custom Gym env with TRPO. After a random interval (anywhere from 30 sec to 3 min) the run fails with the traceback below.
The error occurs only with TRPO. The same code and environment complete successfully with other RL algorithms.
I tried the code below on CartPole-v1, but it does not trigger the error there (maybe because it is an easy environment).

Traceback (most recent call last):
  File "callback.py", line 196, in <module>
    model.learn(total_timesteps=time_steps, callback=callback, tb_log_name=tb_sub_dir)
  File "/root/stable-baselines/stable_baselines/trpo_mpi/trpo_mpi.py", line 427, in learn
    self.vfadam.update(grad, self.vf_stepsize)
  File "/root/stable-baselines/stable_baselines/common/mpi_adam.py", line 61, in update
    step = (- step_size) * self.exp_avg / (np.sqrt(self.exp_avg_sq) + self.epsilon)
FloatingPointError: underflow encountered in multiply

In the beginning the code seems to run fine. At some point, though, it goes into a "silent loop" with no further output on the console, as if it were frozen. The only way I found to make it surface the error is to add the following at the top of stable-baselines\stable_baselines\trpo_mpi\utils.py,
right after the line "import numpy as np": np.seterr(all='raise')
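For readers who want to see the mechanism in isolation, here is a minimal sketch that evaluates the expression from the traceback under two NumPy error settings. The input values are hypothetical, chosen only so that the product falls below the smallest normal float32; the float32 dtype is an assumption about how the optimizer stores its moment estimates, and `adam_step` and `raises_floating_point_error` are illustrative helpers, not stable-baselines functions. Since NumPy ignores underflow by default, the `FloatingPointError` may be a side effect of `all='raise'` rather than the cause of the apparent freeze.

```python
import numpy as np

def adam_step(exp_avg, exp_avg_sq, step_size=3e-4, epsilon=1e-8):
    # Same expression as the line quoted in the traceback (mpi_adam.py).
    return (-step_size) * exp_avg / (np.sqrt(exp_avg_sq) + epsilon)

# Hypothetical moment estimates (assumed float32), chosen so that
# step_size * exp_avg falls below the smallest normal float32 (~1.2e-38).
exp_avg = np.full(4, 1e-36, dtype=np.float32)
exp_avg_sq = np.full(4, 1e-30, dtype=np.float32)

def raises_floating_point_error(**errstate_kwargs):
    # Evaluate the step under the given NumPy error settings and report
    # whether a FloatingPointError was raised.
    try:
        with np.errstate(**errstate_kwargs):
            adam_step(exp_avg, exp_avg_sq)
    except FloatingPointError:
        return True
    return False

raised_when_strict = raises_floating_point_error(all="raise")
raised_when_ignored = raises_floating_point_error(under="ignore")

# With underflow ignored, the step is still computed.
with np.errstate(under="ignore"):
    step = adam_step(exp_avg, exp_avg_sq)

print(raised_when_strict, raised_when_ignored, step)
```

Using `np.errstate` as a context manager keeps the stricter setting local to the block instead of changing it for the whole process, which is what `np.seterr` does.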

Code example
from stable_baselines import TRPO  # DQN, PPO2, A2C, ACKTR
import tensorflow.compat.v1.logging as tflogging
tflogging.set_verbosity(tflogging.ERROR)  # suppress TF warnings

import gym
import numpy as np
np.seterr(all='raise')  # raise FloatingPointError on any floating-point error

env = gym.make('Myrl-v0')
model = TRPO('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=900000)
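As a possible variation on the `np.seterr(all='raise')` line in the script above, the sketch below raises on division by zero, overflow, and invalid values while leaving underflow at NumPy's default of `ignore`. This is only a diagnostic suggestion, not something confirmed in this thread, and the arrays are hypothetical test values. It uses NumPy alone, so it runs without the custom environment.

```python
import numpy as np

# Raise on the categories that usually point to a real numerical problem,
# but keep NumPy's default behaviour of ignoring underflow.
old_settings = np.seterr(divide="raise", over="raise",
                         invalid="raise", under="ignore")

# Hypothetical values: the product underflows to zero without raising.
tiny = np.full(3, 1e-30, dtype=np.float32)
flushed = tiny * tiny

# 0 / 0 is an invalid operation and still raises.
caught = None
zeros = np.zeros(3, dtype=np.float32)
try:
    zeros / zeros
except FloatingPointError as err:
    caught = str(err)

# Restore the previous settings so the rest of the program is unaffected.
np.seterr(**old_settings)

print(flushed, caught)
```

If training still stalls with this setting and no exception is raised, the underflow is probably not what makes the run appear frozen.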

System Info
stable-baselines 2.9.0 (latest version), Python 3.7.5
Windows 10
TF 1.15
no GPU
installed via git and then "pip install -e ."

@araffin added the "custom gym env" label (Issue related to Custom Gym Env) and removed the "more information needed" label (Please fill the issue template) Dec 21, 2019