Problem retraining PPO1 model and using Tensorflow with Stable Baselines 2 #1154

Open
durantagre opened this issue Mar 12, 2022 · 1 comment
Labels
question Further information is requested

Comments

Dear altruists, I am new to stable-baselines and RL. I am trying to retrain my previously trained PPO1 model so that it resumes learning from where the previous training left off. What I am trying to do is:

  1. Loading my previously trained model from my computer and then re-training it from the point where its last training ended. For that, I am loading my previously saved model inside `policy_fn()` and passing `policy_fn` as a parameter to the `pposgd_simple.learn()` method. It raises the error `ValueError: At least two variables have the same name: pi/obfilter/count`.

Also, I am unsure whether the training resumes from the previous ending point or starts over from the very beginning (when it trains correctly in a different setting). Can anyone please help me find a way to verify this? One option may be printing the model parameters, but I am unsure about it.

  2. I am also trying to use TensorBoard to monitor my training. But when I run the training, the program says `TypeError: learn() got an unexpected keyword argument 'tensorboard_log'`. My stable-baselines version is 2.10.2. I am attaching my entire training code below. I would appreciate any suggestions. Thanks in advance.
```python
# Imports inferred from usage (the original post omitted them);
# Env is the poster's custom environment.
import os
from mpi4py import MPI
import tensorflow as tf
from baselines import logger
from baselines.common import set_global_seeds, tf_util as U
from baselines.bench import Monitor


def make_env(seed=None):
    reward_scale = 1.0

    rank = MPI.COMM_WORLD.Get_rank()
    myseed = seed + 1000 * rank if seed is not None else None
    set_global_seeds(myseed)
    env = Env()

    env = Monitor(env, logger_path, allow_early_resets=True)

    env.seed(seed)
    if reward_scale != 1.0:
        from baselines.common.retro_wrappers import RewardScaler
        env = RewardScaler(env, reward_scale)
    return env


def train(num_timesteps, path=None):
    from baselines.ppo1 import mlp_policy, pposgd_simple

    sess = U.make_session(num_cpu=1)
    sess.__enter__()

    def policy_fn(name, ob_space, ac_space):
        policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                      hid_size=64, num_hid_layers=3)
        saver = tf.train.Saver()
        if path is not None:
            print("Tried to restore from ", path)
            U.initialize()
            saver.restore(tf.get_default_session(), path)
            saver2 = tf.train.import_meta_graph('/srcs/src/models/model1.meta')
            model = saver.restore(sess, tf.train.latest_checkpoint('/srcs/src/models/'))
            # return policy
            return saver2

    env = make_env()

    pi = pposgd_simple.learn(env, policy_fn,
                             max_timesteps=num_timesteps,
                             timesteps_per_actorbatch=1024,
                             clip_param=0.2, entcoeff=0.0,
                             optim_epochs=10,
                             optim_stepsize=5e-5,
                             optim_batchsize=64,
                             gamma=0.99,
                             lam=0.95,
                             schedule='linear',
                             tensorboard_log=logger_path,
                             # tensorboard_log="./ppo1_tensorboard/",
                             )
    env.env.plotSave()
    saver = tf.train.Saver(tf.all_variables())
    saver.save(sess, '/models/model1')
    return pi


def main():
    logger.configure()
    path_ = "/models/model1"
    train(num_timesteps=409600, path=path_)


if __name__ == '__main__':
    rank = MPI.COMM_WORLD.Get_rank()
    logger_path = None if logger.get_dir() is None else os.path.join(logger.get_dir(), str(rank))
    main()
```
@Miffyli Miffyli added the question Further information is requested label Mar 13, 2022
Miffyli (Collaborator) commented Mar 13, 2022

Seems like you are confusing OpenAI baselines with stable-baselines. In stable-baselines you can save and restore models with simple `agent.save` and `PPO.load` calls. Stable-baselines does not support loading OpenAI baselines agents with a single call.

Also, we recommend using stable-baselines3 as it is more actively supported.
