
[Question] Plotting Continued Models on the Same Line #364

Open · Zachary-Fernandes opened this issue Mar 8, 2023 · 12 comments
Labels: question (Further information is requested)

Comments

@Zachary-Fernandes

❓ Question

Hello, I have a question regarding plotting in rl-baselines3-zoo. I work on a cluster that limits runs to at most six hours, so I thought it would be a good idea to use checkpoints to save my runs. After I ran out of time, I scheduled a new job and continued from rl_model_9000000_steps.zip for another million steps, and this ran as expected.

However, two things occurred. First, the continued run's contents went into a different directory from the original run (DemonAttack-v4_2 instead of DemonAttack-v4_1). Second, when I tried to plot it with plot_train.py, it treated these directories as different runs.

How can I combine these two runs into one? My hope is for the second run to extend the first, as intended when using checkpoints. Below is the plot made by plot_train.py.

[Image: learning curve produced by plot_train.py, showing the two runs plotted separately]

Also attached are the contents for both DemonAttack-v4 directories.

[Images: directory listings for DemonAttack-v4_1 and DemonAttack-v4_2]

Thank you in advance.

@araffin (Member) commented Mar 8, 2023

How can I combine these two runs into one? My hope is to make the second run extend the first run as intended with checkpoints.

For that you will need to combine the CSV files and then put them in a single folder.
Or manually offset the second run in plot_train.py.
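For reference, a minimal sketch of the CSV merge (not part of the zoo's tooling; the paths and output folder are hypothetical), assuming the standard SB3 Monitor format with one commented JSON header line followed by r/l/t columns:

```python
import os
import pandas as pd

# Hypothetical paths: the two runs of the same experiment, plus a new folder
# that plot_train.py will treat as a single run.
first = "logs/ppo/DemonAttack-v4_1/0.monitor.csv"
second = "logs/ppo/DemonAttack-v4_2/0.monitor.csv"
merged = "logs/ppo/DemonAttack-v4_merged/0.monitor.csv"
os.makedirs(os.path.dirname(merged), exist_ok=True)

# Keep the first run's "#{...}" header line so the result stays a valid Monitor CSV.
with open(first) as f:
    header = f.readline()

df1 = pd.read_csv(first, skiprows=1)
df2 = pd.read_csv(second, skiprows=1)
# Offset the wall-clock column so the second run continues after the first;
# the timestep axis is rebuilt from the cumulative episode lengths ("l"),
# so simply appending the rows is enough for timestep-based plots.
df2["t"] += df1["t"].iloc[-1]

with open(merged, "w") as f:
    f.write(header)
    pd.concat([df1, df2], ignore_index=True).to_csv(f, index=False)
```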

@Zachary-Fernandes (Author)

Thank you for the reply! Will these methods work with all_plots.py as well? I just wanted to make sure for plotting both training and evaluation.
Also, how would one manually offset the second run within the script?

@araffin (Member) commented Mar 9, 2023

Will these methods work with all_plots.py as well? I just wanted to make sure for plotting training and evaluation.

For evaluation, you will need to merge the evaluations.npz files.
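Something along these lines should work (an untested sketch with hypothetical paths; check data.files first, since a "successes" array may also be present depending on the env):

```python
import numpy as np

first = np.load("DemonAttack-v4_1/evaluations.npz")
second = np.load("DemonAttack-v4_2/evaluations.npz")

# Concatenate every array along the first axis. If the second run's timestep
# counter restarted at zero, shift it so the x-axis keeps increasing.
merged = {key: np.concatenate([first[key], second[key]], axis=0) for key in first.files}
offset = first["timesteps"][-1]
if second["timesteps"][0] <= offset:
    merged["timesteps"] = np.concatenate([first["timesteps"], second["timesteps"] + offset])

np.savez("DemonAttack-v4_merged/evaluations.npz", **merged)
```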

Also, how would one manually offset the second run within the script?

What did you try so far?

@Zachary-Fernandes (Author) commented Mar 9, 2023

I tried looking through the plotting code where the npz files are used. I had never used npz files before, but based on what I looked up, it appears evaluations.npz is a dictionary where each metric, such as "timesteps" and "results", is a key that maps to a list of values for that metric (e.g., the evaluation timesteps or the rewards earned during evaluation episodes). If this is the case, merging these npz files should be similar to merging the CSV files. What other properties do they have?

Speaking of which, I took your suggestion on combining the CSV files and made a Google Colab notebook to perform this task. I tested it by merging three 100000-timestep runs of CartPole-v1 using A2C and PPO. I then used a variant of plot_train.py to plot both algorithms' learning curves. Below is the end result. It works best with the default rolling window size. I will continue with this notebook when I implement the npz merging code.

[Image: merged CartPole-v1 learning curves for A2C and PPO]

@araffin (Member) commented Mar 9, 2023

made a Google Colab notebook to perform this task.

Could you share the link, as it might be useful for others?

it appears evaluations.npz is a dictionary where each metric, such as "timesteps" and "results", is a key that maps to a list of values for that metric

Yes, it's a dictionary of NumPy arrays.

@Zachary-Fernandes (Author) commented Mar 9, 2023

Yes. I will admit this is a work in progress, as I still need to implement the npz merging. This was also made for the project I am working on, which uses a fork of rl-baselines3-zoo.

Here is the Google Colab link.

For evaluations.npz, is there a way to get all the keys?

@araffin (Member) commented Mar 9, 2023

For evaluations.npz, is there a way to get all the keys?

.keys()?

@Zachary-Fernandes (Author)

Thank you. I tried this and was initially unsure why it would not print the keys, but I found a Stack Overflow post explaining that .keys() returns an iterable view. With a print statement, I can see the npz structure is similar to the CSV's. I should have the merging implemented later today.
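For reference, a quick way to inspect the file (the path is just an example):

```python
import numpy as np

data = np.load("evaluations.npz")
print(data.files)                # e.g. ['timesteps', 'results', 'ep_lengths']
print(list(data.keys()))         # .keys() is a view, so wrap it in list() to print it
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```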

@Zachary-Fernandes (Author)

Following up on this thread, I have a question about saving checkpoints in rl-baselines3-zoo. Say I want to train an agent on Asterix-v4 for 40 million timesteps. Within the six-hour limit, training can get through 10 million timesteps. When it concludes, three model zip files stand out to me: Asterix-v4.zip, best_model.zip, and rl_model_10000000_steps.zip.

What are the differences between these model files? If I wanted to resume training for another 10 million timesteps (with the aim of eventually reaching 40 million), which of these would be the best to use?

I wanted to ask because when I resumed training with Asterix-v4.zip, it seemed like all of the models' rewards dropped after the first stop point (10 million timesteps), almost as if training had resumed from an earlier point. Attached is an image demonstrating this, alongside another showing a similar drop in Breakout-v4. With this in mind, I was curious whether this worse performance could be attributed to which file I used to resume.

[Images: reward curves showing the drop after resuming, for Asterix-v4 and Breakout-v4]

@araffin (Member) commented Mar 11, 2023

What are the differences between these model files?

One is a checkpoint (rl_model_10000000_steps.zip), another is the model saved at the end of training (Asterix-v4.zip), and the last one is the best model according to the evaluation callback (best_model.zip).

which of these would be the best to use?

Usually the one saved at the end of training, but that's not always true (for instance, if there is a performance drop just before the end of training).

it seemed like all of the models' rewards dropped after the first stop point

That's probably because you are using schedules, and schedules are reset when resuming training.
See DLR-RM/stable-baselines3#435 and related issues.
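For illustration, one common workaround (a sketch, not the zoo's own resume logic; the env id and hyperparameter values below are assumptions) is to override the saved schedules with constants close to the values they had reached, and to keep the timestep counter when calling learn():

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Recreate the training env (simplified; the zoo builds this from its hyperparameter file).
env = VecFrameStack(make_atari_env("AsterixNoFrameskip-v4", n_envs=8), n_stack=4)

# Hypothetical numbers: with linear schedules and 10M of 40M steps done,
# the schedules have decayed to roughly 75% of their initial values.
model = PPO.load(
    "rl_model_10000000_steps.zip",
    env=env,
    custom_objects={
        "learning_rate": 2.5e-4 * 0.75,
        "clip_range": 0.1 * 0.75,
    },
)
# reset_num_timesteps=False keeps the internal step counter (and the logger x-axis)
# where the previous run left off.
model.learn(total_timesteps=10_000_000, reset_num_timesteps=False)
```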

@Zachary-Fernandes (Author) commented Mar 11, 2023

I see. When you refer to schedules being reset when resuming training, is this related to whether reset_num_timesteps is set to True or False in model.learn()? I thought of this when looking at DLR-RM/stable-baselines3#597, and I am wondering if setting reset_num_timesteps=False would be the only change that needs to be made. I imagine this would need to happen inside ExperimentManager's learn function.

I see the seed is also set to a constant. Does that factor into this as well? For context, the algorithms I am running are based on PPO.

@Zachary-Fernandes (Author) commented Mar 16, 2023

Thank you for your advice concerning schedules. I modified some of the experiment manager code when running a trained agent, and it seemed to do the trick. Below is what the graphs currently look like when plotting across different checkpoints.

[Images: updated learning curves plotted across checkpoints]

I have two more questions: when the PPO algorithm runs, does it have an offline component? How is it affected by checkpoints? I wanted to ask because when I ran MsPacman-v4, one of the PPO-based attention algorithms (DSH_SH) experienced a significant drop in score at around 7.5 million timesteps, and I was curious if this occurred because of the seed.

[Image: MsPacman-v4 score curves showing the drop for DSH_SH around 7.5 million timesteps]
