
[Question] Plotting Continued Models on the Same Line #364

Open · Zachary-Fernandes opened this issue Mar 8, 2023 · 12 comments
Labels: question (Further information is requested)

Comments

@Zachary-Fernandes

❓ Question

Hello, I have a question regarding plotting in rl-baselines3-zoo. I work on a cluster that limits runs to at most six hours, so I thought it would be a good idea to use checkpoints to save my runs. After I ran out of time, I scheduled a new job and continued from rl_model_9000000_steps.zip for another million steps, and this ran as expected.

However, two things occurred. First, the continued run's contents went into a different directory from the original run (DemonAttack-v4_2 instead of DemonAttack-v4_1). Second, when I tried to plot it with plot_train.py, it treated these directories as different runs.

How can I combine these two runs into one? My hope is for the second run to extend the first, as intended when using checkpoints. Below is the plot made by plot_train.py.

[Image: learning curve produced by plot_train.py, showing the two runs plotted separately]

Also attached are the contents for both DemonAttack-v4 directories.

[Images: directory listings for DemonAttack-v4_1 and DemonAttack-v4_2]

Thank you in advance.

@araffin (Member) commented Mar 8, 2023

How can I combine these two runs into one? My hope is to make the second run extend the first run as intended with checkpoints.

For that you will need to combine the CSV files and then put them in a single folder.
Or manually offset the second run in plot_train.py.
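For reference, a minimal sketch of the CSV merge (not part of the zoo's tooling; the paths and output folder are hypothetical), assuming the standard SB3 Monitor format with one commented JSON header line followed by r/l/t columns:

```python
import os
import pandas as pd

# Hypothetical paths: the two runs of the same experiment, plus a new folder
# that plot_train.py will treat as a single run.
first = "logs/ppo/DemonAttack-v4_1/0.monitor.csv"
second = "logs/ppo/DemonAttack-v4_2/0.monitor.csv"
merged = "logs/ppo/DemonAttack-v4_merged/0.monitor.csv"
os.makedirs(os.path.dirname(merged), exist_ok=True)

# Keep the first run's "#{...}" header line so the result stays a valid Monitor CSV.
with open(first) as f:
    header = f.readline()

df1 = pd.read_csv(first, skiprows=1)
df2 = pd.read_csv(second, skiprows=1)
# Offset the wall-clock column so the second run continues after the first;
# the timestep axis is rebuilt from the cumulative episode lengths ("l"),
# so simply appending the rows is enough for timestep-based plots.
df2["t"] += df1["t"].iloc[-1]

with open(merged, "w") as f:
    f.write(header)
    pd.concat([df1, df2], ignore_index=True).to_csv(f, index=False)
```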

@Zachary-Fernandes (Author)

Thank you for the reply! Will these methods work with all_plots.py as well? I just wanted to make sure for plotting both training and evaluation.
Also, how would one manually offset the second run within the script?

@araffin (Member) commented Mar 9, 2023

Will these methods work with all_plots.py as well? I just wanted to make sure for plotting training and evaluation.

For evaluation, you will need to merge the evaluations.npz files.
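Something along these lines should work (an untested sketch with hypothetical paths; check data.files first, since a "successes" array may also be present depending on the env):

```python
import numpy as np

first = np.load("DemonAttack-v4_1/evaluations.npz")
second = np.load("DemonAttack-v4_2/evaluations.npz")

# Concatenate every array along the first axis. If the second run's timestep
# counter restarted at zero, shift it so the x-axis keeps increasing.
merged = {key: np.concatenate([first[key], second[key]], axis=0) for key in first.files}
offset = first["timesteps"][-1]
if second["timesteps"][0] <= offset:
    merged["timesteps"] = np.concatenate([first["timesteps"], second["timesteps"] + offset])

np.savez("DemonAttack-v4_merged/evaluations.npz", **merged)
```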

Also, how would one manually offset the second run within the script?

What did you try so far?

@Zachary-Fernandes (Author) commented Mar 9, 2023

I tried looking through the plotting code where the npz files are used. I had never used npz files before, but based on what I looked up, it appears evaluations.npz is a dictionary where each metric, such as "timesteps" and "results", is a key that maps to a list of values for that metric (e.g., the evaluation timesteps or the rewards earned during evaluation episodes). If this is the case, merging these npz files should be similar to merging the CSV files. What other properties do they have?

Speaking of which, I took your suggestion on combining the CSV files and made a Google Colab notebook to perform this task. I tested it by merging three 100000-timestep runs of CartPole-v1 using A2C and PPO. I then used a variant of plot_train.py to plot both algorithms' learning curves. Below is the end result. It works best with the default rolling window size. I will continue with this notebook when I implement the npz merging code.

[Image: merged CartPole-v1 learning curves for A2C and PPO]

@araffin (Member) commented Mar 9, 2023

made a Google Colab notebook to perform this task.

Could you share the link, as it might be useful for others?

it appears evaluations.npz is a dictionary where each metric, such as "timesteps" and "results", is a key that maps to a list of values for that metric

Yes, it's a dictionary of NumPy arrays.

@Zachary-Fernandes (Author) commented Mar 9, 2023

Yes. I will admit this is a work in progress, as I still need to implement the npz merging. This was also made for the project I am working on, which uses a fork of rl-baselines3-zoo.

Here is the Google Colab link.

For evaluations.npz, is there a way to get all the keys?

@araffin (Member) commented Mar 9, 2023

For evaluations.npz, is there a way to get all the keys?

.keys()?

@Zachary-Fernandes (Author)

Thank you. I tried this and was initially unsure why it would not print the keys, but I found a Stack Overflow post explaining that .keys() returns an iterable view. With a print statement, I can see the npz structure is similar to the CSV's. I should have the merging implemented later today.
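For reference, a quick way to inspect the file (the path is just an example):

```python
import numpy as np

data = np.load("evaluations.npz")
print(data.files)                # e.g. ['timesteps', 'results', 'ep_lengths']
print(list(data.keys()))         # .keys() is a view, so wrap it in list() to print it
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```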

@Zachary-Fernandes (Author)

Following up on this thread, I have a question about saving checkpoints in rl-baselines3-zoo. Say I want to train an agent on Asterix-v4 for 40 million timesteps. Within the six-hour limit, training can get through 10 million timesteps. When it concludes, three model zip files stand out to me: Asterix-v4.zip, best_model.zip, and rl_model_10000000_steps.zip.

What are the differences between these model files? If I wanted to resume training for another 10 million timesteps (with the aim of eventually reaching 40 million), which of these would be the best to use?

I wanted to ask because when I resumed training with Asterix-v4.zip, it seemed like all of the models' rewards dropped after the first stop point (10 million timesteps), almost as if training had resumed from an earlier point. Attached is an image demonstrating this, alongside another showing a similar drop in Breakout-v4. With this in mind, I was curious whether this worse performance could be attributed to which file I used to resume.

[Images: reward curves showing the drop after resuming, for Asterix-v4 and Breakout-v4]

@araffin (Member) commented Mar 11, 2023

What are the differences between these model files?

One is a checkpoint (rl_model_10000000_steps.zip), another is the model saved at the end of training (Asterix-v4.zip), and the last one is the best model according to the evaluation callback (best_model.zip).

which of these would be the best to use?

Usually the one saved at the end of training, but that's not always true (for instance, if there is a performance drop just before the end of training).

it seemed like all of the models' rewards dropped after the first stop point

That's probably because you are using schedules, and schedules are reset when resuming training.
See DLR-RM/stable-baselines3#435 and related issues.
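For illustration, one common workaround (a sketch, not the zoo's own resume logic; the env id and hyperparameter values below are assumptions) is to override the saved schedules with constants close to the values they had reached, and to keep the timestep counter when calling learn():

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Recreate the training env (simplified; the zoo builds this from its hyperparameter file).
env = VecFrameStack(make_atari_env("AsterixNoFrameskip-v4", n_envs=8), n_stack=4)

# Hypothetical numbers: with linear schedules and 10M of 40M steps done,
# the schedules have decayed to roughly 75% of their initial values.
model = PPO.load(
    "rl_model_10000000_steps.zip",
    env=env,
    custom_objects={
        "learning_rate": 2.5e-4 * 0.75,
        "clip_range": 0.1 * 0.75,
    },
)
# reset_num_timesteps=False keeps the internal step counter (and the logger x-axis)
# where the previous run left off.
model.learn(total_timesteps=10_000_000, reset_num_timesteps=False)
```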

@Zachary-Fernandes (Author) commented Mar 11, 2023

I see. When you refer to schedules being reset when resuming training, is this related to whether reset_num_timesteps is set to True or False in model.learn()? I thought of this when looking at DLR-RM/stable-baselines3#597, and I am wondering if setting reset_num_timesteps=False would be the only change that needs to be made. I imagine this would need to happen inside ExperimentManager's learn function.

I see the seed is also set to a constant. Does that factor into this as well? For context, the algorithms I am running are based on PPO.

@Zachary-Fernandes (Author) commented Mar 16, 2023

Thank you for your advice concerning schedules. I modified some of the experiment manager code when running a trained agent, and it seemed to do the trick. Below is what the graphs currently look like when plotting across different checkpoints.

[Images: updated learning curves plotted across checkpoints]

I have two more questions: when the PPO algorithm runs, does it have an offline component? How is it affected by checkpoints? I wanted to ask because when I ran MsPacman-v4, one of the PPO-based attention algorithms (DSH_SH) experienced a significant drop in score at around 7.5 million timesteps, and I was curious if this occurred because of the seed.

[Image: MsPacman-v4 score curves showing the drop for DSH_SH around 7.5 million timesteps]
