
[Question] exp_manager reward and GAE discount factors #442

Open
edmund735 opened this issue Apr 4, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

@edmund735

❓ Question

Hi,

I noticed that the algorithm discount factor and the reward discount factor are set to be the same on lines 365-367 of rl_zoo3/exp_manager.py:

```python
# Use the same discount factor as for the algorithm
if "gamma" in hyperparams:
    self.normalize_kwargs["gamma"] = hyperparams["gamma"]
```

For PPO, does this mean the discount factor used for GAE is the same as the reward discount factor? I'm currently training an episodic environment with PPO (episodes always reach a termination state within 100 time steps, so there is never truncation) and I'd like a reward discount factor of 1. In that case, if I want to do hyperparameter tuning on the GAE discount factor, should I remove this line so that the VecNormalize object is created with a discount factor of 1 (different from the GAE discount factor)?

Also, why are the discount factors matched only when normalize=True? Isn't it possible to still have a reward discount factor without normalization? I read #64 ("gamma is the only one we override automatically for correctness (and only if present in the hyperparameters)") and I don't think I understand what "correctness" means in this case. Any further explanation would be very helpful.
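For concreteness, here is roughly what I have in mind, written directly in SB3 outside the zoo (a minimal sketch with a placeholder environment, not the exp_manager code):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Placeholder environment just for illustration.
env = make_vec_env("CartPole-v1", n_envs=4)

# gamma here only controls the running discounted return used to
# normalize rewards, not the return/GAE computation inside PPO.
env = VecNormalize(env, norm_obs=True, norm_reward=True, gamma=1.0)

# gamma and gae_lambda here are the ones used for GAE and the value targets.
model = PPO("MlpPolicy", env, gamma=0.99, gae_lambda=0.95)
model.learn(total_timesteps=10_000)
```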

Thanks!

edmund735 added the question label on Apr 4, 2024
@araffin
Member

araffin commented Apr 8, 2024

> For PPO, does this mean the discount factor used for GAE is the same as the reward discount factor?

I guess there is confusion between GAE and what that piece of code does.
This code snippet is only about VecNormalize and the way it normalizes the reward (there are many issues in the SB2/SB3 repos about why it is done that way).
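In other words, that gamma only feeds the running discounted return that VecNormalize uses to scale rewards. A simplified sketch of the idea (illustrative only, not the exact SB3 implementation):

```python
import numpy as np

class RunningReturnNormalizer:
    """Illustrative reward normalizer in the spirit of VecNormalize (simplified)."""

    def __init__(self, num_envs: int, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma          # discount used ONLY for the running return
        self.epsilon = epsilon
        self.returns = np.zeros(num_envs)
        self.return_var = 1.0       # stands in for the running variance estimate

    def normalize_reward(self, rewards: np.ndarray, dones: np.ndarray) -> np.ndarray:
        # Update the per-env running discounted return.
        self.returns = self.returns * self.gamma + rewards
        # (a real implementation updates self.return_var from self.returns here)
        normalized = rewards / np.sqrt(self.return_var + self.epsilon)
        # Reset returns for environments that just finished an episode.
        self.returns[dones.astype(bool)] = 0.0
        return normalized
```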

GAE uses two hyperparameters: gamma, the discount factor, and gae_lambda, which trades off between the TD(0) and Monte Carlo estimates.
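For reference, here is a simplified sketch of where those two hyperparameters enter the GAE computation (illustrative pseudocode, not SB3's rollout buffer code):

```python
import numpy as np

def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Simplified GAE(lambda) over one rollout (arrays of shape [T])."""
    advantages = np.zeros_like(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD error: gamma discounts the bootstrapped value of the next state.
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # gae_lambda trades off TD(0) (lambda=0) against Monte Carlo (lambda=1).
        last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
        advantages[t] = last_gae
    return advantages
```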
