Correct handling of `termination` vs `truncation`? #457

ankile · 2024-04-17T12:26:49Z

Hi, thank you so much for the CleanRL resource!

I have a question regarding the PPO implementation and how it handles the difference between episodes that ended because they were terminated (the agent completed the task) and episodes that were truncated (the agent ran out of time).

A comment in the advantage calculation suggests that episodes that are not done are to be bootstrapped from the value function.

At the same time, truncations and terminations are OR'd together, so both cases are counted as the same type of done:

cleanrl/cleanrl/ppo_continuous_action.py

Line 221 in 8cbca61

next_done = np.logical_or(terminations, truncations)

This seems to go against other findings/implementations: Time Limits in Reinforcement Learning, StableBaselines3.

Is the difference here that you assume we're operating in environments with an actual episode timeout, so that truncation means failure? In other cases there is no inherent time limit, only the designer's desire for faster task solving, and in those cases I think it makes sense to handle truncations separately.
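To make the distinction concrete, here is a minimal sketch of one common way to treat truncation separately in a GAE computation: on a truncated step, the discounted value of the final observation is folded into the reward, while a terminated step gets no bootstrap. This is not CleanRL's code. The function name `gae_with_truncation`, the `final_values` argument, and the convention that the flags at index `t` describe the transition taken at step `t` are all assumptions made for illustration.

```python
import numpy as np


def gae_with_truncation(rewards, values, terminations, truncations,
                        final_values, last_value, gamma=0.99, gae_lambda=0.95):
    """Hypothetical helper: GAE for a single environment over T steps.

    rewards[t], values[t]   : reward received at step t and V(s_t)
    terminations[t]         : True if the transition at step t ended the task
    truncations[t]          : True if the transition at step t hit a time limit
    final_values[t]         : V(final observation) where truncations[t] is True
                              (ignored elsewhere)
    last_value              : V of the observation following the last step
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    final_values = np.asarray(final_values, dtype=np.float64)
    terminations = np.asarray(terminations, dtype=bool)
    truncations = np.asarray(truncations, dtype=bool)

    # Bootstrap only when the episode was cut off without a true termination.
    bootstrap = truncations & ~terminations
    rewards = rewards + gamma * final_values * bootstrap

    # After the adjustment, both cases cut the recursion at the episode boundary.
    dones = terminations | truncations

    advantages = np.zeros_like(rewards)
    last_gae = 0.0
    num_steps = len(rewards)
    for t in reversed(range(num_steps)):
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        nonterminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        last_gae = delta + gamma * gae_lambda * nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values
    return advantages, returns
```

With this convention, a truncated step keeps the future value estimate in its target, whereas OR-ing the two flags without the reward adjustment treats the time limit as if the return after it were zero.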

Have I understood all of this correctly?

pseudo-rnd-thoughts · 2024-04-17T13:38:07Z

I believe this is being fixed here - #448