PyTorch implementation of D4PG

This repository contains a PyTorch implementation of D4PG that uses IQN as an improved distributional critic in place of C51. The extensions Munchausen RL and D2RL are also included and can be combined with D4PG as needed.
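The distributional IQN critic is trained with a quantile-Huber regression loss instead of the categorical cross-entropy used by C51. The sketch below is a generic illustration of that loss under common IQN conventions (shapes and names are mine, not taken from this repo):

```python
import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, taus, kappa=1.0):
    # pred_quantiles: (B, N) quantile estimates from the online critic
    # target_quantiles: (B, M) quantile targets, e.g. r + gamma * Z_target
    # taus: (B, N) quantile fractions sampled for the online critic
    td = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)  # (B, N, M)
    # Huber loss on every pairwise TD error, smooth near zero, linear in the tails.
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weight |tau - 1{td < 0}| turns the Huber loss into quantile regression.
    weight = (taus.unsqueeze(-1) - (td.detach() < 0).float()).abs()
    # Sum over predicted quantiles, average over target quantiles and batch.
    return (weight * huber / kappa).sum(dim=1).mean()
```

Because each pairwise error is weighted asymmetrically, underestimating a high quantile is penalized more than overestimating it, which is what lets the critic fit a full return distribution.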

Dependencies

Trained and tested on:

  • Python 3.6
  • PyTorch 1.4.0
  • Numpy 1.15.2
  • gym 0.10.11

How to use:

The run script combines all extensions; each add-on can be enabled by setting the corresponding flag.

python run.py -info your_run_info

To see all available options: python run.py -h

Observe training results

tensorboard --logdir=runs

Added Extensions:

  • Prioritized Experience Replay [X]
  • N-Step Bootstrapping [X]
  • D2RL [X]
  • Distributional IQN Critic [X]
  • Munchausen RL [X]
  • Parallel-Environments [X]

Results

Environment: Pendulum

[Plot: Pendulum training results]

Below you can see how IQN reduced the variance of the Critic loss:

[Plot: Critic loss with the IQN critic]

Environment: LunarLander

[Plot: LunarLander training results]

Notes:

  • Performance depends strongly on good hyperparameters. In particular, tau should be larger when using PER (1e-2 on Pendulum) than with a regular replay buffer (1e-3).

  • BatchNorm had a noticeably positive impact on overall performance.
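The tau in the first note is the soft-update rate for the target networks. A standard Polyak update (a generic sketch, not necessarily this repo's exact code) looks like:

```python
import torch

def soft_update(local_net, target_net, tau=1e-3):
    # Polyak averaging: target <- tau * local + (1 - tau) * target.
    # Small tau makes the target network track the online network slowly,
    # which stabilizes the bootstrapped critic targets.
    with torch.no_grad():
        for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
            t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```

A larger tau (as suggested for PER above) lets the targets adapt faster at the cost of some stability.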
