
Revisiting the Minimalist Approach to Offline Reinforcement Learning

Author's implementation of ReBRAC, a minimalist improvement upon TD3+BC (arXiv:2305.09836).

Method and Results Summary

Dependencies & Docker setup

To set up a Python environment (with dev tools of your taste; in our workflow, we use conda and Python 3.8), just install all the requirements:

pip install -r requirements.txt

However, in this setup, you must install the mujoco210 binaries by hand. This is not always straightforward, but the following recipe can help:

mkdir -p /root/.mujoco \
    && wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O mujoco.tar.gz \
    && tar -xf mujoco.tar.gz -C /root/.mujoco \
    && rm mujoco.tar.gz
export LD_LIBRARY_PATH=/root/.mujoco/mujoco210/bin:${LD_LIBRARY_PATH}

You may also need to install additional system dependencies for mujoco_py. We recommend following the official installation guide from mujoco_py.
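
For reference, below is a minimal sketch of the system packages mujoco_py commonly needs on Debian/Ubuntu; the package list follows the mujoco_py guide, but treat it as an assumption for your particular distribution:

# Assumed Debian/Ubuntu build dependencies for mujoco_py; adjust for your distro
sudo apt-get update \
    && sudo apt-get install -y libgl1-mesa-dev libgl1-mesa-glx libglew-dev \
       libosmesa6-dev patchelf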

Docker

We also provide a more straightforward way: a Dockerfile that is already set up to work. All you have to do is build and run it :)

docker build -t rebrac .

To run it, mount the current directory:

docker run -it \
    --gpus=all \
    --rm \
    --volume "<PATH_TO_THE_REPO>:/workspace/" \
    --name rebrac \
    rebrac bash

V-D4RL

To reproduce the V-D4RL experiments, you need to download the corresponding datasets. The easiest way is to run the download_vd4rl.sh script we provide.

You can also download the dataset archives manually. Note that the provided links contain only the datasets reported in the paper, without the distraction and multitasking variants.

After downloading the datasets, you must put the data into the vd4rl directory.
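
For illustration only, here is a hypothetical sketch of staging one downloaded archive; the archive name and resulting layout are assumptions, and download_vd4rl.sh remains the authoritative reference:

# Hypothetical example: the archive name and resulting layout are assumptions,
# not the repo's documented format
mkdir -p vd4rl
unzip walker_walk_expert.zip -d vd4rl/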

How to reproduce experiments

Training

Configs for the main experiments are stored in configs/rebrac/<task_type> and configs/rebrac-vis/<task_type>. All available hyperparameters are listed in src/algorithms/rebrac.py for D4RL and src/algorithms/rebrac_torch_vis.py for V-D4RL.

For example, to start the ReBRAC training process on the D4RL halfcheetah-medium-v2 dataset, run the following:

PYTHONPATH=. python3 src/algorithms/rebrac.py --config_path="configs/rebrac/halfcheetah/halfcheetah_medium.yaml"
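
The --config_path flag suggests a pyrallis-style config loader; under that assumption, individual config fields can typically also be overridden from the command line. A hypothetical example (the train_seed field name is an assumption; verify it against rebrac.py):

# Assumes pyrallis-style CLI overrides; the train_seed field name is hypothetical
PYTHONPATH=. python3 src/algorithms/rebrac.py \
    --config_path="configs/rebrac/halfcheetah/halfcheetah_medium.yaml" \
    --train_seed=1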

For the V-D4RL walker_walk-expert-v2 dataset, run the following:

PYTHONPATH=. python3 src/algorithms/rebrac_torch_vis.py --config_path="configs/rebrac-vis/walker_walk/expert.yaml"

Targeted Reproduction

For better transparency and replicability, we release all of the experiments (5k+ runs) in the form of Weights & Biases reports.

If you want to replicate the results from our work, you can use the configs for Weights & Biases sweeps provided in configs/sweeps. Note that we do not supply a codebase for either IQL or SAC-RND; in our work, we relied upon these implementations: IQL (CORL) and SAC-RND (original implementation).

Paper element | Sweeps to run from configs/sweeps/
--- | ---
Tables 2, 3, 4 | eval/rebrac_d4rl_sweep.yaml, eval/td3_bc_d4rl_sweep.yaml
Table 5 | eval/rebrac_visual_sweep.yaml
Table 6 | All sweeps from ablations
Figure 2 | All sweeps from network_sizes
Hyperparameters tuning | All sweeps from tuning
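
As a sketch, a sweep from the table above is typically launched with the standard Weights & Biases CLI; the entity/project placeholders are yours, and wandb sweep prints the actual sweep ID to pass to the agent:

# Register the sweep, then start an agent; substitute the ID that wandb prints
wandb sweep configs/sweeps/eval/rebrac_d4rl_sweep.yaml
wandb agent <entity>/<project>/<sweep_id>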

Reliable Reports

We also provide a script for reconstructing the graphs in our paper (performance profiles, probability of improvement, and expected online performance): eop/ReBRAC_ploting.ipynb. For your convenience, we repacked the results into .pickle files, so you can reuse them for further research and head-to-head comparisons.
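
A minimal sketch of inspecting one of the repacked files (the path placeholder is yours to fill in; the internal structure of the pickles is not documented here):

# Load and pretty-print a repacked results file; replace the path placeholder
python3 -c "import pickle, pprint; pprint.pprint(pickle.load(open('<PATH_TO_PICKLE>', 'rb')))"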

Citing

If you use this code for your research, please cite the paper with the following bibtex:

@article{tarasov2023revisiting,
  title={Revisiting the Minimalist Approach to Offline Reinforcement Learning},
  author={Denis Tarasov and Vladislav Kurenkov and Alexander Nikulin and Sergey Kolesnikov},
  journal={arXiv preprint arXiv:2305.09836},
  year={2023}
}