
Video Explanations and Tutorials


This page contains videos explaining the concepts in the training process and tutorials on how to run portions of the code.

We recommend playing these videos back at 1.5x speed.

Topics:

  • Input/Output of experiments
  • Episode Loop Structure (Grasping + Lifting Stages)
  • Variable Speed Controller example run

This video walks through how we conduct our episode loop (grasping + lifting stages) using an example run of the Variable-Speed controller. It also looks at how we get our action from our controllers through the get_action() function located in expert_data.py. Through this tutorial, we also see how the code starts from the command-line input, sets up the directory structure for each run, and finally generates the saved output (coordinates and plots). Note: the Python editor recommended at the end of the video is PyCharm (not PyTorch!)

Example command: python main_DDPGfD.py --saving_dir Short_run_Variable_Speed_Controller --hand_orientation normal --shapes CubeM --with_orientation_noise False --controller_type position-dependent --max_episode 5 --mode position-dependent
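
As a rough illustration of the episode loop described above, the sketch below runs a grasping stage followed by a lifting stage, querying the controller through get_action() at each step. Only get_action() and expert_data.py come from the video; the environment interface, stage lengths, and the run_episode helper are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of the grasping + lifting episode loop; only get_action()
# (from expert_data.py) is referenced in the video -- the environment API,
# stage lengths, and lift action below are illustrative assumptions.

def run_episode(env, controller, grasp_steps=100, lift_steps=50, lift_action=None):
    """Roll out one episode: a grasping stage followed by a lifting stage."""
    state = env.reset()
    trajectory = []

    # Grasping stage: query the controller for finger velocities each timestep.
    for _ in range(grasp_steps):
        action = controller.get_action(state)
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            return trajectory

    # Lifting stage: hold the grasp and raise the hand to test grasp success.
    for _ in range(lift_steps):
        action = lift_action if lift_action is not None else controller.get_action(state)
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory
```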

Topics:

  • DDPGfD algorithm
  • Actor/Critic network setup
  • Behavior Cloning loss

This video compares the DDPGfD algorithm to our current implementation. It also discusses some of the additions we have made (Behavior Cloning loss).
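
As a hedged illustration of how a Behavior Cloning loss can be combined with the standard DDPG actor objective, here is a minimal PyTorch sketch. The function, the network interfaces, and the lambda_bc weight are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal PyTorch sketch of adding a Behavior Cloning (BC) term to the DDPG
# actor loss. Network signatures and the lambda_bc weight are assumptions.
import torch
import torch.nn.functional as F

def actor_loss_with_bc(actor, critic, expert_states, expert_actions,
                       agent_states, lambda_bc=1.0):
    # Standard DDPG actor objective: maximize Q(s, pi(s)) over agent experience.
    ddpg_loss = -critic(agent_states, actor(agent_states)).mean()

    # Behavior cloning term: pull the actor's output toward the expert's action
    # on demonstration states.
    bc_loss = F.mse_loss(actor(expert_states), expert_actions)

    return ddpg_loss + lambda_bc * bc_loss
```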

Topics:

  • Sampling from the replay buffer (agent and expert)
  • Sampling N-steps
  • Updating the policy

This video goes through the diagram that shows our current process for training the policy. Training the policy involves sampling experience (trajectories) from the replay buffer and updating the network weights to minimize the loss between the target and current actor-critic networks.
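
The sketch below illustrates the two pieces described above under stated assumptions: drawing a mixed batch from the agent and expert replay buffers, and forming an N-step TD target that the critic can be regressed toward. The buffer interface (a sample() method returning a list), the discount factor, and the batch split are assumptions, not the repository's actual code.

```python
# Hedged sketch of mixed agent/expert sampling and an N-step TD target.
# Buffer interfaces, gamma, and the batch split are illustrative assumptions.
import torch

def n_step_target(rewards, final_state, target_actor, target_critic, gamma=0.995):
    """N-step TD target: discounted reward sum plus a bootstrapped Q at the Nth state."""
    with torch.no_grad():
        target = target_critic(final_state, target_actor(final_state))
    for r in reversed(rewards):
        target = r + gamma * target
    return target

def sample_mixed_batch(agent_buffer, expert_buffer, batch_size=64, expert_fraction=0.5):
    """Draw part of the batch from agent experience and part from expert demonstrations."""
    n_expert = int(batch_size * expert_fraction)
    return agent_buffer.sample(batch_size - n_expert) + expert_buffer.sample(n_expert)
```

The critic would then be regressed toward this target, and the actor updated with a combined DDPG + Behavior Cloning loss like the one sketched in the previous section.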

Topics:

  • RL Training Pipeline (Controller-->Pre-training-->Training)
  • Variation input (Baseline, Baseline + HOV, Shapes + HOV, Sizes + HOV, Orientations + HOV)
  • Changing the command-line input based on the experiment type (Controller/Policy, Variation input)

This video steps through the training pipeline starting with the controllers, followed by pre-training, and finally training the policy with varied inputs. For each stage, we take a look at example commands that can be used to generate experiments within the training pipeline.
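
As a hypothetical illustration of chaining these stages, the sketch below calls main_DDPGfD.py once per stage. Only the flags shown in the example command earlier on this page are taken from this wiki; the pre-train/train mode values and the specific variation settings are placeholders, not the repository's actual arguments.

```python
# Hypothetical sketch of running the Controller --> Pre-training --> Training
# pipeline from one script. Mode values other than the controller one, and the
# variation settings, are placeholders rather than the repo's real flags.
import subprocess

COMMON = ["python", "main_DDPGfD.py", "--hand_orientation", "normal", "--shapes", "CubeM"]

stages = [
    ("Stage1_Controller", ["--controller_type", "position-dependent",
                           "--mode", "position-dependent", "--max_episode", "5"]),
    ("Stage2_Pretrain",   ["--mode", "pre-train"]),                    # placeholder mode value
    ("Stage3_Train_Baseline_HOV", ["--mode", "train",                  # placeholder mode value
                                   "--with_orientation_noise", "True"]),
]

for saving_dir, extra in stages:
    subprocess.run(COMMON + ["--saving_dir", saving_dir] + extra, check=True)
```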

Topics:

  • Base controller versus policy performance
  • Examine policy performance over the course of learning (Pre-trained policy)

This video goes through how we compare the controller's performance with the policy's performance, specifically the pre-trained policy. When analyzing the policy's performance, we examine the learning curves (avg. reward, critic loss, actor loss) over the course of the training period. To examine grasping performance, we look at the areas of success (heatmaps) and the finger velocities (velocity plots and renderings) to compare the controllers against the policy.
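
A minimal sketch of plotting those learning curves is shown below, assuming each curve has been saved as a NumPy array. The .npy file names are assumptions about the saved output, not the repository's actual file layout.

```python
# Illustrative sketch of plotting the learning curves mentioned above
# (average reward, critic loss, actor loss). File names are assumptions.
import numpy as np
import matplotlib.pyplot as plt

curves = {
    "Average reward": "avg_reward.npy",
    "Critic loss": "critic_loss.npy",
    "Actor loss": "actor_loss.npy",
}

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (title, path) in zip(axes, curves.items()):
    data = np.load(path)          # one value per episode / evaluation point
    ax.plot(data)
    ax.set_title(title)
    ax.set_xlabel("Episode")
fig.tight_layout()
plt.savefig("learning_curves.png")
```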

The references to the slides from this meeting can be found here:

Topics:

  • General overview of the evaluation process
  • How to evaluate grasping strategies of the trained policies over the learning period

This video continues from "Analyzing Training Pipeline Results -- Pre-training" to give a general overview of how we evaluate the grasping performance of the trained policies. This includes how we use both the output from training the policy (learning curves) and from evaluating the policy (heatmaps), together with the grasp trial success rate, to determine how the policy's performance changes over time. This performance is examined for each Variation Input type (Baseline, Baseline + HOV, Shapes + HOV, etc.) to identify patterns in grasping strategies.

Note: this video is not up to date with the most recent results, so this just gives an overview of our evaluation strategy.
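
As a hedged sketch of the bookkeeping involved, the snippet below computes the grasp trial success rate for each Variation Input type across saved policy checkpoints. The result format (a list of booleans per checkpoint) and the toy values are purely illustrative assumptions, not real evaluation data.

```python
# Hedged sketch: grasp trial success rate per Variation Input type, per
# policy checkpoint. The data layout and the toy values are assumptions.
def success_rate(trial_outcomes):
    """Fraction of successful grasp trials; trial_outcomes is a list of bools."""
    return sum(trial_outcomes) / len(trial_outcomes) if trial_outcomes else 0.0

# Toy placeholder outcomes, keyed by variation input and checkpoint.
eval_results = {
    "Baseline":       {"ckpt_1000": [True, False, True], "ckpt_2000": [True, True, True]},
    "Baseline + HOV": {"ckpt_1000": [False, False, True], "ckpt_2000": [True, False, True]},
}

for variation, checkpoints in eval_results.items():
    rates = {ckpt: success_rate(outcomes) for ckpt, outcomes in checkpoints.items()}
    print(variation, rates)
```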

The references to the slides from this meeting can be found here: