
Video Explanations and Tutorials


This page contains videos explaining the concepts in the training process and tutorials on how to run portions of the code.

We recommend playing these videos back at 1.5x speed.

Topics:

  • Input/Output of experiments
  • Episode Loop Structure (Grasping + Lifting Stages)
  • Variable Speed Controller example run

This video walks through how we conduct our episode loop (grasping + lifting stages) using an example run of the Variable-Speed controller. It also looks at how we get our action from our controllers through the get_action() function located in expert_data.py. Through this tutorial, we also see how the code starts from the command-line input, sets up the directory structure for each run, and finally generates the saved output (coordinates and plots). Note: the Python editor recommended at the end of the video is PyCharm (not PyTorch!)

Example command: python main_DDPGfD.py --saving_dir Short_run_Variable_Speed_Controller --hand_orientation normal --shapes CubeM --with_orientation_noise False --controller_type position-dependent --max_episode 5 --mode position-dependent
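
As a rough illustration of the episode loop described above, the sketch below runs a grasping stage followed by a lifting stage, querying the controller through get_action() at each step. Only get_action() and expert_data.py come from the video; the environment interface, stage lengths, and the run_episode helper are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of the grasping + lifting episode loop; only get_action()
# (from expert_data.py) is referenced in the video -- the environment API,
# stage lengths, and lift action below are illustrative assumptions.

def run_episode(env, controller, grasp_steps=100, lift_steps=50, lift_action=None):
    """Roll out one episode: a grasping stage followed by a lifting stage."""
    state = env.reset()
    trajectory = []

    # Grasping stage: query the controller for finger velocities each timestep.
    for _ in range(grasp_steps):
        action = controller.get_action(state)
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            return trajectory

    # Lifting stage: hold the grasp and raise the hand to test grasp success.
    for _ in range(lift_steps):
        action = lift_action if lift_action is not None else controller.get_action(state)
        next_state, reward, done, info = env.step(action)
        trajectory.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory
```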

Topics:

  • DDPGfD algorithm
  • Actor/Critic network setup
  • Behavior Cloning loss

This video compares the DDPGfD algorithm to our current implementation. It also discusses some of the additions we have made (Behavior Cloning loss).
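
As a hedged illustration of how a Behavior Cloning loss can be combined with the standard DDPG actor objective, here is a minimal PyTorch sketch. The function, the network interfaces, and the lambda_bc weight are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal PyTorch sketch of adding a Behavior Cloning (BC) term to the DDPG
# actor loss. Network signatures and the lambda_bc weight are assumptions.
import torch
import torch.nn.functional as F

def actor_loss_with_bc(actor, critic, expert_states, expert_actions,
                       agent_states, lambda_bc=1.0):
    # Standard DDPG actor objective: maximize Q(s, pi(s)) over agent experience.
    ddpg_loss = -critic(agent_states, actor(agent_states)).mean()

    # Behavior cloning term: pull the actor's output toward the expert's action
    # on demonstration states.
    bc_loss = F.mse_loss(actor(expert_states), expert_actions)

    return ddpg_loss + lambda_bc * bc_loss
```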

Topics:

  • Sampling from the replay buffer (agent and expert)
  • Sampling N-steps
  • Updating the policy

This video goes through the diagram that shows our current process for training the policy. Training the policy involves sampling experience (trajectories) from the replay buffer and updating the network weights to minimize the loss between the target and current actor-critic networks.
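
The sketch below illustrates the two pieces described above under stated assumptions: drawing a mixed batch from the agent and expert replay buffers, and forming an N-step TD target that the critic can be regressed toward. The buffer interface (a sample() method returning a list), the discount factor, and the batch split are assumptions, not the repository's actual code.

```python
# Hedged sketch of mixed agent/expert sampling and an N-step TD target.
# Buffer interfaces, gamma, and the batch split are illustrative assumptions.
import torch

def n_step_target(rewards, final_state, target_actor, target_critic, gamma=0.995):
    """N-step TD target: discounted reward sum plus a bootstrapped Q at the Nth state."""
    with torch.no_grad():
        target = target_critic(final_state, target_actor(final_state))
    for r in reversed(rewards):
        target = r + gamma * target
    return target

def sample_mixed_batch(agent_buffer, expert_buffer, batch_size=64, expert_fraction=0.5):
    """Draw part of the batch from agent experience and part from expert demonstrations."""
    n_expert = int(batch_size * expert_fraction)
    return agent_buffer.sample(batch_size - n_expert) + expert_buffer.sample(n_expert)
```

The critic would then be regressed toward this target, and the actor updated with a combined DDPG + Behavior Cloning loss like the one sketched in the previous section.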

Topics:

  • RL Training Pipeline (Controller-->Pre-training-->Training)
  • Variation input (Baseline, Baseline + HOV, Shapes + HOV, Sizes + HOV, Orientations + HOV)
  • Changing the command-line input based on the experiment type (Controller/Policy, Variation input)

This video steps through the training pipeline starting with the controllers, followed by pre-training, and finally training the policy with varied inputs. For each stage, we take a look at example commands that can be used to generate experiments within the training pipeline.
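
As a hypothetical illustration of chaining these stages, the sketch below calls main_DDPGfD.py once per stage. Only the flags shown in the example command earlier on this page are taken from this wiki; the pre-train/train mode values and the specific variation settings are placeholders, not the repository's actual arguments.

```python
# Hypothetical sketch of running the Controller --> Pre-training --> Training
# pipeline from one script. Mode values other than the controller one, and the
# variation settings, are placeholders rather than the repo's real flags.
import subprocess

COMMON = ["python", "main_DDPGfD.py", "--hand_orientation", "normal", "--shapes", "CubeM"]

stages = [
    ("Stage1_Controller", ["--controller_type", "position-dependent",
                           "--mode", "position-dependent", "--max_episode", "5"]),
    ("Stage2_Pretrain",   ["--mode", "pre-train"]),                    # placeholder mode value
    ("Stage3_Train_Baseline_HOV", ["--mode", "train",                  # placeholder mode value
                                   "--with_orientation_noise", "True"]),
]

for saving_dir, extra in stages:
    subprocess.run(COMMON + ["--saving_dir", saving_dir] + extra, check=True)
```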

Topics:

  • Base controller versus policy performance
  • Examine policy performance over the course of learning (Pre-trained policy)

This video goes through how we compare the controller's performance with the policy's performance, specifically the pre-trained policy. When analyzing the policy's performance, we examine the learning curves (avg. reward, critic loss, actor loss) over the course of the training period. To examine grasping performance, we look at the areas of success (heatmaps) and the finger velocities (velocity plots and renderings) to compare the controllers against the policy.
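
A minimal sketch of plotting those learning curves is shown below, assuming each curve has been saved as a NumPy array. The .npy file names are assumptions about the saved output, not the repository's actual file layout.

```python
# Illustrative sketch of plotting the learning curves mentioned above
# (average reward, critic loss, actor loss). File names are assumptions.
import numpy as np
import matplotlib.pyplot as plt

curves = {
    "Average reward": "avg_reward.npy",
    "Critic loss": "critic_loss.npy",
    "Actor loss": "actor_loss.npy",
}

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (title, path) in zip(axes, curves.items()):
    data = np.load(path)          # one value per episode / evaluation point
    ax.plot(data)
    ax.set_title(title)
    ax.set_xlabel("Episode")
fig.tight_layout()
plt.savefig("learning_curves.png")
```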

The references to the slides from this meeting can be found here:

Topics:

  • General overview of the evaluation process
  • How to evaluate grasping strategies of the trained policies over the learning period

This video continues from "Analyzing Training Pipeline Results -- Pre-training" to give a general overview of how we evaluate the grasping performance of the trained policies. This includes how we use both the output from training the policy (learning curves) and from evaluating the policy (heatmaps), together with the grasp trial success rate, to determine how the policy's performance changes over time. This performance is examined for each Variation Input type (Baseline, Baseline + HOV, Shapes + HOV, etc.) to identify patterns in grasping strategies.

Note: this video is not up to date with the most recent results, so this just gives an overview of our evaluation strategy.
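
As a hedged sketch of the bookkeeping involved, the snippet below computes the grasp trial success rate for each Variation Input type across saved policy checkpoints. The result format (a list of booleans per checkpoint) and the toy values are purely illustrative assumptions, not real evaluation data.

```python
# Hedged sketch: grasp trial success rate per Variation Input type, per
# policy checkpoint. The data layout and the toy values are assumptions.
def success_rate(trial_outcomes):
    """Fraction of successful grasp trials; trial_outcomes is a list of bools."""
    return sum(trial_outcomes) / len(trial_outcomes) if trial_outcomes else 0.0

# Toy placeholder outcomes, keyed by variation input and checkpoint.
eval_results = {
    "Baseline":       {"ckpt_1000": [True, False, True], "ckpt_2000": [True, True, True]},
    "Baseline + HOV": {"ckpt_1000": [False, False, True], "ckpt_2000": [True, False, True]},
}

for variation, checkpoints in eval_results.items():
    rates = {ckpt: success_rate(outcomes) for ckpt, outcomes in checkpoints.items()}
    print(variation, rates)
```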

The references to the slides from this meeting can be found here: