How to run the code and experiments


Experiments are run using main_DDPGfD.py, which contains the main episode loop that controls the course of generating controller experience, conducting policy learning, and evaluating grasp trial performance.

Follow the outline in the How to run experiments section below to determine the correct commands for running your experiment with python main_DDPGfD.py.

Experiment run overview:

  1. Core command-line arguments
  2. Experiment output
  3. How to run experiments

Core command-line arguments:

--saving_dir: Saving directory name

--hand_orientation: Hand orientation about the x-axis; options are normal (0 deg.), rotated (68 deg.), top (90 deg.), or random (randomly select from the normal, rotated, or top orientation for each episode)

--shapes: Object(s) to use for the experiment, with one used per episode. The "shape" naming convention is the shape name followed by the size. For example, a small cube is referred to as "CubeS". For multiple shapes, enter them as: <shape>,<shape>,<shape>. Example: --shapes CubeS,CubeM,CubeB

--with_grasp_reward: (True/False) Determines whether or not the grasp classifier reward is used within the reward calculation; default is False

--with_orientation_noise: (True/False) Set to True to sample the initial hand-object coordinates from the coordinate dataset WITH Hand Orientation Noise

--expert_replay_file_path: Path to the "expert" controller replay buffer. The path is determined by the controller name (naive, position-dependent, expert), whether Hand Orientation Variation (HOV) was used (with_noise, no_noise), and whether the grasp classifier reward was used (with_grasp, no_grasp)

File structure: ./experiments/<controller_name>/<HOV>/<grasp_reward>/

Example: "./experiments/position-dependent/with_noise/no_grasp/"

--pretrain_policy_path: Path to the pre-trained policy (Ex: /experiments/pre-train/<pre-train_saving_dir>/policy/pre-train_DDPGfD_kinovaGrip)

--agent_replay_buffer_path: Path to the agent replay buffer (Pre-trained policy experience) (Ex: /experiments/pre-train/<pre-train_saving_dir>/replay_buffer/)

--max_episode: Maximum number of episodes to run

--controller_type: Type of controller to use for evaluation (policy, naive, position-dependent, or combined)

--mode: Mode to run experiments with (naive, position-dependent, expert, pre-train, train, eval, experiment)

For generating controller experience, select: naive, position-dependent, or combined

For conducting policy learning, select: pre-train or train

To evaluate the controller or policy, select: eval (an illustrative eval command is shown after this list)

To generate an experiment using the bootstrapped-experiment pipeline (OLD), select: experiment (You will also need to give the --exp_num)
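
As an illustration, evaluating the constant-speed (naive) controller on the baseline input could look like the command below. This is a sketch assembled from the arguments documented above; the exact set of flags required by eval (for example, a path to a trained policy when evaluating with --controller_type policy) may differ.

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_grasp_reward False --with_orientation_noise False --max_episode 1000 --controller_type naive --mode eval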

Experiment output:

The output of each experiment can be found within KinovaGrasping/gym-kinova-gripper/experiments/<mode>/<saving_dir>, where the <mode> is determined by the type of experiment you are running. If you produce multiple experiments with the same <saving_dir> name, each new experiment can be found in a date-stamped folder within <saving_dir>/.

Within KinovaGrasping/gym-kinova-gripper/experiments/<mode>/<saving_dir>, the following folders will be generated during an experiment run:

/output: Contains plots, data used for plotting, and policies from each evaluation point (if running a learning experiment)

/output/boxplot/: Contains a boxplot displaying the distribution of reward values (finger, grasp, lift) over the course of the experiment

/output/heatmap/: Contains heatmaps of the success rate for each initial object coordinate position over the evaluated grasp trials

/output/results/: Contains the policy from each evaluation point -- this is the current policy at that point, not the best policy so far

/output/tensorboard/: Contains the TensorBoard event file; run tensorboard --logdir . from this directory to view the TensorBoard plots

/policy: Contains the final (best) policy over the course of learning (if running a learning experiment)

/replay_buffer: Contains the replay buffer as a set of numpy arrays (state, action, next_state, etc.) from the course of the experiment
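
Once an experiment finishes, the replay buffer arrays can be inspected directly with NumPy. The snippet below is a minimal sketch; the array file names (state.npy, action.npy, next_state.npy, reward.npy) are assumptions for illustration and may not match the names actually written by the code, so check the contents of the replay_buffer/ folder first.

import os
import numpy as np

# Replace <mode> and <saving_dir> with the values from your run.
replay_dir = "./experiments/<mode>/<saving_dir>/replay_buffer/"

# Hypothetical array names -- list the folder to find the real ones.
for name in ["state.npy", "action.npy", "next_state.npy", "reward.npy"]:
    path = os.path.join(replay_dir, name)
    if os.path.exists(path):
        arr = np.load(path)
        print(name, arr.shape, arr.dtype)
    else:
        print(name, "not found; the file naming may differ")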

How to run experiments

First, download the latest copy of the experiments/ folder from the Reinforcement Learning Box directory. Place the experiments/ folder within KinovaGrasping/gym-kinova-gripper/. The experiments folder contains the latest controller experience and policies needed to generate the following experiments.

Each of the following commands will run a specific experiment type. Please replace the <saving_dir> with your desired saving directory name.

Controller data generation

Constant-Speed Controller:

Variation Input (Baseline): Medium Cube, Normal (0 deg.) hand orientation, No Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_grasp_reward False --with_orientation_noise False --max_episode 5000 --controller_type naive --mode naive

Variable-Speed Controller:

Variation Input (Baseline): Medium Cube, Normal (0 deg.) hand orientation, No Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_grasp_reward False --with_orientation_noise False --max_episode 5000 --controller_type position-dependent --mode position-dependent

Pre-train

When pre-training the policy, you can specify the source of the expert experience through --expert_replay_file_path.

Variation Input (Baseline): Medium Cube, Normal (0 deg.) hand orientation, No Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_orientation_noise False --expert_replay_file_path "./experiments/<controller_name>/with_noise/no_grasp/" --with_grasp_reward False --max_episode 3000 --controller_type policy --mode pre-train

Train

The following are examples of training the policy with each of the variation input types. When conducting a training experiment, you must specify the pre-trained policy, agent replay buffer, and expert replay buffer paths; otherwise they will be set to None.

Randomly-initialized agent Baseline + HOV:

Variation Input (Baseline + HOV): Medium Cube, Normal (0 deg.) hand orientation, With Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_orientation_noise True --expert_prob 0 --max_episode 10000 --controller_type policy --mode train

Baseline + HOV:

Variation Input (Baseline + HOV): Medium Cube, Normal (0 deg.) hand orientation, With Hand Orientation Variation (HOV)

Template: python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM --with_orientation_noise True --expert_replay_file_path "./experiments/<controller_name>/with_noise/no_grasp/" --agent_replay_buffer_path "./experiments/pre-train/<pre-train_saving_dir>/replay_buffer/" --pretrain_policy_path "./experiments/pre-train/<pre-train_saving_dir>/policy/pre-train_DDPGfD_kinovaGrip" --max_episode 10000 --controller_type policy --mode train

To recreate training with the Variable-Speed controller using the most current policy within the experiments/ folder: python main_DDPGfD.py --saving_dir Train_Baseline_HOV --hand_orientation normal --shapes CubeM --with_orientation_noise True --expert_replay_file_path "./experiments/position-dependent/with_noise/no_grasp/" --agent_replay_buffer_path "./experiments/pre-train/Pretrain_PD_Baseline_3k/replay_buffer/" --pretrain_policy_path "./experiments/pre-train/Pretrain_PD_Baseline_3k/policy/pre-train_DDPGfD_kinovaGrip" --max_episode 10000 --controller_type policy --mode train

Shapes + HOV:

Variation Input (Shapes + HOV): Medium Cube, Med. Cylinder, Med. Vase, Normal (0 deg.) hand orientation, With Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeM,CylinderM,Vase1M --with_orientation_noise True --expert_replay_file_path "./experiments/<controller_name>/with_noise/no_grasp/" --agent_replay_buffer_path "./experiments/pre-train/<pre-train_saving_dir>/replay_buffer/" --pretrain_policy_path "./experiments/pre-train/<pre-train_saving_dir>/policy/pre-train_DDPGfD_kinovaGrip" --max_episode 10000 --controller_type policy --mode train

Sizes + HOV:

Variation Input (Sizes + HOV): Small Cube, Med. Cube, Big Cube, Normal (0 deg.) hand orientation, With Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation normal --shapes CubeS,CubeM,CubeB --with_orientation_noise True --expert_replay_file_path "./experiments/<controller_name>/with_noise/no_grasp/" --agent_replay_buffer_path "./experiments/pre-train/<pre-train_saving_dir>/replay_buffer/" --pretrain_policy_path "./experiments/pre-train/<pre-train_saving_dir>/policy/pre-train_DDPGfD_kinovaGrip" --max_episode 10000 --controller_type policy --mode train

Orientations + HOV:

Variation Input (Orientations + HOV): Medium Cube, Random (randomly select from normal 0 deg., rotated 68 deg., or top 90 deg.) hand orientation, With Hand Orientation Variation (HOV)

python main_DDPGfD.py --saving_dir <saving_dir_name> --hand_orientation random --shapes CubeM --with_orientation_noise True --expert_replay_file_path "./experiments/<controller_name>/with_noise/no_grasp/" --agent_replay_buffer_path "./experiments/pre-train/<pre-train_saving_dir>/replay_buffer/" --pretrain_policy_path "./experiments/pre-train/<pre-train_saving_dir>/policy/pre-train_DDPGfD_kinovaGrip" --max_episode 10000 --controller_type policy --mode train

Evaluation