
Reward of drawer-open may lead the algorithm to get stuck at a suboptimal policy. #480

Closed
XZDang13 opened this issue May 18, 2024 · 1 comment
@XZDang13
The reward function currently assigns a higher reward when the gripper is merely pressed against the handle than when it is actually hooking the handle. Additionally, the reward while the gripper is opening the drawer is similar to the reward for being against the handle. Without applying extra tricks to the RL algorithm, this can lead it to favor a policy where the gripper stays against the handle rather than correctly hooking and pulling it.
[Attached plots: reward curves for the hook, open, and reach components]
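A minimal sketch of the failure mode described above. The state names and reward values here are made up for illustration, not Meta-World's actual drawer-open reward; they only reproduce the reported ordering (against > hook, with open only slightly above against). A one-step-greedy agent then never hooks the handle, because hooking momentarily lowers the reward:

```python
def shaped_reward(state: str) -> float:
    """Toy per-step reward for four qualitative gripper states
    (hypothetical values mimicking the reported ordering)."""
    return {
        "reach":   0.2,  # approaching the handle
        "against": 0.9,  # pressed against the handle (over-rewarded)
        "hook":    0.6,  # correctly hooked on the handle (reward dips here)
        "open":    1.0,  # pulling the drawer open
    }[state]

# States reachable in one step from each state (toy dynamics).
TRANSITIONS = {
    "reach":   ["reach", "against"],
    "against": ["against", "hook"],
    "hook":    ["hook", "open"],
    "open":    ["open"],
}

def greedy_rollout(start: str, steps: int = 5) -> list:
    """Follow the one-step-greedy policy: always move to the
    neighboring state with the highest immediate reward."""
    path, s = [start], start
    for _ in range(steps):
        s = max(TRANSITIONS[s], key=shaped_reward)
        path.append(s)
    return path

# The greedy agent parks at "against" and never reaches "hook" or "open":
print(greedy_rollout("reach"))
# ['reach', 'against', 'against', 'against', 'against', 'against']
```

A real RL agent with a discount factor can in principle cross the reward dip, but the dip still creates a local optimum that exploration has to escape, which is the "stuck" behavior reported here.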

@reginald-mclean
Collaborator

Sure, but it's very unlikely that every reward function is going to be 100% perfect such that the agent always solves the task. I'm not quite sure this is an issue; I'm fairly confident there are similar instances of this in many reward functions.
