
Reward of drawer-open may lead the algorithm to get stuck at a suboptimal policy. #480

Closed
XZDang13 opened this issue May 18, 2024 · 1 comment
@XZDang13
The reward function currently assigns a higher reward when the gripper is merely pressed against the handle than when it is actually hooking the handle. Additionally, the reward while the gripper is opening the drawer is similar to the reward for being against the handle. Without applying extra tricks to the RL algorithm, this can lead it to favor a policy where the gripper stays against the handle rather than correctly hooking and pulling it.
[Attached plots: reward curves for the hook, open, and reach components]
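A minimal sketch of the failure mode described above. The state names and reward values here are made up for illustration, not Meta-World's actual drawer-open reward; they only reproduce the reported ordering (against > hook, with open only slightly above against). A one-step-greedy agent then never hooks the handle, because hooking momentarily lowers the reward:

```python
def shaped_reward(state: str) -> float:
    """Toy per-step reward for four qualitative gripper states
    (hypothetical values mimicking the reported ordering)."""
    return {
        "reach":   0.2,  # approaching the handle
        "against": 0.9,  # pressed against the handle (over-rewarded)
        "hook":    0.6,  # correctly hooked on the handle (reward dips here)
        "open":    1.0,  # pulling the drawer open
    }[state]

# States reachable in one step from each state (toy dynamics).
TRANSITIONS = {
    "reach":   ["reach", "against"],
    "against": ["against", "hook"],
    "hook":    ["hook", "open"],
    "open":    ["open"],
}

def greedy_rollout(start: str, steps: int = 5) -> list:
    """Follow the one-step-greedy policy: always move to the
    neighboring state with the highest immediate reward."""
    path, s = [start], start
    for _ in range(steps):
        s = max(TRANSITIONS[s], key=shaped_reward)
        path.append(s)
    return path

# The greedy agent parks at "against" and never reaches "hook" or "open":
print(greedy_rollout("reach"))
# ['reach', 'against', 'against', 'against', 'against', 'against']
```

A real RL agent with a discount factor can in principle cross the reward dip, but the dip still creates a local optimum that exploration has to escape, which is the "stuck" behavior reported here.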

@reginald-mclean
Collaborator

Sure, but it's very unlikely that every reward function is going to be 100% perfect such that the agent always solves the task. I'm not quite sure this is an issue; I'm fairly confident there are similar instances of this in many reward functions.
