
Why might my rewards be inversely proportional to the target height in the HoverAviary environment? #214

Closed
gulbinkirikoglu opened this issue May 13, 2024 · 2 comments

Comments

@gulbinkirikoglu
I'm attempting to run discrete-action RL training in the HoverAviary environment. My goal is to take the z position and z velocity from the observation space as input and control the drone's up and down movement (using just the arrays [-1, -1, -1, -1] and [1, 1, 1, 1]). I'm using a reward function I defined, but as the drone falls below a height of 1, the reward increases. What could be the reason for this?
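The two-action setup described above can be sketched as a simple lookup from a discrete index to the 4-motor array the environment consumes. The names below are illustrative only, not gym-pybullet-drones' actual API:

```python
import numpy as np

# Hypothetical sketch: map the two discrete actions described above
# (0 -> descend, 1 -> ascend) to the 4-motor action arrays that a
# HoverAviary-style environment expects at step time.
DISCRETE_TO_ARRAY = {
    0: np.array([-1.0, -1.0, -1.0, -1.0]),  # push all four motors down
    1: np.array([1.0, 1.0, 1.0, 1.0]),      # push all four motors up
}

def to_env_action(discrete_action: int) -> np.ndarray:
    """Convert a discrete action index into the array the env consumes."""
    return DISCRETE_TO_ARRAY[discrete_action]
```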

@piratax007
How have you defined the reward function? What does the code of your `_compute_reward` method look like?
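For context, a common shape for a hover reward is a negative squared distance to the target position; this is only an illustrative sketch under that assumption, and the actual `_compute_reward` in the repository may differ:

```python
import numpy as np

# Illustrative sketch only -- not the repository's actual reward code.
TARGET_POS = np.array([0.0, 0.0, 1.0])  # hover target at z = 1

def compute_reward(drone_xyz: np.ndarray) -> float:
    # Negative squared distance: maximal (0.0) exactly at the target,
    # strictly decreasing as the drone moves away in any direction,
    # including below z = 1.
    return -float(np.linalg.norm(TARGET_POS - drone_xyz) ** 2)
```

With a reward of this shape, moving below the target height should only decrease the reward, which is why inspecting the reward code is the natural first step here.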

@gulbinkirikoglu
Author

Thank you for your help, I found the error. The issue wasn't with the reward function. I was treating the actions as discrete values, converting them to arrays, and storing them in the experience as arrays. Instead, I stored the discrete actions directly, and the problem was resolved.
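The fix described above can be sketched as follows (names are hypothetical): keep the discrete index in the experience buffer, and convert it to the 4-motor array only at the moment the environment is stepped:

```python
import numpy as np
from collections import deque

# Illustrative sketch of the fix: store the *discrete index* in the
# replay buffer, not the converted array, so the learner's discrete
# action handling stays consistent.
ACTION_ARRAYS = {0: np.full(4, -1.0), 1: np.full(4, 1.0)}

buffer = deque(maxlen=10_000)

def store_transition(obs, discrete_action: int, reward, next_obs, done):
    # Store the scalar index, not ACTION_ARRAYS[discrete_action].
    buffer.append((obs, discrete_action, reward, next_obs, done))

def env_action(discrete_action: int) -> np.ndarray:
    # Convert only when stepping the environment.
    return ACTION_ARRAYS[discrete_action]
```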
