Deep Q-value network evaluation in SAC algorithm #1166

Open
moizuet opened this issue Jul 19, 2022 · 2 comments
Labels
question Further information is requested

Comments


moizuet commented Jul 19, 2022

I am implementing a Soft Actor-Critic (SAC) agent and need to evaluate the Q-value network inside my custom environment (to implement a special algorithm, the Wolpertinger algorithm, which handles large discrete action spaces). I tried to get the Q-values from the SAC class object but failed. A method or function like the one in Stable Baselines' PPO implementation (namely, .value) would be very helpful.

moizuet changed the title from "Deep Q-value network evaluation" to "Deep Q-value network evaluation in SAC algorithm" Jul 19, 2022
Miffyli added the "question" (Further information is requested) label Jul 20, 2022
Collaborator

Miffyli commented Jul 20, 2022

I would first suggest moving to stable-baselines3: it is more refined and still maintained. This version is no longer maintained.

To answer your question: there is no convenience function for this, but you can check how SAC does the value prediction in SB3 here, and try to replicate it yourself.
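As a rough illustration of what such a replication could feed into, here is a minimal, framework-free sketch of the Wolpertinger selection step mentioned in the question. Everything in it is an assumption made for illustration: `min_q` and `wolpertinger_select` are hypothetical names, not Stable Baselines or SB3 functions, and each critic is modelled as a plain callable `q(obs, action) -> float` that you would replace with your own evaluation of the SAC Q-networks. Taking the minimum over the critics mirrors SAC's usual twin-critic estimate; the original Wolpertinger setup uses a single critic, so this combination is a design choice rather than a prescribed method.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical critic signature: stands in for one SAC Q-network.
QFunction = Callable[[Vector, Vector], float]


def min_q(critics: Sequence[QFunction], obs: Vector, action: Vector) -> float:
    """Conservative Q estimate: the minimum over all critics (SAC usually has two)."""
    return min(q(obs, action) for q in critics)


def wolpertinger_select(
    obs: Vector,
    proto_action: Vector,
    candidates: Sequence[Vector],
    critics: Sequence[QFunction],
    k: int = 3,
) -> Tuple[Vector, float]:
    """Among the k discrete candidates closest to the actor's proto-action,
    return the one with the highest min-over-critics Q-value, plus that value."""
    # Step 1: k nearest neighbours of the proto-action (Euclidean distance).
    nearest = sorted(candidates, key=lambda a: math.dist(a, proto_action))[:k]
    # Step 2: score each neighbour with the critics and keep the best one.
    scored = [(min_q(critics, obs, a), a) for a in nearest]
    best_q, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_q


# Toy critics (assumptions for the demo only): both prefer actions close to obs[0].
def toy_q1(obs: Vector, action: Vector) -> float:
    return -((action[0] - obs[0]) ** 2)


def toy_q2(obs: Vector, action: Vector) -> float:
    return -((action[0] - obs[0]) ** 2) - 0.1


discrete_actions = [(-1.0,), (0.0,), (0.5,), (1.0,), (2.0,)]
action, value = wolpertinger_select(
    obs=(1.0,),
    proto_action=(0.4,),
    candidates=discrete_actions,
    critics=[toy_q1, toy_q2],
    k=3,
)
print(action, value)
```

In this toy run the nearest candidate to the proto-action is `(0.5,)`, but the critics rank `(1.0,)` higher, which is the point of re-scoring the neighbours with Q-values instead of trusting the nearest one.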

Author

moizuet commented Jul 21, 2022

Unfortunately, I have implemented the rest of my RL algorithms, layers, and optimizers in the TensorFlow and stable-baselines2 ecosystem. I cannot switch right now, but I will consider using stable-baselines3 and especially RLlib in the future.

Also, it will be a great coding exercise for me to implement this Q-value evaluation method.

Cheers..
