How to access policy state with good train results? #711

Answered by Trinkle23897
Kiessar asked this question in Q&A

DQN's performance depends heavily on the epsilon-greedy exploration rate. eps_train and eps_test are set to different values, which is why performance differs between training and testing.
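To make the effect of the exploration rate concrete, here is a minimal, library-independent sketch of epsilon-greedy action selection. It is not Tianshou's code: the Q-values, the two eps values, and the helper names `eps_greedy_action` and `greedy_fraction` are all made up for illustration. It only shows that a larger eps means the greedy action is taken less often.

```python
import random


def eps_greedy_action(q_values, eps, rng):
    """With probability eps pick a uniformly random action, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def greedy_fraction(q_values, eps, n_steps=10_000, seed=0):
    """Fraction of steps on which the greedy (highest-Q) action was chosen."""
    rng = random.Random(seed)
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    hits = sum(eps_greedy_action(q_values, eps, rng) == best for _ in range(n_steps))
    return hits / n_steps


# Made-up Q-values for a single state with four actions.
q_values = [0.1, 0.9, 0.3, 0.2]

# Illustrative values only; use whatever eps_train / eps_test your script sets.
train_frac = greedy_fraction(q_values, eps=0.1)
test_frac = greedy_fraction(q_values, eps=0.05)
print(f"greedy action rate: train={train_frac:.3f}, test={test_frac:.3f}")
```

With four actions the greedy action is chosen with probability 1 - eps + eps/4, so the same Q-network behaves measurably differently under the two settings even though its weights are identical.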

best_reward always comes from a test run. If you are curious about "some pretty good result" seen during training, you can set test_in_train=True in the offpolicy trainer. When a training episode's reward is above the given threshold, the trainer freezes the policy and calls test_episode once to evaluate it.
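The control flow described above can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the trainer's actual implementation: `train_with_test_in_train`, `evaluate`, and the reward numbers are invented for the example, and the regular per-epoch tests are left out.

```python
def train_with_test_in_train(train_rewards, evaluate, threshold):
    """Hypothetical loop: run one test whenever a training reward is above threshold.

    best_reward is updated only from test results, never from training rewards.
    """
    best_reward = float("-inf")
    for step, train_reward in enumerate(train_rewards):
        if train_reward > threshold:
            # Stand-in for "freeze the policy and call test_episode once".
            test_reward = evaluate(step)
            best_reward = max(best_reward, test_reward)
    return best_reward


# Invented numbers: training rewards per step, and the test reward the
# frozen policy would obtain if evaluated at that step.
train_rewards = [50.0, 120.0, 210.0, 90.0, 230.0]
test_rewards = {2: 195.0, 4: 205.0}
calls = []


def evaluate(step):
    calls.append(step)
    return test_rewards[step]


best = train_with_test_in_train(train_rewards, evaluate, threshold=200.0)
print(calls, best)
```

In this sketch a high training reward only triggers an evaluation; the value that ends up in `best` is still a test result, which matches the point that best_reward always comes from a test.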

Answer selected by Kiessar
Category
Q&A
2 participants