Question about the perspective transformation of two players when calculating Q? #212

Open
puyuan1996 opened this issue Oct 25, 2022 · 0 comments


puyuan1996 commented Oct 25, 2022

Thank you very much for open-sourcing this code.

I am confused by a code segment in the backpropagate method in self_play.py,
specifically the branch taken when len(self.config.players) is 2:

  • In line 423:
    min_max_stats.update(node.reward + self.config.discount * -node.value())
    Why is -node.value() used here rather than node.value()?
    As I understand it, node.value() is computed from the perspective of the player corresponding to that node.

  • In line 425:
    value = ( -node.reward if node.to_play == to_play else node.reward ) + self.config.discount * value
    When node.to_play == to_play is True, why is -node.reward + self.config.discount * value used rather than node.reward + self.config.discount * value?

  • Is it because node.reward is obtained from the perspective of the parent node of the current node?
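
To make the question concrete, here is a minimal, self-contained sketch of the two-player branch as I read it. Only the two lines marked "quoted" come from self_play.py as cited above. The Node and MinMaxStats classes are hypothetical stand-ins, the function name backpropagate_two_player is mine, and the value_sum update is my assumption about the surrounding code, so this may not match the repository exactly.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the search node; only the fields used below."""
    to_play: int
    reward: float = 0.0
    value_sum: float = 0.0
    visit_count: int = 0

    def value(self) -> float:
        # Mean backed-up value; 0 for an unvisited node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


class MinMaxStats:
    """Hypothetical tracker of the smallest and largest Q estimate seen."""

    def __init__(self) -> None:
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, q: float) -> None:
        self.minimum = min(self.minimum, q)
        self.maximum = max(self.maximum, q)


def backpropagate_two_player(search_path, value, to_play, min_max_stats, discount):
    """Walk from the leaf back to the root, flipping signs between players."""
    for node in reversed(search_path):
        # ASSUMPTION (not quoted in this issue): the value is stored from the
        # perspective of the player to move at this node.
        node.value_sum += value if node.to_play == to_play else -value
        node.visit_count += 1
        # Quoted (line 423): Q estimate passed to the normalizer.
        min_max_stats.update(node.reward + discount * -node.value())
        # Quoted (line 425): value carried up to the parent.
        value = (-node.reward if node.to_play == to_play else node.reward) + discount * value


# Toy trace: players 0 and 1 alternate; the move into `child` yields reward 1.0.
root = Node(to_play=0)
child = Node(to_play=1, reward=1.0)
leaf = Node(to_play=0)
stats = MinMaxStats()
backpropagate_two_player([root, child, leaf], value=0.5, to_play=0,
                         min_max_stats=stats, discount=1.0)
# root.value() is 1.5, child.value() is -0.5, leaf.value() is 0.5
```

In this toy trace, the term computed for child on line 423 is 1.0 + 1.0 * 0.5 = 1.5, which equals the value backed up into root. That is at least consistent with the reading in the last bullet: node.reward is expressed from the parent's perspective, while node.value() is expressed from the node's own player, hence the negation.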

Looking forward to your reply!
