question about action encoded #222

Open
1 task done
Nightbringers opened this issue May 10, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@Nightbringers

Search before asking

  • I have searched the MuZero issues and found no similar bug report.

🐛 Describe the bug

I'm confused about this: in the MuZero paper, the input to the dynamics function is the hidden state concatenated with a representation of the action for the transition. As I understand the paper, a normal action (playing a stone on the board) is encoded as an all-zero plane with a single one at the position of the played stone. For example, with action_space_size = 5 and action = 2 (zero-based), the action would be encoded as [0, 0, 1, 0, 0]. In this code, however, the action is encoded as [0.4, 0.4, 0.4, 0.4, 0.4], i.e. every entry is action / action_space_size.

Am I misunderstanding something? Please tell me which encoding is correct, and why the code is written this way. Thanks.
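To make the two encodings in question concrete, here is a minimal pure-Python sketch of both for the example values above (action_space_size = 5, action = 2). The helper names `one_hot` and `scaled_scalar` are hypothetical and not taken from the repository, and the sketch assumes zero-based action indices.

```python
from __future__ import annotations


def one_hot(action: int, action_space_size: int) -> list[float]:
    """Zeros everywhere, with a single one at the (zero-based) action index."""
    return [1.0 if i == action else 0.0 for i in range(action_space_size)]


def scaled_scalar(action: int, action_space_size: int, length: int) -> list[float]:
    """Constant vector whose every entry equals action / action_space_size."""
    return [action / action_space_size] * length


one_hot_encoding = one_hot(2, 5)
scaled_encoding = scaled_scalar(2, 5, 5)

print(one_hot_encoding)  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(scaled_encoding)   # [0.4, 0.4, 0.4, 0.4, 0.4]
```

Both encodings identify the action uniquely. The one-hot form treats actions as unordered categories, while the scaled form places them on a single numeric axis between 0 and 1.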

Add an example

action_one_hot = (
    torch.ones(
        (
            encoded_state.shape[0],
            1,
            encoded_state.shape[2],
            encoded_state.shape[3],
        )
    )
    .to(action.device)
    .float()
)
action_one_hot = (
    action[:, :, None, None] * action_one_hot / self.action_space_size
)
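The snippet above can be run outside its class to inspect what it produces. The sketch below is a standalone approximation under stated assumptions: the shapes (batch of 1, 3 channels, 3x3 spatial size) and action_space_size = 5 are hypothetical values chosen for illustration, `action` is assumed to have shape (batch, 1), and the final concatenation along the channel dimension follows the paper's description of the dynamics input quoted in this issue.

```python
import torch

action_space_size = 5  # hypothetical, matches the example in the issue text
encoded_state = torch.zeros(1, 3, 3, 3)  # hypothetical (batch, channels, height, width)
action = torch.tensor([[2]])  # assumed shape (batch, 1)

# One plane of ones with the same batch and spatial size as the hidden state.
ones_plane = torch.ones(
    (encoded_state.shape[0], 1, encoded_state.shape[2], encoded_state.shape[3]),
    device=action.device,
    dtype=torch.float32,
)

# Broadcasting fills the whole plane with action / action_space_size,
# so this is a scaled constant plane rather than a one-hot plane.
scaled_plane = action[:, :, None, None] * ones_plane / action_space_size

# The action plane is appended to the hidden state as an extra channel.
dynamics_input = torch.cat((encoded_state, scaled_plane), dim=1)

print(scaled_plane.shape)    # torch.Size([1, 1, 3, 3])
print(dynamics_input.shape)  # torch.Size([1, 4, 3, 3])
```

Every entry of `scaled_plane` is 2 / 5 = 0.4 here, which matches the [0.4, 0.4, 0.4, 0.4, 0.4] values described in the question.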

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Nightbringers added the bug label on May 10, 2023