question about action encoded #222

Nightbringers · 2023-05-10T11:51:39Z

Search before asking

I have searched the MuZero issues and found no similar bug report.

🐛 Describe the bug

I'm confused about this. In the MuZero paper, the input to the dynamics function is the hidden state concatenated with a representation of the action for the transition. As I understand the paper, a normal action (playing a stone on the board) is encoded as an all-zero plane with a single one at the position of the played stone. By the same logic, with action_space_size = 5 and action = 2 (0-based), a one-hot encoding would be [0,0,1,0,0]. The code does something different: it builds a single plane in which every entry equals action / action_space_size = 2/5 = 0.4, so in the same five-entry picture the action becomes [0.4,0.4,0.4,0.4,0.4].
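To make the two encodings in the question concrete, here is a minimal plain-Python sketch (no PyTorch) of the five-action example, assuming 0-based action indices. The function names `one_hot` and `scaled_scalar` are hypothetical illustrations, not code from the repository.

```python
from typing import List


def one_hot(action: int, action_space_size: int) -> List[float]:
    """Paper-style encoding: all zeros except a 1.0 at the (0-based) action index."""
    if not 0 <= action < action_space_size:
        raise ValueError("action out of range")
    vec = [0.0] * action_space_size
    vec[action] = 1.0
    return vec


def scaled_scalar(action: int, action_space_size: int, size: int) -> List[float]:
    """Encoding described in the issue: every entry holds action / action_space_size."""
    return [action / action_space_size] * size


print(one_hot(2, 5))           # [0.0, 0.0, 1.0, 0.0, 0.0]
print(scaled_scalar(2, 5, 5))  # [0.4, 0.4, 0.4, 0.4, 0.4]
```

In the one-hot form the action's identity is carried by *which* entry is set; in the scaled form it is carried only by the magnitude shared by all entries.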

Am I misunderstanding something? Please tell me which encoding is right and why the code is written this way. Thanks.

Add an example

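# Constant plane with the hidden state's batch and spatial size: (batch, 1, H, W)
action_one_hot = (
    torch.ones(
        (
            encoded_state.shape[0],
            1,
            encoded_state.shape[2],
            encoded_state.shape[3],
        )
    )
    .to(action.device)
    .float()
)
# Despite the name, this is not one-hot: every cell holds action / action_space_size
action_one_hot = (
    action[:, :, None, None] * action_one_hot / self.action_space_size
)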

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Nightbringers added the "bug" (Something isn't working) label on May 10, 2023.

