sampling in continuous/complex action spaces with 'density prior' is not working #200
Open
Labels
bug
Search before asking
🐛 Describe the bug
In Learning and Planning in Complex Action Spaces (Hubert et al.), there are basically two changes compared to MuZero:
In the code, I think I see a difference:
Add an example
The error message I get:
File "/home/user_231/muzero-general/self_play.py", line 401, in ucb_score
child.prior / sum([child.prior for child in parent.children.values()])
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Last test reward: 0.00. Training step: 0/3. Played games: 0. Loss: 0.00
This happens because nothing assigns node.prior in the continuous branch, so every child's prior is still None when ucb_score sums them.
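To illustrate the failure mode, here is a minimal sketch that reproduces the same TypeError outside the repository. The `Node` class below is a hypothetical stand-in, not the project's actual class; it only carries the `prior` and `children` attributes that appear in the traceback.

```python
class Node:
    """Hypothetical stand-in for a search-tree node (not the repository's class)."""

    def __init__(self, prior=None):
        self.prior = prior      # stays None if nobody assigns it
        self.children = {}


parent = Node()
# Children created without a prior, as in the continuous branch described above.
parent.children = {0: Node(), 1: Node()}

message = None
try:
    # Same expression shape as the line in the traceback:
    # sum() starts from the int 0 and then tries 0 + None.
    total = sum([child.prior for child in parent.children.values()])
except TypeError as exc:
    message = str(exc)
    print(message)
```

The same failure would occur for any child whose `prior` is left unset.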
I think the parent has to set it in its expand method, equal to each child's CDF at the sampled point.
Also, regarding K, I think the expand method needs a small change so that it samples more than one action:
action_value = distribution.sample(K).squeeze(0).detach().cpu().numpy()
self.children[Action(action_value)] = Node()
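A rough, self-contained sketch of the two proposed changes together (sampling K actions and assigning a prior to each child at expansion time) might look like the following. Everything here is an assumption for illustration: the `Node` class, the `expand` signature, the one-dimensional Gaussian policy, and the `normalized_prior` helper are hypothetical and do not come from the repository. It also uses the probability density (PDF) at the sampled point as the prior, which is one reading of the "density" option; the text above mentions the CDF, and swapping in a different function would only change `gaussian_pdf`.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x (assumed policy distribution)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class Node:
    """Hypothetical search-tree node; not the repository's implementation."""

    def __init__(self, prior=None):
        self.prior = prior
        self.children = {}

    def expand(self, mean, std, k, rng):
        # Sample k actions and give each child a prior when it is created,
        # so no child is left with prior=None.
        for _ in range(k):
            action = rng.gauss(mean, std)
            self.children[action] = Node(prior=gaussian_pdf(action, mean, std))


def normalized_prior(parent, child):
    # Mirrors the expression in the traceback: child prior over the sibling sum.
    return child.prior / sum(c.prior for c in parent.children.values())


rng = random.Random(0)
root = Node()
root.expand(mean=0.0, std=1.0, k=5, rng=rng)
priors = [normalized_prior(root, child) for child in root.children.values()]
```

With every child carrying a numeric prior, the normalization in `ucb_score` no longer meets a `None`, and the normalized priors over the sampled children sum to one.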
Environment
No response
Minimal Reproducible Example
python muzero.py mujoco_IP '{"node_prior":"density"}'
Additional
No response