
sampling in continuous/complex action spaces with 'density prior' is not working #200

Open
ManorZ opened this issue Jul 9, 2022 · 0 comments
Labels
bug Something isn't working

Comments


ManorZ commented Jul 9, 2022

Search before asking

  • I have searched the MuZero issues and found no similar bug report.

🐛 Describe the bug

In Learning and Planning in Complex Action Spaces (Hubert et al.), there are basically two changes compared to MuZero:

  1. Modify the policy probabilities inside PUCB to use the 'sampled policy' (pi_hat = beta_hat/beta * pi; see the sketch after this list)
  2. Sample K actions instead of evaluating all possible actions (infinitely many in the continuous case)
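
For context, here is a minimal sketch of change 1 (illustrative names only, nothing from the repo): with K actions drawn from a proposal beta, the prior used inside PUCB becomes pi_hat = (beta_hat/beta) * pi, where beta_hat is the empirical density of the drawn samples.

import numpy as np

def sampled_prior(pi, beta, counts, K):
    # pi, beta: policy / proposal densities at each distinct sampled action
    # counts: how many of the K draws landed on each action
    beta_hat = counts / K              # empirical density of the samples
    pi_hat = (beta_hat / beta) * pi    # the paper's correction
    return pi_hat / pi_hat.sum()       # renormalize over the sampled support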

In the code, I think I see two discrepancies:

  1. Only one sample is drawn at the root, not K.
  2. Regarding pi_hat = beta_hat/beta * pi, I see two options there: 'uniform prior' and 'density prior'.
  • The uniform prior gives equal density to all actions and weights the policy accordingly; here the current code makes sense.
  • The density prior needs to account for each action's CDF (at the parent), but it doesn't work (error message and description below).


The error message I get:
File "/home/user_231/muzero-general/self_play.py", line 401, in ucb_score
child.prior / sum([child.prior for child in parent.children.values()])
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
Last test reward: 0.00. Training step: 0/3. Played games: 0. Loss: 0.00

This is because nothing assigns node.prior in the continuous branch (the built-in sum starts from the int 0 and then hits a None prior).
I think it has to be set by the parent in its expand method, equal to each child's CDF evaluated at the sampled point.
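
A minimal sketch of that fix inside the parent's expand method (assuming a torch distribution that implements cdf, e.g. Normal, and the Node/Action classes as used in the snippet below; whether the CDF or the density log_prob().exp() is the right quantity is part of the question):

import torch

action_value = distribution.sample().detach().cpu().numpy()
child = Node()
# assign the prior that the continuous branch currently leaves unset;
# .prod() treats the action dimensions as independent
child.prior = distribution.cdf(torch.as_tensor(action_value)).prod().item()
self.children[Action(action_value)] = child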

Also, regarding K, I think we need a small change in the expand method to sample more than one action:

# pass a sample shape tuple so that K actions are drawn at once
action_values = distribution.sample((K,)).detach().cpu().numpy()
for action_value in action_values:
    self.children[Action(action_value)] = Node()
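
A note on the snippet above: torch's Distribution.sample expects a sample shape (hence the (K,) tuple rather than a bare int). Also, if the same action value were drawn twice, keying the children dict on Action(action_value) would collapse the duplicates, whereas the paper's beta_hat counts their multiplicity.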

Environment

No response

Minimal Reproducible Example

python muzero.py mujoco_IP {"node_prior":"density"}

Additional

No response

ManorZ added the bug label Jul 9, 2022