[Feat] Adding support for improvement method #174
Conversation
Great job 🚀
Left some comments~
rl4co/models/nn/ops.py
def forward(self, x):
    if isinstance(self.normalizer, nn.BatchNorm1d):
        return self.normalizer(x.view(-1, x.size(-1))).view(*x.size())
    elif isinstance(self.normalizer, nn.InstanceNorm1d):
        return self.normalizer(x.permute(0, 2, 1)).permute(0, 2, 1)
    elif self.normalizer == 'layer':
Can't we initialize the LayerNorm from PyTorch instead?
Also @cbhua @LTluttmann, it would be a good idea to allow for different normalizations: if the user passes a str, we should be able to recover the type from PyTorch.
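For the string-based dispatch, a minimal sketch could look like the following (the mapping and function name here are assumptions for illustration, not the actual rl4co API):

```python
import torch.nn as nn

# Hypothetical mapping from short names to PyTorch normalization classes
NORM_CLASSES = {
    "batch": nn.BatchNorm1d,
    "instance": nn.InstanceNorm1d,
    "layer": nn.LayerNorm,
}

def get_normalizer(norm: str, embed_dim: int) -> nn.Module:
    """Instantiate a PyTorch normalization layer from its short name."""
    try:
        return NORM_CLASSES[norm](embed_dim)
    except KeyError:
        raise ValueError(f"Unknown normalization: {norm!r}")
```

This way a string argument can be resolved to the corresponding PyTorch type without branching on string literals inside `forward`.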
This is tricky!! Basically, the idea is exactly LayerNorm. But the PyTorch LayerNorm requires pre-defining the shape of the tensors to normalize, so in our case we would need to know the graph_size (the mean and var are computed w.r.t. both graph_size and embed_dim). So I wrote my own normalization. Also, the API for LayerNorm is different from the others, e.g., it uses arguments like elementwise_affine rather than affine.
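For reference, a shape-agnostic variant can be sketched without pre-defining graph_size, since the statistics can be computed at runtime over the last two dimensions (this is a hypothetical illustration, not the PR's actual implementation):

```python
import torch
import torch.nn as nn

class GraphLayerNorm(nn.Module):
    """Hypothetical sketch: normalize over (graph_size, embed_dim) without
    knowing graph_size in advance; affine params only over embed_dim."""

    def __init__(self, embed_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(embed_dim))
        self.bias = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, graph_size, embed_dim)
        mean = x.mean(dim=(-2, -1), keepdim=True)
        var = x.var(dim=(-2, -1), keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
```

Since the mean and variance are computed per sample at forward time, the same module works for any graph size.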
rl4co/utils/decoding.py
self.actions.append(selected_action)
self.logprobs.append(logprobs)
return td
# skip this step for improvement methods, since the action for improvement methods is finalized in its own policy
[Internal comment] @cbhua @LTluttmann
this may be needed in other cases as well. We may want to standardize the API.
Agree, maybe we can rename this `step()` -> `_step()` and add a `step()` wrapper outside. The `_step()` calculates the `logprobs`, `selected_actions`, and `step()` selects the variables to return.
Yeah, I agree with what @cbhua suggested. We can do it! I haven't performed any refactoring here yet, though.
Another option might be to define another non-internal function like `select_action()` which only returns `logprobs, selected_actions`. The `step()` method would then additionally add the action to the internal list and the tensordict. Might be more readable than having yet another wrapper with a `step()` func ^.^
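The proposed split could be sketched roughly as follows (class and method names are illustrative only, and a plain dict stands in for the tensordict):

```python
import torch

class DecodingStrategy:
    """Hypothetical sketch of the proposed select_action/step split."""

    def __init__(self):
        self.actions = []
        self.logprobs = []

    def select_action(self, td: dict, logits: torch.Tensor):
        # pure computation: returns logprobs and the selected action,
        # with no side effects (greedy selection shown for illustration)
        logprobs = logits.log_softmax(dim=-1)
        selected_action = logprobs.argmax(dim=-1)
        return logprobs, selected_action

    def step(self, td: dict, logits: torch.Tensor) -> dict:
        # additionally records the action in the internal lists and the td;
        # improvement methods could call select_action() directly instead
        logprobs, selected_action = self.select_action(td, logits)
        self.actions.append(selected_action)
        self.logprobs.append(logprobs)
        td["action"] = selected_action
        return td
```

With this layout, improvement methods that finalize actions in their own policy can bypass `step()` and use only `select_action()`.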
Awesome work! 🚀 Played with the code and it works pretty efficiently! Love it.
Oops, I should have started a review first and then commented. I made quite a few separate comments. Hope it's not too massive. 🤪
Hi @cbhua @fedebotu, thank you so much for the review and the great suggestions! I have replied above. Since I forgot to run pre-commit last time, I did a forced re-commit of the files. Sorry if the changes are a bit hard to track. I will make new commits for future updates! :)
awesome work and great addition to rl4co!
rl4co/utils/decoding.py
self.actions.append(selected_action)
self.logprobs.append(logprobs)
return td
# skip this step for improvement methods, since the action for improvement methods is finalized in its own policy
Another option might be to define another non-internal function like `select_action()` which only returns `logprobs, selected_actions`. The `step()` method would then additionally add the action to the internal list and the tensordict. Might be more readable than having yet another wrapper with a `step()` func ^.^
Hi @LTluttmann, thanks for the review! I have changed the code in the latest commit! I marked this pull request as a draft since I need to add more features before merging~
Great job!
I went through the code and I don't have particular comments since you mentioned this is working well, except that in the future we should maybe document the new classes a little more (but not now).
Two comments:
class MLP(nn.Module):
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        num_neurons: List[int] = [64, 32],
        dropout_probs: Union[None, List[float]] = None,
[Minor] Note that we could also use the MLP from TorchRL, but no need to change now since here we can add more custom stuff
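Based on the signature shown above, the body of such an MLP with optional per-layer dropout might look like this sketch (the internals here are assumptions, not the PR's actual code):

```python
import torch.nn as nn
from typing import List, Union

class MLP(nn.Module):
    """Hypothetical sketch of an MLP with per-hidden-layer dropout."""

    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        num_neurons: List[int] = [64, 32],  # mirrors the signature above
        dropout_probs: Union[None, List[float]] = None,
    ):
        super().__init__()
        if dropout_probs is None:
            dropout_probs = [0.0] * len(num_neurons)
        layers, in_dim = [], input_dim
        for hidden, p in zip(num_neurons, dropout_probs):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, output_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

TorchRL's MLP covers a similar pattern, but a custom class like this leaves room for the extra customization mentioned in the comment.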
Hi @fedebotu, thanks for the comments! I have added the tests for N2S and resolved the conflicts!
Great work! 🚀 Left some random comments. Really minor ones if the model is working properly 😂
🚀
🚀
Description
This PR is to make RL4CO support improvement methods for VRPs. The changes include:
Motivation and Context
This PR is to make RL4CO support improvement methods.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!