[Feat] Adding support for improvement method #174
Conversation
Great job 🚀
Left some comments~
rl4co/models/nn/ops.py
def forward(self, x):
    if isinstance(self.normalizer, nn.BatchNorm1d):
        return self.normalizer(x.view(-1, x.size(-1))).view(*x.size())
    elif isinstance(self.normalizer, nn.InstanceNorm1d):
        return self.normalizer(x.permute(0, 2, 1)).permute(0, 2, 1)
    elif self.normalizer == 'layer':
Can't we initialize the LayerNorm from PyTorch instead?
Also @cbhua @LTluttmann, it would be a good idea to allow for different normalizations: if the user passes a str, we should be able to recover the type from PyTorch.
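For the string-based dispatch, a minimal sketch could look like the following (the mapping and function name here are assumptions for illustration, not the actual rl4co API):

```python
import torch.nn as nn

# Hypothetical mapping from short names to PyTorch normalization classes
NORM_CLASSES = {
    "batch": nn.BatchNorm1d,
    "instance": nn.InstanceNorm1d,
    "layer": nn.LayerNorm,
}

def get_normalizer(norm: str, embed_dim: int) -> nn.Module:
    """Instantiate a PyTorch normalization layer from its short name."""
    try:
        return NORM_CLASSES[norm](embed_dim)
    except KeyError:
        raise ValueError(f"Unknown normalization: {norm!r}")
```

This way a string argument can be resolved to the corresponding PyTorch type without branching on string literals inside `forward`.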
This is tricky!! Basically, the idea is exactly LayerNorm. But the PyTorch LayerNorm requires pre-defining the shape of the tensors to normalize, so in our case we would need to know the graph_size (the mean and var are computed w.r.t. both graph_size and embed_dim). So I wrote my own normalization. Also, the API for LayerNorm is different from the others, e.g., it uses arguments like elementwise_affine rather than affine.
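For reference, a shape-agnostic variant can be sketched without pre-defining graph_size, since the statistics can be computed at runtime over the last two dimensions (this is a hypothetical illustration, not the PR's actual implementation):

```python
import torch
import torch.nn as nn

class GraphLayerNorm(nn.Module):
    """Hypothetical sketch: normalize over (graph_size, embed_dim) without
    knowing graph_size in advance; affine params only over embed_dim."""

    def __init__(self, embed_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(embed_dim))
        self.bias = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, graph_size, embed_dim)
        mean = x.mean(dim=(-2, -1), keepdim=True)
        var = x.var(dim=(-2, -1), keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
```

Since the mean and variance are computed per sample at forward time, the same module works for any graph size.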
rl4co/utils/decoding.py
self.actions.append(selected_action)
self.logprobs.append(logprobs)
return td
# skip this step for improvement methods, since the action for improvement methods is finalized in its own policy
[Internal comment] @cbhua @LTluttmann
this may be needed in other cases as well. We may want to standardize the API.
Agree, maybe we can rename this `step()` -> `_step()` and add a `step()` wrapper outside. The `_step()` calculates the `logprobs`, `selected_actions`, and `step()` selects the variables to return.
Yeah, I agree with what @cbhua suggested. We can do it! I haven't performed any refactoring here yet, though.
Another option might be to define another non-internal function like `select_action()` which only returns `logprobs, selected_actions`. The `step()` method would then additionally add the action to the internal list and the tensordict. Might be more readable than having yet another wrapper with a `step()` func ^.^
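The proposed split could be sketched roughly as follows (class and method names are illustrative only, and a plain dict stands in for the tensordict):

```python
import torch

class DecodingStrategy:
    """Hypothetical sketch of the proposed select_action/step split."""

    def __init__(self):
        self.actions = []
        self.logprobs = []

    def select_action(self, td: dict, logits: torch.Tensor):
        # pure computation: returns logprobs and the selected action,
        # with no side effects (greedy selection shown for illustration)
        logprobs = logits.log_softmax(dim=-1)
        selected_action = logprobs.argmax(dim=-1)
        return logprobs, selected_action

    def step(self, td: dict, logits: torch.Tensor) -> dict:
        # additionally records the action in the internal lists and the td;
        # improvement methods could call select_action() directly instead
        logprobs, selected_action = self.select_action(td, logits)
        self.actions.append(selected_action)
        self.logprobs.append(logprobs)
        td["action"] = selected_action
        return td
```

With this layout, improvement methods that finalize actions in their own policy can bypass `step()` and use only `select_action()`.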
Awesome work! 🚀 Played with the code and it works pretty efficiently! Love it.
Oops, I should have started a review first and then commented. I made quite a few separate comments. Hope it's not too massive. 🤪
Hi @cbhua @fedebotu, thank you so much for the review and the great suggestions! I have replied above. Since I forgot to run pre-commit last time, I did a forced re-commit of the files. Sorry if the changes are a bit hard to track. I will make new commits for future updates! :)
awesome work and great addition to rl4co!
rl4co/utils/decoding.py
self.actions.append(selected_action)
self.logprobs.append(logprobs)
return td
# skip this step for improvement methods, since the action for improvement methods is finalized in its own policy
Another option might be to define another non-internal function like `select_action()` which only returns `logprobs, selected_actions`. The `step()` method would then additionally add the action to the internal list and the tensordict. Might be more readable than having yet another wrapper with a `step()` func ^.^
Hi @LTluttmann, thanks for the review! I have changed the code in the latest commit! I marked this pull request as a draft since I need to add more features before merging~
Great job!
I went through the code and I don't have particular comments since you mentioned this is working well, except that in the future we should maybe document the new classes a little more (but not now).
Two comments:
class MLP(nn.Module):
    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        num_neurons: List[int] = [64, 32],
        dropout_probs: Union[None, List[float]] = None,
[Minor] Note that we could also use the MLP from TorchRL, but no need to change now since here we can add more custom stuff
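Based on the signature shown above, the body of such an MLP with optional per-layer dropout might look like this sketch (the internals here are assumptions, not the PR's actual code):

```python
import torch.nn as nn
from typing import List, Union

class MLP(nn.Module):
    """Hypothetical sketch of an MLP with per-hidden-layer dropout."""

    def __init__(
        self,
        input_dim: int,
        output_dim: int,
        num_neurons: List[int] = [64, 32],  # mirrors the signature above
        dropout_probs: Union[None, List[float]] = None,
    ):
        super().__init__()
        if dropout_probs is None:
            dropout_probs = [0.0] * len(num_neurons)
        layers, in_dim = [], input_dim
        for hidden, p in zip(num_neurons, dropout_probs):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, output_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

TorchRL's MLP covers a similar pattern, but a custom class like this leaves room for the extra customization mentioned in the comment.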
Hi @fedebotu, thanks for the comments! I have added the tests for N2S and resolved the conflicts!
Great work! 🚀 Left some random comments. Really minor ones if the model is working properly 😂
🚀
🚀
Description
This PR is to make RL4CO support improvement methods for VRPs. The changes include:
Motivation and Context
This PR is to make RL4CO support improvement methods.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!