
Major environment refactoring (draft version) #166

Closed · wants to merge 28 commits

Conversation

@cbhua (Member) commented Apr 25, 2024

Important

The merge of this pull request is postponed because it contains sensitive modifications to the environment logic, which may introduce hidden bugs, so we should be careful when updating them. Therefore, this full version of the environment refactoring will be kept as a draft. We opened another base-version refactoring pull request, #169, which only touches the environment structure and adds the generator without changing any logic, for a safe refactor of the current state. In the future, we will build on this draft's full version and further refactor the environments step by step.

Description

Together with Major modeling refactoring #165, this PR carries out a major, long-overdue refactoring of the RL4CO environments codebase.

Motivation and Context

This refactoring is driven by the following motivations:

  • New Feature Integration: We aim to support a data generator capable of producing various distributions for initialized instances.
  • Standardization of Environments: Previously, our environments were developed at different times, so there were inconsistencies in content, logic, and formatting. This refactoring standardizes these environments.
  • Code Cleanup: Our earlier versions included redundant code, functions, and calculation logic. This refactoring effort will clean up these elements, enhancing the codebase's readability and maintainability.

Changelog

Environment Structure Refactoring

The refactored structure for environments is as follows:

rl4co
├── models/
└── envs/
    ├── eda/
    ├── scheduling/
    └── routing/
        ├── tsp/
        │   ├── env.py
        │   ├── generator.py
        │   └── render.py
        ├── cvrp/
        │   ├── env.py
        │   ├── generator.py
        │   └── render.py
        └── ...

We have restructured the organization of the environment files for improved modularity and clarity. Each environment has its own directory, comprising three components:

  • env.py: The core framework of the environment, managing functions such as _reset(), _step(), and others. For a comprehensive understanding, please refer to the documentation.
  • generator.py: Replaces the previous generate_data() function; this module handles the random initialization of instances within the environment. The updated version now supports custom data distributions. See the following sections for more details.
  • render.py: For visualization of the solution. Its separation from the main environment file enhances overall code readability.
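For example, with this layout the core pieces of an environment live in sibling modules. A sketch of the import paths implied by the tree above (the package may also re-export these classes at a higher level):

# Import paths implied by the directory tree above (the package may also
# re-export these classes from rl4co.envs directly)
from rl4co.envs.routing.cvrp.env import CVRPEnv
from rl4co.envs.routing.cvrp.generator import CVRPGenerator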

Data Generator Support

Each environment generator is based on the base Generator() class, which provides the following functions:

from tensordict import TensorDict


class Generator:
    def __init__(self, **kwargs):
        # Record all instance-initialization parameters (num_loc, min_loc, ...)
        self.kwargs = kwargs

    def __call__(self, batch_size) -> TensorDict:
        # Normalize batch_size to the list format expected by TorchRL
        batch_size = [batch_size] if isinstance(batch_size, int) else batch_size
        return self._generate(batch_size)

    def _generate(self, batch_size, **kwargs) -> TensorDict:
        # To be implemented by each environment-specific generator
        raise NotImplementedError
  • __init__() records all the environment instance initialization parameters, for example num_loc, min_loc, max_loc, etc.

    Thus, the __init__() function of each environment (e.g., CVRPEnv.__init__(...)) only takes generator and generator_params as input. Initializing an environment now looks like:

    env = CVRPEnv(generator_params={"num_loc": 20})
    
    # Another way
    generator = CVRPGenerator(num_loc=20)
    env = CVRPEnv(generator)

    Various samplers are initialized here. We provide the get_sampler() function, which returns a torch.distributions class based on the input variables. By default, we support the Uniform, Normal, Exponential, and Poisson distributions for locations, and center and corner placements for depots. You can also pass your own distribution sampler. See the following sections for more details.

  • __call__() is a thin wrapper; at the moment, it normalizes the batch_size format to the one supported by TorchRL (i.e., a list). Note that in this refactored version, we fix batch_size to a single dimension for easier implementation and clearer understanding, since multi-dimensional batch sizes can easily be flattened into a single dimension.

  • _generate() is the part you implement for your own environment data generator.
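To make these three methods concrete, here is a minimal sketch of a custom generator built on this base class (the class name, the feature it samples, and the import path of the base class are illustrative assumptions, not part of RL4CO):

import torch
from tensordict import TensorDict

from rl4co.envs.common.utils import Generator  # assumed location of the base class


class MyTSPGenerator(Generator):
    """Illustrative generator: uniform 2D coordinates in [min_loc, max_loc]."""

    def __init__(self, num_loc: int = 20, min_loc: float = 0.0, max_loc: float = 1.0, **kwargs):
        self.num_loc = num_loc
        self.min_loc = min_loc
        self.max_loc = max_loc
        super().__init__(**kwargs)

    def _generate(self, batch_size, **kwargs) -> TensorDict:
        # Sample uniform coordinates with shape [*batch_size, num_loc, 2]
        locs = torch.rand(*batch_size, self.num_loc, 2) * (self.max_loc - self.min_loc) + self.min_loc
        return TensorDict({"locs": locs}, batch_size=batch_size)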

New get_sampler() function

This implementation mainly refers to @ngastzepeda's code. In the current version, we support the following distributions:

  • center: For depots. All depots are initialized at the center of the space.
  • corner: For depots. All depots are initialized at the bottom-left corner of the space.
  • Uniform: Takes min_val and max_val as input.
  • Normal: Takes the mean and std as input.
  • Exponential and Poisson: Take the rate as input.

You can also use your own Callable as the sampler; it takes batch_size: List[int] as input and returns the sampled torch.Tensor.
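As a sketch, a custom callable sampler could look like this (the clustering logic, the output shape, and the way it is wired into a generator are illustrative assumptions):

import torch

def clustered_loc_sampler(batch_size):
    """Illustrative sampler: takes batch_size as a list of ints and returns
    locations grouped around a few random cluster centers, shape [*batch_size, 20, 2]."""
    num_loc, num_clusters = 20, 3
    centers = torch.rand(*batch_size, num_clusters, 2)
    assign = torch.randint(0, num_clusters, (*batch_size, num_loc))
    locs = centers.gather(-2, assign.unsqueeze(-1).expand(*assign.shape, 2))
    return (locs + 0.05 * torch.randn_like(locs)).clamp(0.0, 1.0)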

Modification for RL4COEnvBase()

We moved the checks for batch_size and device from every environment to the base class for clarity, as shown below:

def reset(self, td: Optional[TensorDict] = None, batch_size=None) -> TensorDict:
    """Reset function to call at the beginning of each episode"""
    if batch_size is None:
        batch_size = self.batch_size if td is None else td.batch_size
    if td is None or td.is_empty():
        td = self.generator(batch_size=batch_size)
    batch_size = [batch_size] if isinstance(batch_size, int) else batch_size
    self.to(td.device)
    return super().reset(td, batch_size=batch_size)
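In practice, this means an environment can be reset directly with a batch size and will pull fresh instances from its generator (a usage sketch; the parameter values are arbitrary):

env = CVRPEnv(generator_params={"num_loc": 20})
td = env.reset(batch_size=[64])  # generates 64 new instances via the generator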

We added a new _get_reward() function alongside the original get_reward() function and moved check_solution_validity() from every environment to the base class for clarity, as shown below:

def get_reward(self, td, actions) -> TensorDict:
    """Function to compute the reward. Can be called by the agent to compute the reward of the current state.
    This is faster than calling step() and getting the reward from the returned TensorDict each time for CO tasks.
    """
    if self.check_solution:
        self.check_solution_validity(td, actions)
    return self._get_reward(td, actions)

def _get_reward(self, td, actions) -> TensorDict:
    """Implemented by each environment; computes the actual reward for a given solution."""
    raise NotImplementedError
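For illustration, a concrete _get_reward() for a TSP-like environment might compute the negative tour length along these lines (a common pattern rather than the exact RL4CO implementation):

import torch
from tensordict import TensorDict

# Would live inside a TSP-like environment class
def _get_reward(self, td: TensorDict, actions: torch.Tensor) -> torch.Tensor:
    # Gather the coordinates in tour order: [batch_size, num_loc, 2]
    locs = td["locs"].gather(1, actions.unsqueeze(-1).expand(-1, -1, 2))
    # Cost is the closed-tour length (including the return to the start)
    cost = (locs - torch.roll(locs, shifts=-1, dims=1)).norm(p=2, dim=-1).sum(-1)
    return -cost  # the reward is the negative cost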

Standardization

We standardize the contents of env.py with the following functions:

class EnvName(RL4COEnvBase):
	name = "env_name"
	def __init__(self, generator: EnvGenerator, generator_params: dict): pass
	
	def _step(self, td: TensorDict) -> TensorDict: pass
	
	@staticmethod
	def get_action_mask(td: TensorDict) -> torch.Tensor: pass
	
	def _reset(self, td: Optional[TensorDict] = None, batch_size: Optional[list] = None) -> TensorDict: pass
	
	def _get_reward(self, td: TensorDict, actions: torch.Tensor) -> torch.Tensor: pass
	
	@staticmethod
	def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None: pass
	
	@staticmethod
	def render(td: TensorDict, actions: torch.Tensor = None, ax = None): pass
	
	def _make_spec(self, generator: EnvGenerator): pass

The order is considered natural and easy to follow, and we expect all environments to follow the same order for easier reference and maintenance. In more detail, we have the following standardization:

  1. We changed the variable name available to visited for a more intuitive understanding. In the step() and get_action_mask() calculations, visited records which nodes have been visited, and the action_mask is derived from it together with the environment constraints (e.g., capacity, time windows). Separating these two variables makes the calculation logic clearer.
  2. For some environments, we changed the _step() function to a non-static method, following the TorchRL style.
  3. We standardized the get_action_mask() calculation logic, which generally contains three parts: (a) initialize the action_mask based on visited; (b) update the cities' action_mask based on the state; (c) update the depot action_mask last (see the sketch after this list). In our experience, this ordering causes fewer conflicts and less mess.
  4. All 1-D features, e.g., i, capacity, used_capacity, etc., are initialized with size [*batch_size, 1] instead of [*batch_size]. The reason is that many masking operations require logical computations between these 1-D features and 2-D features, e.g., capacity with demand. This also stays consistent with the TorchRL implementation.
  5. We rewrote the environment docstrings with descriptions of observations, constraints, finish conditions, rewards, and args so that users can better understand each environment. We also moved data-related parameters (e.g., num_loc, min_loc, max_loc) to the generator for clarity.
  6. We added a cost variable to the get_reward() function for an intuitive understanding. In this case, the return (reward) is -cost.
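A sketch of the three-part masking pipeline from point 3 for a CVRP-like environment (the tensor names, shapes, and the exact depot rule are illustrative, not the final RL4CO implementation):

import torch
from tensordict import TensorDict

# Would be a @staticmethod of the environment class
def get_action_mask(td: TensorDict) -> torch.Tensor:
    # (a) initialize from visited: visited nodes can never be selected again
    action_mask = ~td["visited"].bool()
    # (b) update the city entries with the state constraints, e.g. remaining capacity
    exceeds_cap = td["demand"] + td["used_capacity"] > td["vehicle_capacity"]
    action_mask[..., 1:] &= ~exceeds_cap[..., 1:]
    # (c) update the depot entry last: forbid staying at the depot while feasible
    #     customers remain, otherwise allow returning to it
    at_depot = td["current_node"] == 0
    action_mask[..., 0:1] = ~at_depot | ~action_mask[..., 1:].any(-1, keepdim=True)
    return action_mask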

Other Fixes

  1. In CVRP, renamed vehicle_capacity → capacity and capacity → unnorm_capacity for clarity.
  2. [⚠️ Sensitive Change] Now, the demand variable also contains the depot. For example, in the previous CVRPEnv(), given num_loc=50, td["locs"] had size [batch_size, 51, 2] (with the depot), while td["demand"] had size [batch_size, 50]. This caused index shifting in the get_action_mask() function, which required a few padding operations.
  3. Fixed the SDVRP environment action-mask calculation bug.
  4. Added a numerical error bound (0 → 1e-5); for example, in SDVRP, done = ~(demand > 0).any(-1) becomes done = ~(demand > 1e-5).any(-1) for better robustness against edge cases.
  5. In the CVRP, OP, and PCTSP environments, when getting variables from tables keyed by num_loc (e.g., the CVRP CAPACITIES table), if the given num_loc is not in the table, we find the closest num_loc as a replacement and raise a warning, to increase robustness (see the sketch after this list).
  6. Fixed the return type of get_reward().
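A sketch of the fallback lookup from fix 5 (the table values here are illustrative placeholders, not the real CVRP CAPACITIES table):

import warnings

CAPACITIES = {10: 20.0, 20: 30.0, 50: 40.0, 100: 50.0}  # illustrative values

def get_capacity(num_loc: int) -> float:
    """Return the capacity for num_loc, falling back to the closest table key."""
    if num_loc not in CAPACITIES:
        closest = min(CAPACITIES, key=lambda k: abs(k - num_loc))
        warnings.warn(
            f"num_loc={num_loc} not found in the capacity table; "
            f"using the closest available value (num_loc={closest}) instead."
        )
        num_loc = closest
    return CAPACITIES[num_loc]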

Notes

  1. In the current version, we don't support distributions over int values, e.g., num_depot, num_agents. These values are initialized by torch.randint().
  2. In the reward calculation, for environments constrained to start and end at the depot, the actions should be padded with 0 at the start and end (see the sketch after this list).
  3. In the current version, only the routing environments have been refactored. We will also refactor the EDA and scheduling environments soon.
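A minimal sketch of the padding mentioned in note 2 (variable names are illustrative):

import torch

# actions: [batch_size, seq_len] tour indices without the depot;
# prepend and append the depot index 0 before computing the tour length
depot = torch.zeros(actions.size(0), 1, dtype=actions.dtype, device=actions.device)
padded_actions = torch.cat([depot, actions, depot], dim=1)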

Here is a summary of the refactoring status for each environment:

  • Decompose: decompose environments into folders with env.py, generator.py, and render.py; fix the __init__() and _reset() functions;
  • Training Checking: check the training of the refactored environments;
  • Documentation: clean up and fix environment documentation and logic comments;
  • Solution Validity: check whether the environment contains a check_solution_validity() function;
  • Clean up Logic: check whether the _step() and get_action_mask() functions are cleaned up with the standard pipeline.
| Environment | Decompose | Training Checking | Documentation | Solution Validity | Clean up Logic |
| ----------- | --------- | ----------------- | ------------- | ----------------- | -------------- |
| TSP         |           |                   |               |                   |                |
| CVRP        |           |                   |               |                   |                |
| CVRPTW      |           |                   |               |                   |                |
| PCTSP       |           |                   |               |                   |                |
| OP          |           |                   |               |                   |                |
| SDVRP       |           |                   |               |                   |                |
| SVRP        |           |                   |               |                   |                |
| ATSP        |           |                   |               |                   |                |
| MTSP        |           |                   |               |                   |                |
| SPCTSP      |           |                   |               |                   |                |
| PDP         |           |                   |               |                   |                |
| MPDP        |           |                   |               |                   |                |
| MDCPDP      |           |                   |               |                   |                |

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

Thanks, and we need your help

Thanks for @ngastzepeda's base code for this refactoring!

If you have time, you are welcome to provide your ideas/feedback on this PR.
CC: @Furffico @henry-yeh @bokveizen @LTluttmann

There is quite a bit of remaining work for this PR, and I will actively update it here.

if batch_size is None:
    batch_size = self.batch_size if td is None else td.batch_size
if td is None or td.is_empty():
    td = self.generator(batch_size=batch_size)
Contributor

Couldn't we include the generator as a parameter in the base environment already and set it in __init__()?

Contributor

It's just that we're calling it here (and further below), but in the base environment it doesn't actually exist.

Member

Yes, the reason they are here is that they can be passed to the environment itself, as in TorchRL here. This is the function signature for EnvBase in TorchRL:

    def __init__(
        self,
        *,
        device: DEVICE_TYPING = None,
        batch_size: Optional[torch.Size] = None,
        run_type_checks: bool = False,
        allow_done_after_reset: bool = False,
    ):
So I guess we should make the above explicit in `RL4COEnvBase` since ours is a child class!

Member Author

@ngastzepeda I think it makes sense that now all environments have generator and generator_params as inputs to the __init__() function. We could move them to RL4COEnvBase().

And also, as @fedebotu said, it's better to expose the other useful parameters from torchrl.EnvBase in our RL4COEnvBase(). That makes it easier for users to discover the provided APIs, not only from the documentation.

Member Author
@cbhua commented Apr 29, 2024

@ngastzepeda I have rethought adding generator and generator_params to the RL4COEnvBase class. I prefer to keep them in each environment for now, for two reasons:

  1. We want users to be able to initialize an environment by simply calling env = <EnvName>(), e.g., env = TSPEnv(), without any parameters. In this case, each environment needs a default generator initialization with its respective generator class, e.g.

    if generator is None:
        generator = CVRPGenerator(**generator_params)
    self.generator = generator

    It would be hard, or at least messy, for users to understand if we implemented this part in the base class;

  2. Keeping the generator initialization in each environment can also serve as a hint for users to understand the "generate data" -> "reset instance as a TensorDict" -> "step rollout, ..." pipeline.

What do you think about this? 🤔

rl4co/envs/common/utils.py
Comment on lines +88 to +96
assert kwargs.get("mean_"+val_name, None) is not None, "mean is required for Normal distribution"
assert kwargs.get(val_name+"_std", None) is not None, "std is required for Normal distribution"
return Normal(mean=kwargs[val_name+"_mean"], std=kwargs[val_name+"_std"])
elif distribution == Exponential or distribution == "exponential":
assert kwargs.get(val_name+"_rate", None) is not None, "rate is required for Exponential/Poisson distribution"
return Exponential(rate=kwargs[val_name+"_rate"])
elif distribution == Poisson or distribution == "poisson":
assert kwargs.get(val_name+"_rate", None) is not None, "rate is required for Exponential/Poisson distribution"
return Poisson(rate=kwargs[val_name+"_rate"])
Contributor

Do I understand correctly that we're assuming a specific format for these parameters (i.e., we expect parameters val_name_mean, val_name_std, etc.), and it's not enough to simply pass, e.g., mean = 5, std = 2?

Member Author

Good question! Your understanding is correct. I have thought about this for some time. The thing is: this get_sampler() function is called in the generator multiple times for different features, e.g., in OPGenerator():

self.depot_sampler = get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
self.prize_sampler = get_sampler("prize", prize_distribution, min_prize, max_prize, **kwargs)

If the user wants to init the location with a Normal distribution and the prize with a Poisson distribution, 3 parameters are required:

  1. The mean of location;
  2. The std of location;
  3. The rate of prize.

We have two options for handling these parameters in OPGenerator():

  1. Adding all of them explicitly to the __init__() inputs;
def __init__(self, min_loc, max_loc, mean_loc, std_loc, rate_loc, loc_distribution,\
    min_prize, max_prize, mean_prize, std_prize, rate_prize, prize_distribution)
  2. Supporting them via kwargs in the __init__() inputs, i.e., the user follows the rule that if you want to use the Normal distribution for <val_name>, you must pass extra parameters named exactly mean_<val_name> and std_<val_name>.

    Actually, both will work, but for clarity and flexibility I chose the second way. However, I understand this could be confusing for users, so we should have clear documentation of the naming rule for these parameters.

If you have a better implementation, please tell me 🤔 I do think the current implementation may not be optimal.

Contributor
@ngastzepeda left a comment

I commented at first that instead of doing the same things in every single environment (specifically getting the visited, current_node, done tensors, etc.) we should define that in the parent class so that the child classes can simply call the parent class method. Then I noticed that the base class is for all environments, not just routing. Maybe it would make sense, though, to have a base class for routing (and one for scheduling, etc.), even within the base.py file, which would inherit from RL4COEnvBase and could define the things that are the same for all routing envs so we don't have to repeat them...
Apart from that I only left a few minor comments :)

@@ -1,6 +1,6 @@
from rl4co.utils.pylogger import get_pylogger

from .pctsp import PCTSPEnv
from ..pctsp.env import PCTSPEnv
Contributor

Since the only difference between this class and PCTSP seems to be that this one is stochastic, but there is no additional logic implemented beyond PCTSP, why even have two separate environments and not just differentiate via the stochastic boolean parameter?

Member

I agree, but conceptually they are a bit different, so it might be worth keeping the distinction. Technically, you could call PCTSP with the stochastic parameter on too.

visited = td["visited"].scatter(
    -1, current_node.expand_as(td["action_mask"]), 1
)
print(current_node)
Contributor

Oops, I had forgotten to delete the print statement^^

log = get_pylogger(__name__)


class SVRPEnv(RL4COEnvBase):
Contributor

Now that we're doing the refactoring anyway, we might as well rename this environment to SkillVRP to avoid confusion with the Stochastic VRP :)

log = get_pylogger(__name__)


class SVRPGenerator(Generator):
Contributor

Rename to SkillVRPGenerator

log = get_pylogger(__name__)


def render(td, actions=None, ax=None):
Contributor

I have to admit, I've never actually rendered a Skill VRP problem, so no idea if this runs without problems

Member
@fedebotu left a comment

Great job!
Left some comments here and there. Additionally as @hyeok9855 is doing, there should be an additional (optional) file called local_search.py

dms[..., torch.arange(self.num_loc), torch.arange(self.num_loc)] = 0

log.info("Using TMAT class (triangle inequality): {}".format(self.tmat_class))
if self.tmat_class:
Member

Shouldn't this be inside of the sampler itself?


@@ -8,7 +8,8 @@
# Main autorergressive policy: rollout over multiple envs since it is the base
@pytest.mark.parametrize(
"env_name",
["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp", "dpp", "mdpp", "smtwtp"],
# ["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp", "dpp", "mdpp", "smtwtp"],
["tsp", "cvrp", "sdvrp", "mtsp", "op", "pctsp", "spctsp"],
Member

Why were tests from the above environments removed?

Member Author

In the current refactoring version, I have only finished the routing environments part. Since we modified RL4COEnvBase(), this affects the running of the EDA environments.

I will finish the refactoring for the EDA environments in the coming commits and put these checks back, don't worry.

@fedebotu fedebotu changed the title Major environment refactoring Major environment refactoring (draft version) Apr 30, 2024
@fedebotu (Member)

Let's also remember to fix the shifts in the torch.roll distance calculation, as @ngastzepeda noticed, e.g. here. These do not affect calculations in Euclidean problems, but it's best to have it conceptually correct.

@fedebotu (Member) commented May 1, 2024

Notice that we moved most of the above into #169 (without modifications to environment logic or variables)! We will address the comments and merge soon~

@fedebotu (Member) commented Jun 7, 2024

There have been too many changes to track recently, and it seems that several features have already been added.

I will be closing this for now and come back to this for a fresh PR if needed!

@fedebotu fedebotu closed this Jun 7, 2024