
[Feat] Add JSSP environment #177

Merged: 41 commits into main on Jun 3, 2024

Conversation

LTluttmann
Contributor

Description

  • Added an environment for the Job-Shop Scheduling Problem (JSSP). This implementation treats JSSP as a special case of FJSP in which each operation can be processed by only one machine. The environment is therefore implemented as a subclass of FJSPEnv, changing only the action space (which reduces to selecting the next job to execute) and the data generator; a minimal sketch follows below this list.
  • In addition, the HetGNN policy has been restructured and renamed to L2D (for Learning to Dispatch) and is now applicable to both FJSP and JSSP.
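A minimal sketch of this subclassing idea (the FJSPEnv import path and constructor arguments are assumptions for illustration; the actual rl4co code may differ):

from rl4co.envs.scheduling.fjsp import FJSPEnv  # assumed import path

from .generator import JSSPGenerator


class JSSPEnv(FJSPEnv):
    """JSSP as a special case of FJSP: every operation can run on exactly one
    machine, so an action reduces to choosing the next job to dispatch."""

    name = "jssp"

    def __init__(self, generator=None, generator_params: dict = {}, **kwargs):
        if generator is None:
            # JSSP-specific instance generator replaces the FJSP one
            generator = JSSPGenerator(**generator_params)
        super().__init__(generator=generator, **kwargs)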

Motivation and Context

  • JSSP is a common CO problem and a widely used benchmark for new algorithms

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@LTluttmann requested review from cbhua and fedebotu on May 16, 2024, 08:35
from .generator import JSSPFileGenerator, JSSPGenerator


class JSSPEnv(FJSPEnv):
Member

Awesome! I love that the code is being reused in such as smart way

Member

Yesss, when I see this inheritance: wow, nice.

@@ -19,7 +19,7 @@
from rl4co.models.rl.ppo.ppo import PPO
from rl4co.models.rl.reinforce.baselines import REINFORCEBaseline, get_reinforce_baseline
from rl4co.models.rl.reinforce.reinforce import REINFORCE
-from rl4co.models.zoo import HetGNNModel
+from rl4co.models.zoo import L2DModel
Member

Good - let's make sure the baselines have their names! L2D is a very influential paper in the NCO community

@@ -73,7 +73,8 @@ def gather_by_index(src, idx, dim=1, squeeze=True):
expanded_shape = list(src.shape)
expanded_shape[dim] = -1
idx = idx.view(idx.shape + (1,) * (src.dim() - idx.dim())).expand(expanded_shape)
-return src.gather(dim, idx).squeeze() if squeeze else src.gather(dim, idx)
+squeeze = idx.size(dim) == 1 and squeeze
+return src.gather(dim, idx).squeeze(dim) if squeeze else src.gather(dim, idx)


def unbatchify_and_gather(x: Tensor, idx: Tensor, n: int):
Member

[Minor] it might be faster if you put a @torch.jit decorator. Not 100% sure though
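For reference, a rough sketch of how that suggestion could look, assuming torch.jit.script is meant; it is untested for speed, and the shape handling is rewritten with plain int lists because TorchScript is stricter about dynamic tuples:

import torch
from torch import Tensor


@torch.jit.script
def gather_by_index_jit(src: Tensor, idx: Tensor, dim: int = 1, squeeze: bool = True) -> Tensor:
    # Same logic as gather_by_index above, expressed in a TorchScript-friendly way
    expanded_shape = list(src.shape)
    expanded_shape[dim] = -1
    view_shape = list(idx.shape) + [1] * (src.dim() - idx.dim())
    idx = idx.view(view_shape).expand(expanded_shape)
    do_squeeze = idx.size(dim) == 1 and squeeze
    out = src.gather(dim, idx)
    return out.squeeze(dim) if do_squeeze else out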

@fedebotu
Member

@Junyoungpark tagging you since you know this problem very well!

Do you think we have a chance to include ScheduleNet?

@cbhua (Member) left a comment

Great job!

# update adjacency matrices (remove edges)
td["proc_times"] = td["proc_times"].scatter(
2,
selected_op[:, None, None].expand(-1, self.num_mas, 1),
Member

[Minor, Enhancement] Using einops.repeat could be "slightly" more efficient 😁:

repeat(selected_op, 'b -> b n d', n=self.num_mas, d=1)

Member

Is it though? I think if you already know the dimensions, einops is slightly slower from what I know, but take this with a grain of salt. But I agree that it's more readable.

Member

Okay, I did a trial and actually einops.repeat is way slower than tensor.expand 😂 (at large scale, around 4x slower). Then I think it's good to keep using tensor.expand.

Contributor Author

yeah I think they use very clever optimizations in torch.expand() 😄
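A minimal sketch of that kind of micro-benchmark (the sizes and iteration counts here are made up, not the exact test that was run):

import timeit

import torch
from einops import repeat

selected_op = torch.randint(0, 100, (4096,))
num_mas = 64


def with_expand():
    # Pure view, no data copy
    return selected_op[:, None, None].expand(-1, num_mas, 1)


def with_einops():
    # Parses the pattern string on every call, adding Python-side overhead
    return repeat(selected_op, "b -> b n d", n=num_mas, d=1)


assert torch.equal(with_expand(), with_einops())
print("expand:", timeit.timeit(with_expand, number=10_000))
print("einops:", timeit.timeit(with_einops, number=10_000))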

Member

[Minor] Maybe we don't need this file for clean file structure.


# self-loop is added by GCNConv layer
return get_full_graph_edge_index(td.device, num_nodes, self_loop=False)


class GCNEncoder(nn.Module):
Member

I like this clean refactoring, the logic is clearer. But will get_full_graph_edge_index() be called at every forward step? I.e., in the previous version, if it's a fully connected graph, the edge_index is saved as a class variable instead of being regenerated every time.

Contributor Author

That's true, but the result is cached, so it should not be too slow. In fact, this implementation should be much faster than before (at least it was in my experiments), because it avoids the list comprehension over the batch data within the forward pass. But I agree, it's still not optimal; I will revisit this in the near future.
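A rough sketch of how such caching could work; this is not the actual rl4co helper, just an illustration with the same call signature as above and functools.lru_cache doing the memoization:

from functools import lru_cache

import torch


@lru_cache(maxsize=None)
def get_full_graph_edge_index(device: torch.device, num_nodes: int, self_loop: bool = False) -> torch.Tensor:
    # All ordered node pairs (i, j); the diagonal is dropped unless self_loop=True.
    # lru_cache keys on (device, num_nodes, self_loop), so the edge index is built
    # once per configuration instead of at every forward pass.
    row = torch.arange(num_nodes, device=device).repeat_interleave(num_nodes)
    col = torch.arange(num_nodes, device=device).repeat(num_nodes)
    edge_index = torch.stack([row, col], dim=0)
    if not self_loop:
        edge_index = edge_index[:, row != col]
    return edge_index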

@LTluttmann
Contributor Author

Thanks for reviewing guys. I will add a ton of changes here in a couple of minutes and I hope this PR will not get too messy. Let me know if we should go through it together.

Additional changes:

  • Stepwise PPO for L2D
  • Some MatNet changes to make it work better for JSSP / FJSP
  • Running mean / variance class for reward / advantage scaling (a sketch of the idea follows after this list)
  • Stepwise L2D policy
  • Attention-based models for FJSP / JSSP
  • Minor bugfixes and improvements here and there
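For the running mean / variance item above, a minimal sketch of the idea (the class actually added in this PR may differ in naming and details); it uses the standard parallel-variance update so statistics can be accumulated batch by batch:

import torch


class RunningMeanStd:
    """Tracks running mean and variance for reward / advantage scaling."""

    def __init__(self, eps: float = 1e-4):
        self.mean = torch.zeros(())
        self.var = torch.ones(())
        self.count = eps  # avoids division by zero before the first update

    def update(self, x: torch.Tensor) -> None:
        batch_mean, batch_var = x.mean(), x.var(unbiased=False)
        batch_count = x.numel()
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta.pow(2) * self.count * batch_count / tot
        self.mean = self.mean + delta * batch_count / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / (self.var.sqrt() + 1e-8)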

@fedebotu
Member

fedebotu commented Jun 1, 2024

Wow, lots of changes here! 😁 Really curious about episodic / stepwise RL performances

Btw, feel free to merge anytime

@fedebotu removed the request for review from Junyoungpark on June 1, 2024, 07:17
# NOTE Experimental TSP class for stepwise PPO


class TSPEnv4PPO(TSPEnv):
Member

[Minor] this may be called DenseRewardTSPEnv or similar?

Contributor Author

Sure! Btw, stepwise PPO for TSP indeed converges to the nearest neighbor heuristic, at least with the stepwise reward as it is defined here (the distance added by the action).

Do you have a preference for what to call the stepwise PPO in the paper (dense, stepwise, something else)? And then we should probably adjust the description of PPO in the appendix.
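To make the reward definition concrete, a minimal sketch of a per-step ("dense") TSP reward as described above; the function name and tensor layout are illustrative, not the actual TSPEnv4PPO code:

import torch


def step_reward(locs: torch.Tensor, prev_node: torch.Tensor, new_node: torch.Tensor) -> torch.Tensor:
    """locs: [batch, num_nodes, 2]; prev_node / new_node: [batch] node indices.

    The reward of an action is the negative length of the edge it adds to the tour.
    """
    prev_xy = locs.gather(1, prev_node[:, None, None].expand(-1, 1, 2)).squeeze(1)
    new_xy = locs.gather(1, new_node[:, None, None].expand(-1, 1, 2)).squeeze(1)
    return -(prev_xy - new_xy).norm(p=2, dim=-1)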

@@ -58,7 +57,7 @@ def __init__(
generator_params: dict = {},
**kwargs,
):
-super().__init__(**kwargs)
+super().__init__(check_solution=False, **kwargs)
Member

Is this always the case (no solution check)?

Contributor Author

Uhm, I think this is there just because there is no check implemented for FFSP yet haha. Let me see if I can get one implemented.

Member

No worries, this is not a pressing issue! Actually, it's even better to keep it to False during training for efficiency

@LTluttmann merged commit f4abe1b into main on Jun 3, 2024
26 checks passed
@fedebotu
Member

fedebotu commented Jun 3, 2024

Great job!! This PR is truly huge
