Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions and modifications to greedy_coloring and equitable_coloring #7423

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Rhyd-Lewis
Copy link

@Rhyd-Lewis Rhyd-Lewis commented Apr 18, 2024

Summary of changes. The following files have been changed. These are considered below, with further details

…\networkx\networkx\algorithms\coloring\equitable_colouring.py

As before, this new code provides an equitable k-colouring for the nodes of a graph G (where k is specified by the user via the parameter “num_colors”). The original version of this code only works when the number of colours k (defined by the user) exceeds the maximum degree of G. In our new version, a heuristic algorithm is added that is used in cases where k is less than this value. This allows more flexibility for users. This heuristic algorithm is described and documented in my book:

Lewis, R. (2021) A Guide to Graph Colouring: Algorithms and Applications, 2nd Ed. Springer, ISBN: 978-3-030-81053-5 https://link.springer.com/book/10.1007/978-3-030-81054-2.

The implementation of this new algorithm has complexity O((n log n) + (nk) + (m log m)). As required, in solutions returned by this method, neighbouring vertices always receive different colours; however, the colouring is not guaranteed to be equitable.

Several new exceptions have now also been added to this code to reflect these changes and to make the existing code safer.

  • NetworkXAlgorithmError: If a value of k is used that is too small (that is, the heuristic algorithm is unable to colour all nodes of G such that neighbours always have different colours
  • NetworkXNotImplemente: If G is a directed graph or directed multigraph.
  • ValueError: If k is not a positive integer.

The documentation and comments have been modified to reflect these changes. All other parts of the code remain the same, albeit with some tidying.

…\networkx\networkx\algorithms\coloring\greedy_colouring.py

Updates have also been made to the above file to improve performance and simplify the methods used for graph colouring. The main changes are:

  • the deletion of the existing “interchange” operator, and the addition of a new tabu search operator
  • the implementation of two high-performance constructive heuristics that are known to be exact on some topologies and produce good results on arbitrary graphs,
  • an improvement in the documentation and more information of algorithm complexity.
  • Additional exceptions: in particular, these algorithms should not be used with directed graphs.

Full descriptions and analyses of these algorithms can also be found in my book:

Lewis, R. (2021) A Guide to Graph Colouring: Algorithms and Applications, 2nd Ed. Springer, ISBN: 978-3-030-81053-5 https://link.springer.com/book/10.1007/978-3-030-81054-2.

A concise description of the new tabu search algorithm is given in the comments/documentation. It is a well-regarded contemporary method that is known to produce results close to the state of the art. Here are its main benefits.

  • The tabu search algorithm occupies fewer lines of code -- approximately 80 lines as opposed to interchange's 180.
  • The tabu search algorithm can be used as an additional step with all the greedy heuristics currently included in the NetworkX library. In contrast, the interchange method does not work with DSatur, despite this being one of the best available greedy heuristics.
  • The tabu search algorithm allows users to specify the maximum number of iterations. Better results (but longer run times) will occur with larger iteration limits.
  • Compared to all existing colouring methods in the NetworkX library, the tabu search option produces solutions with fewer colours than interchange, and in less time. The following document contains further details on this: https://www.rhydlewis.eu/gcol/analysis.pdf.

The second change we have made is in the implementation if the Dsatur algorithm in NetworkX, where there is currently a slight error. In the original specification of Dsatur (given in the following paper):

Brélaz, D. (1979). "New methods to color the vertices of a graph". Communications of the ACM. 22(4), pp. 251–256.

the next vertex to be coloured should be the one with the highest saturation degree, breaking ties by selecting the vertex with the highest degree in the subgraph induced by the uncoloured vertices. However, the current NetworkX implementation breaks ties by only looking at the degree of the vertex in the original graph. This fault is corrected in our proposed implementation. Our new implementation also makes use of a priority queue and has a complexity of O((n+m) lg(n+m)) as opposed to the existing implementation of O(n^2). This new implementation occupies fewer lines of code and is faster for all but the densest of graphs. A further analysis of this algorithm (performance and complexity) is available in the above book.

Finally, we have also added the famous recursive largest first (RLF) algorithm as an option for the greedy colouring method. This is a well-known and high-performing method that has a complexity of O(nm). This is slightly more expensive than the existing greedy methods, although it is also known to produce better quality solutions. (Information on this can also be seen in the above book and at https://www.rhydlewis.eu/gcol/analysis.pdf.)

…\networkx\networkx\algorithms\coloring\tests\test_coloring.py

In the above file, several new tests have been added to ensure our new code functions as expected. We can confirm that is correct. Additional information on run times and performance can also be seen in https://www.rhydlewis.eu/gcol/analysis.pdf

- Use of more contemporary algorithms for graph colouring
- Improved documentation
- Improved performance and simplified code for several existing methods
Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For starters, it'd be good to ensure that any of the tests that are currently in place haven't been removed or modified. This is important in order to ensure that the changes are backward-compatible.

It's hard to tell from the diff whether there are material changes or whether these are just style issues, so the first step is resolving the style problems. A quick recipe:

pip install pre-commit
pre-commit install
pre-commit run --all-files

See the contributor guide for further details!

@Rhyd-Lewis
Copy link
Author

Rhyd-Lewis commented Apr 23, 2024 via email

@rossbar
Copy link
Contributor

rossbar commented Apr 23, 2024

No worries, @Rhyd-Lewis - and thanks for the proposal!

As mentioned above, the first step will be to fix the style issues: at this point, it's not clear to reviewers how extensive the changes to the test suite actually are, as the modifications are intertwined with style/reformatting changes. Once those are rectified (see procedure in the previous comment, or the contributor guide it will be more straightforward to tell whether there are any backwards-incompatible changes and to what degree (ha!).

At a higher level - the reason why this constraint exists is that these functions are already in use by users for all sorts of applications. It's important to ensure that any changes that go into the library don't silently yank the rug out from under them and change the behavior of existing code, at least not without some form of forewarning.

…te that test_coloring.py had been altered because (a) the new version allows some paramter combinations that were previously disallowed, and (b) additional parameter options are also available for greedy_coloring and equitable_coloring. I hope that this is OK

At a higher level - the reason why this constraint exists is that these functions are already in use by users for all sorts of applications. It's important to ensure that any changes that go into the library don't silently yank the rug out from under them and change the behavior of existing code, at least not without some form of forewarning.
@Rhyd-Lewis
Copy link
Author

Rhyd-Lewis commented Apr 24, 2024 via email

@Rhyd-Lewis
Copy link
Author

Hello. All tests (except for reviewer label) have now been passed. Looking forward to receiving any feedback. Thanks
Rhyd

@dschult
Copy link
Member

dschult commented May 8, 2024

I took a look at the PR and I think it is changing the interface in a way I find confusing at first glance. Perhaps I am missing something -- or maybe we just need to make a new function instead of changing an existing function. Making a new function could also make this easier to review -- (it looks like almost every line has changed, so it's hard to tell what this does to t he existing function).

My questions:

  • this seems to provide a heuristic method that (sometimes) does not return an equitable coloring. That seems confusing given the name of the function.
  • this seems to add a few new strategies for coloring -- but the diff makes it look like you are also changing/replacing an existing strategy. Can you make the additions without removing any existing strategies? Even if the strategy is essentially the same as an existing one, it is probably better to add a new one and leave the old so people counting on it can continue to use it.

minor nit: can you use log(m) instead of lg m for the documentation of complexity? push back if that's not ok.

@Rhyd-Lewis
Copy link
Author

Rhyd-Lewis commented May 9, 2024 via email

@Rhyd-Lewis
Copy link
Author

Hi All,

Awaiting feedback on most recent commit.

Thanks

Rhyd

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably the easiest path forward here to to try to decompose the changes into multiple PRs - see comments below. The changes generally look good, but they are extensive and should undergo careful review! Most of my suggestions at this stage are about making that review easier!

else:
r_ = 0
# Employ the exact algorithm of [1]. First, map nodes to integers for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the else is no longer necessary - getting rid of it and resetting the indent level from L531 onward would be a big help I think!


The strategies are described in [1]_, and smallest-last is based on
[2]_.
def greedy_color(G, strategy="largest_first", tabu=0, interchange=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new kwarg should be added at the end of the signature in order to prevent breaking changes in existing code which rely on positional arguments. See the contributor guide for further details if interested!


class TestColoring:
def test_basic_cases(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test suite is certainly hard to follow and the efforts to improve it are much appreciated! However, with the removal/reorganization of tests it becomes very difficult to evaluate whether the changes to the code are not breaking tested cases.

The best way to push this forward IMO would be to split out test suite modifications (i.e. changes and removals) to a separate PR that is dedicated to refactor the test suite. New tests can (and should) be added to this PR, but removing/changing existing tests should be left off until after the proposed changes can be shown to pass the existing test suite as is (plus any new tests).

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments on the equitable_coloring.py portion of the PR.
For the greedy coloring (as for the others too) we need to keep the old behavior -- including default behavior for any existing functions. At least through a deprecation cycle. I haven't worked through the implications of that because it looks like a lot has been removed that perhaps doesn't need to be. It would be helpful if you passed through the PR with a view toward backward compatibility and made it more clear what is changed and how users get the old behavior. Introducing new functions can be part of that -- you don't have to rewrite existing functions.

Thanks very much...

[Edit: Sorry -- I didn't see @rossbar's comments before I posted this.]

Comment on lines +514 to +515
c = equitable_heuristic(G, num_colors)
return c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace this with the single line:
return equitable_heuristic(G, num_colors)

Then, since this is a "return" you can remove the else: line and dedent the rest of the code back to its original indentation level. That should help with the review too since only the changes (instead of indented) lines will show in github's diff.

Comment on lines +497 to +501
(2010). A fast algorithm for equitable coloring. Combinatorica, 30(2),
217-224.
.. [2] Lewis, R. (2021) A Guide to Graph Colouring: Algorithms and
Applications, 2nd Ed. Springer, ISBN: 978-3-030-81053-5
<https://link.springer.com/book/10.1007/978-3-030-81054-2>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor nit: The indentation should be 4 characters even if that doesn't match the first line's .. [1] indentation.

Comment on lines +504 to +505
if nx.is_directed(G):
raise nx.NetworkXNotImplemented("input graph cannot be directed.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is code that is run by the decorator not_implemented_for just above the function signature.
The use of that decorator makes this not necessary.

"""
if nx.is_directed(G):
raise nx.NetworkXNotImplemented("input graph cannot be directed.")
if not isinstance(num_colors, int) or num_colors < 0:
Copy link
Member

@dschult dschult May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the type of the input as int means integer values that are not of Python int type wont work as input. One way to handle this is to use if int(num_colors) != num_colors .... But there are many other methods -- including not checking for it and seeing if the resulting exception is sufficiently clear to not need an explicit error here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example in networkx that checks whether an input is integral:

if isinstance(n, numbers.Integral):

This works with numpy integers:

>>> isinstance(np.int64(5), int)
False

>>> isinstance(np.int64(5), numbers.Integral)
True

I think this works b/c numpy registers the int type here.

NumPy uses a different method to check for integers, such as:

import operator
try:
    operator.index(maybe_integer)
except TypeError:
    raise TypeError('blah blah') from None

NumPy uses this trick lots of places, such as here:
https://github.com/numpy/numpy/blob/9142f32573612d544b65649cf6f485f55b48e97a/numpy/lib/_npyio_impl.py#L822-L828
This works b/c integral objects are those that implement __index__.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just as soon remove the isinstance check entirely - the docstring is very clear that num_colors has to be a positive integer, and a very clear exception would be raised immediately without this check:

>>> G = nx.dodecahedral_graph()
>>> nx.equitable_coloring(G, "n")
Traceback (most recent call last)
   ...
File ~/repos/networkx/networkx/algorithms/coloring/equitable_coloring.py:510, in equitable_color(G, num_colors)
    507 # Calculate maximum degree in G
    508 r_ = max(G.degree(node) for node in G.nodes) if len(G.nodes) > 0 else 0
--> 510 if num_colors <= r_:
    511     # Employ the heuristic from [2]
    512     c = equitable_heuristic(G, num_colors)
    513     return c

TypeError: '<=' not supported between instances of 'str' and 'int'

@Rhyd-Lewis
Copy link
Author

Rhyd-Lewis commented May 25, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants