Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add corner case to Degree Pearson Correlation Coefficient #7392

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vanshika230
Copy link
Contributor

@vanshika230 vanshika230 commented Apr 7, 2024

Closes #6913

  1. Add a corner case to test for empty graphs.
  2. Add raises section in docs for empty graph.
  3. Added test for the code added. The coverage is at 100%.
    Please do let me know any changes to be made. Thankyou!

Copy link
Contributor

@Schefflera-Arboricola Schefflera-Arboricola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(related corner case) Right now for a Graph with 1 nodes and 1 edges, nx.degree_pearson_correlation_coefficient(G) gives ValueError: x and y must have length at least 2.

>>> G = nx.Graph()
>>> G.add_node(1)
>>> G.add_edge(1,1)
>>> nx.degree_pearson_correlation_coefficient(G)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_degree_pearson_correlation_coefficient_1
  File "/workspaces/networkx/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
  File "/workspaces/networkx/networkx/algorithms/assortativity/correlation.py", line 168, in degree_pearson_correlation_coefficient
    return float(sp.stats.pearsonr(x, y)[0])
  File "/home/codespace/.local/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 4727, in pearsonr
    raise ValueError('x and y must have length at least 2.')
ValueError: x and y must have length at least 2.

@vanshika230
Copy link
Contributor Author

vanshika230 commented Apr 7, 2024

(related corner case) Right now for a Graph with 1 nodes and 1 edges, nx.degree_pearson_correlation_coefficient(G) gives ValueError: x and y must have length at least 2.

>>> G = nx.Graph()
>>> G.add_node(1)
>>> G.add_edge(1,1)
>>> nx.degree_pearson_correlation_coefficient(G)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<class 'networkx.utils.decorators.argmap'> compilation 4", line 3, in argmap_degree_pearson_correlation_coefficient_1
  File "/workspaces/networkx/networkx/utils/backends.py", line 633, in __call__
    return self.orig_func(*args, **kwargs)
  File "/workspaces/networkx/networkx/algorithms/assortativity/correlation.py", line 168, in degree_pearson_correlation_coefficient
    return float(sp.stats.pearsonr(x, y)[0])
  File "/home/codespace/.local/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 4727, in pearsonr
    raise ValueError('x and y must have length at least 2.')
ValueError: x and y must have length at least 2.

I'll add for this, thankyou so much! What are your thoughts on adding NetworkXUnfeasible if len(x) or len(y) is less than 2? Any other idea is welcome.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vanshika230 and @Schefflera-Arboricola for the review! I agree with @Schefflera-Arboricola that the additional corner cases should also be added.

Also, as a general comment, let's go ahead and switch all the new exceptions to NetworkXError - the other, more specific, exceptions (NetworkXPointlessConcept and NetworkXUnfeasible) have specialized meanings that don't necessarily map cleanly to these cases IMO!

@vanshika230
Copy link
Contributor Author

Thanks @vanshika230 and @Schefflera-Arboricola for the review! I agree with @Schefflera-Arboricola that the additional corner cases should also be added.

Also, as a general comment, let's go ahead and switch all the new exceptions to NetworkXError - the other, more specific, exceptions (NetworkXPointlessConcept and NetworkXUnfeasible) have specialized meanings that don't necessarily map cleanly to these cases IMO!

Noted, I have added the corner case told by @Schefflera-Arboricola , thankyou for that! Please let me know any further changes.

Copy link
Contributor

@Schefflera-Arboricola Schefflera-Arboricola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vanshika230 ! I've left a few comments below. LMK what u think.

networkx/algorithms/assortativity/correlation.py Outdated Show resolved Hide resolved
networkx/algorithms/assortativity/correlation.py Outdated Show resolved Hide resolved
@vanshika230
Copy link
Contributor Author

Thanks @vanshika230 ! I've left a few comments below. LMK what u think.

I have made the changes suggested in the review, Thankyou!

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As often happens, I think this one is a bit trickier than it appears at first glance!

Comment on lines +163 to +166
if nx.is_empty(G):
raise nx.NetworkXError(
"Cannot compute correlation coefficient on an empty graph."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming my reasoning in the above comment is correct (please double-check!), I think the real condition that we want to catch here is that the graph with self-loop edges removed is not empty. One way to check that is by comparing the edge set to the set of self-loop edges, e.g. set(G.edges) - set(nx.selfloop_edges(G)).

However, this will also lead to subtle changes in the algorithm for the case where the graph with multiple nodes that only contain self-loop edges. Right now, this case gives:

>>> G = nx.Graph([(1, 1), (2, 2), (3, 3)])
>>> nx.degree_pearson_correlation_coefficient(G)
ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
nan

whereas after the change this would raise an exception instead of producing the nan. This seems reasonable to me, but I'm no expert!

------
NetworkXError
If the graph 'G' is empty.
If the graph 'G' has nodes with degrees less than two.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the condition that triggers the exception - nodes with degree less than 2 are fine (otherwise the path_graph(4) example in the docstring would fail, since the end nodes have degree 1!). IIUC, the corner case is related to graphs that have only a single self-loop edge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the corner case is related to graphs that have only a single self-loop edge.

we also get ValueError: x and y must have length at least 2. for 1 edge DiGraphs. And there probably might be some other cases too where the len(x) < 2 or len(y) < 2 condition might be true.

import networkx as nx
G = nx.DiGraph([(1, 2)])
G1 = nx.Graph(G) 
print(nx.degree_pearson_correlation_coefficient(G1))
print(nx.degree_pearson_correlation_coefficient(G))

# output
"""
nan
/usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:4781: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(stats.ConstantInputWarning(msg))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-14-73f922e088fa>](https://localhost:8080/#) in <cell line: 5>()
      3 G1 = nx.Graph(G)
      4 print(nx.degree_pearson_correlation_coefficient(G1))
----> 5 print(nx.degree_pearson_correlation_coefficient(G))
      6 xy = nx.node_degree_xy(G, x="out", y="in")
      7 print(xy)

3 frames
[/usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py](https://localhost:8080/#) in pearsonr(x, y, alternative, method)
   4766 
   4767     if n < 2:
-> 4768         raise ValueError('x and y must have length at least 2.')
   4769 
   4770     x = np.asarray(x)

ValueError: x and y must have length at least 2.
"""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

ValueError crash in "nx.degree_pearson_correlation_coefficient" when input graph with no edges
3 participants