Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstable and Incorrect "nx.eigenvector_centrality_numpy" result when the graph is disconnected #6888

Open
joyemang33 opened this issue Aug 31, 2023 · 4 comments

Comments

@joyemang33
Copy link

joyemang33 commented Aug 31, 2023

Hello! Sorry for bothering you again!
I found that when the graph is disconnected, the result of the nx.eigenvector_centrality_numpy is unstable and different from the nx.eigenvector_centrality. Here is the reduced test case with 4 nodes and 2 edges:

Step to Reproduce:

import networkx as nx

G = nx.from_edgelist([(1, 2), (3, 4)])
print(nx.eigenvector_centrality_numpy(G))
G = nx.from_edgelist([(1, 2), (3, 4)])
print(nx.eigenvector_centrality_numpy(G))
print(nx.eigenvector_centrality(G))

Results:

#nx.eigenvector_centrality_numpy (unstable!)
{1: -0.14904837055016898, 2: -0.14904837055016898, 3: 0.6912196345853752, 4: 0.6912196345853752}
{1: -0.18977204610658815, 2: -0.1897720461065881, 3: 0.6811655969854312, 4: 0.6811655969854313}
#nx.eigenvector_centrality
{1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}

I believe that it may be related to a logic bug in the implementation of nx.eigenvector_centrality_numpy. Would it be possible for you to further confirm and investigate it?

Best regards,
Joye

Environments

NetworkX: 3.1
Python: 3.10

@dschult
Copy link
Member

dschult commented Aug 31, 2023

One of the difficulties with using eigenvectors for measuring centrality is when the adjacency matrix has largest eigenvalue with a multiplicity greater than 1. Said another way, there is a plane (or even 3 or more D space) of eigenvectors with that eigenvalue. Normally the Perron-Frobenius Theorem ensures that there is only a line of eigenvectors for the largest eigenvalue (can't be less than a line because any multiple of an eigenvector is also an eigenvector). We normally select a unique eigenvector by choosing a unit vector with first nonzero element positive. But the P-F theorem doesn't hold when the the network is disconnected (in other language: when the matrix is reducible). Then there is an entire plane (or more) of eigenvectors for the largest eigenvalue and the computed eigenvectors reflect a correct eigenvector, but only one of many. Depending on the method used, round-off error can affect which of the infinitely many choices of eigenvector may be reported. They are all correct as eigenvectors, but can't easily be used for centrality measures.

The python version of eigenvector_centrality is fairly robust -- that is, it reports the same results each time. But the numpy version doesn't. That is the source of what you see. But the underlying difficulty is that the eigenvector centrality is not well defined for disconnected graphs.

@joyemang33
Copy link
Author

Hello @dschult, Thank you for your quick and valuable response.

I agree that the unexpected behavior of nx.eigenvector_centrality_numpy in the disconnected graph is caused by the lack of well-defined behavior for disconnected graphs.

However, the results are incorrect concerning the intended purpose of nx.eigenvector_centrality_numpy, which is to calculate centrality based on neighbor centrality.
In the given graph, all 4 nodes should ideally possess the same centrality value as returned by nx.eigenvector_centrality. The current behavior can perplex users who are analyzing social networks without an in-depth understanding of the underlying algorithm.

I've surveyed how other graph libraries tackle this challenge in disconnected graphs, and you can find the details here: link. I've identified two primary solutions to address this undefined behavior and enhance NetworkX's robustness:

  1. Exception Handling: Reject disconnected graphs and throw an exception to caution users about unexpected outcomes.
  2. Graph Decomposition: Decomposing the disconnected graph and individually computing centrality using nx.eigenvector_centrality_numpy for each component. However, this approach might introduce inconsistencies with mathematical definitions.

Considering the fairly robust of nx.eigenvector_centrality, a straightforward solution could involve raising an exception when dealing with disconnected graphs with nx.eigenvector_centrality_numpy. This would effectively resolve the issue.

Best regards,
Joye

@dschult
Copy link
Member

dschult commented Sep 1, 2023

Thank you for these suggestions!!
I like the approach of raising an exception for disconnected graphs.

But it does mean that we would need to check whether the graph is disconnected for every call of the function. I haven't looked at how much that slowdown would be. If it is significant, maybe we could put a warning in the doc_string that results may not be effective for disconnected graphs. We could test whether the timing is significant or not by timing a two step process of ns.is_connected(G); nx.eigenvector_centrality_numpy(G) for various graphs G and compare with just centrality.

@joyemang33
Copy link
Author

Thanks for considering suggestions and fixing it.

Intuitively, I think that checking whether the graph is connected will not significantly reduce the efficiency, since it only needs O(n) time complexity, compared the time cost of computing eigenvectors.

I also totally agree with you that we should test their real performance. For me, I will be more happy if you could finally fix the undefined behavior.

Thanks once again for your valuable comments.

Best regards,
Joye

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants