Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compute_v_structures output and docstring do not agree #7385

Open
maichmueller opened this issue Apr 1, 2024 · 3 comments · May be fixed by #7398
Open

compute_v_structures output and docstring do not agree #7385

maichmueller opened this issue Apr 1, 2024 · 3 comments · May be fixed by #7398

Comments

@maichmueller
Copy link

Current Behavior

The docstring of the method compute_v_structures in dag.py says

Iterate through the graph to compute all v-structures.

V-structures are triples in the directed graph where
two parent nodes point to the same child and the two parent nodes
are not adjacent.

...

In particular, the part and the two parent nodes are not adjacent is what seems to disagree with the output of the iterator for a DAG. To my knowledge, in a DAG G=(V, E) two nodes a,b are adjacent if (a,b) in E || (b,a) in E:

In [6]: import networkx as nx

In [7]: G = nx.DiGraph()
   ...: G.add_edges_from([(2, 4), (5, 4), (2, 5)])
   ...: list(nx.compute_v_structures(G))
Out[7]: [(2, 4, 5)]

In [8]: list(G.adjacency())
Out[8]: [(2, {4: {}, 5: {}}), (4, {}), (5, {4: {}})]

Clearly, 2 and 5 are adjacent, yet the function still returns (2,4,5) as a v-structure.

Expected Behavior

The test case above should return an empty list or the docstring should change its specification of v-structures.

Steps to Reproduce

Install networkx >= 3.1 and run the above commands.

Environment

Python version: 3.8 - 3.11
NetworkX version: 3.1 - 3.21

@dschult
Copy link
Member

dschult commented Apr 2, 2024

It looks to me like the doc_string does not match the function behavior. The function seems to compute triples where the two outer nodes have directed edges to the middle node. It doesn't perform checks on connections between the outer(parent) nodes.

Unfortunately the tests do not include this case where the two parents are connected. And the wikipedia reference doesn't actually use the term v-structures as far as I can tell. Looking at the code, no relationshpi between the parents is explored. It is only finding triples with edges directed at the middle node. And it actually won't work if the nodes are not sortable -- which we should have caught upon review. This function is not used elsewhere in NetworkX including in the causal DAG parts of the code.

@maichmueller can you describe how you are using this function? Should it be removed?

@adam2392 do you have any thoughts about the compute_v_structures whether it should examine connections between the parents and whether this function is useful now that we have the d_separation code (that doesn't use v-structures)?

Thanks!

@maichmueller
Copy link
Author

hi @dschult, compute_v_structures currently returns colliders, not v-structures. Renaming it to compute_colliders and adapting a little the docstring would already be sufficient as far as I can tell (could do a PR if wanted).

Otherwise, the way to fix it for computing v-structures would be a simple check on either parent node sharing an edge with the other parent. This is in fact how I am currently using it in my lib:

    def collider_iter(
        self, 
        only_v_structure: bool = False
    ) -> Generator[ColliderView, None, None]:
        """
        Provide a generator that yields all colliders of the underlying DAG.

        Colliders are triples of nodes (parent, child, other_parent)
        in which both parent and other_parent causally affect the child

        Returns
        -------
        Generator[ColliderView, None, None],
            the colliders/v-structures found
        """
        dag = self.dag
        for par1, child, par2 in nx.compute_v_structures(dag):
            adj_parents = dag.has_edge(par1, par2) or dag.has_edge(par2, par1)
            if adj_parents and only_v_structure:
                continue
            ordered_parents = self._order_parents_by_arg_index(
                self.get_variable_args(child), (par1, par2)
            )
            yield ColliderView(*ordered_parents, child, is_v_structure=not adj_parents)

So I am not doing much more than adding the check for adjacency between the parents to decide if it is a collider or a true v-structure. The definition of v-structure is not completely unambiguous from what I can tell, but for reference, I would mention Pearl's PRIMER, chapter 2 page 50:

[...]and if they share common v-structures, that is, colliders whose parents are not adjacent[...]

The sorting should not happen IMO, as it is not essential to the functionality, yet limits its scope.
I also don't think this function needs to be removed as finding colliders/v-structs may be of interest to others as well, but the documentation of the function should be made to align with its intention.

@adam2392
Copy link
Contributor

adam2392 commented Apr 2, 2024

hi @dschult, compute_v_structures currently returns colliders, not v-structures. Renaming it to compute_colliders and adapting a little the docstring would already be sufficient as far as I can tell (could do a PR if wanted).

I am +0.5 in favor of this solution if we don't want v-structures. See below

Otherwise, the way to fix it for computing v-structures would be a simple check on either parent node sharing an edge with the other parent. This is in fact how I am currently using it in my lib:

Another possibility is the following code:

    for node in G.nodes:
        for p1, p2 in combinations(G.predecessors(node), 2):
            if p1 not in G.predecessors(p2) and p2 not in G.predecessors(p1):
                  yield((p1, node, p2))

We simply need to add a unit-test for having adjacent parents to confirm that triplet is not a "v-structure", or unshielded collider.

So I am not doing much more than adding the check for adjacency between the parents to decide if it is a collider or a true v-structure. The definition of v-structure is not completely unambiguous from what I can tell, but for reference, I would mention Pearl's PRIMER, chapter 2 page 50:

Agreed.

The sorting should not happen IMO, as it is not essential to the functionality, yet limits its scope. I also don't think this function needs to be removed as finding colliders/v-structs may be of interest to others as well, but the documentation of the function should be made to align with its intention.

Yes I'm not exactly sure why there is sorting here. We can prolly remove it tbh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants