Parallel (sort of) connected component labeling #547
Labels connected components in parallel, computing the input "mask" on the fly.
The goal is to make labeling connected components less of a bottleneck in PICO and elsewhere in PISM. The old implementation is purely serial and so does not scale. This version uses the same serial algorithm on each sub-domain and then combines results to get labels for the whole grid.
Here's the idea:

1. Identify connected components in each sub-domain, putting intermediate results in `output`.
2. "Update ghosts" of `output`, then iterate over sub-domain edges to identify connections between patches in sub-domains that make up connected components spanning multiple sub-domains. This defines a graph: each patch on a sub-domain is a node; two nodes are connected by an edge if and only if they "touch".
3. Gather the graph description on all sub-domains that have at least one patch.
4. Use breadth-first search to traverse this graph and compute final labels (see the sketch after this list).
5. Apply final labels.
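For step 4, here is a minimal self-contained sketch of the traversal (illustrative only, not the PISM code; the `Node`, `Edge` and `final_labels` names are made up): each patch is identified by the pair (rank, intermediate label), and a breadth-first search assigns one consecutive final label per connected component of the patch graph.

```c++
#include <map>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// A node of the "patch graph": one patch (a connected component local to a
// sub-domain), identified by the owning rank and its intermediate label.
// Illustrative names, not the ones used in PISM.
using Node = std::pair<int, int>;    // (rank, intermediate label)
using Edge = std::pair<Node, Node>;  // two patches that "touch"

// Assign consecutive final labels (1, 2, 3, ...) to connected components of
// the patch graph using breadth-first search.
std::map<Node, int> final_labels(const std::vector<Node> &nodes,
                                 const std::vector<Edge> &edges) {
  // Adjacency list; isolated patches (no edges) get empty neighbor sets.
  std::map<Node, std::set<Node>> adjacency;
  for (const auto &n : nodes) {
    adjacency[n];
  }
  for (const auto &e : edges) {
    adjacency[e.first].insert(e.second);
    adjacency[e.second].insert(e.first);
  }

  std::map<Node, int> label;
  int next_label = 1;

  for (const auto &item : adjacency) {
    if (label.count(item.first) > 0) {
      continue;                  // already labeled by an earlier BFS
    }
    // BFS from this node; every node reached gets the same final label.
    std::queue<Node> queue;
    queue.push(item.first);
    label[item.first] = next_label;
    while (!queue.empty()) {
      Node v = queue.front();
      queue.pop();
      for (const auto &w : adjacency[v]) {
        if (label.count(w) == 0) {
          label[w] = next_label;
          queue.push(w);
        }
      }
    }
    next_label += 1;
  }
  return label;
}
```

Because the traversal order is deterministic, every rank running this on the same node and edge lists produces the same consecutive labeling.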
This method communicates ghosts (once), number of graph edges per sub-domain (once) and then all graph edges (once, only to sub-domains that have at least one patch).
Graph traversal is done "redundantly", i.e. each participating sub-domain[^1] traverses the whole graph even if it contains only one isolated node. This is needed to ensure that the resulting labels use consecutive numbers. (Consecutive labels are useful for indexing elsewhere in the code.)
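To illustrate why the redundancy is harmless: every participating rank ends up with the same map from patches to final labels and only needs the entries for its own rank when relabeling its local sub-domain. A hypothetical sketch (the flattened-array layout and names are assumptions):

```c++
#include <map>
#include <utility>
#include <vector>

using Node = std::pair<int, int>;  // (rank, intermediate label), as above

// Hypothetical helper: `mask` holds intermediate labels for the local
// sub-domain (0 marks background); `final_label` is the globally agreed map
// produced by the redundant traversal.
void apply_final_labels(int rank, std::vector<int> &mask,
                        const std::map<Node, int> &final_label) {
  for (auto &value : mask) {
    if (value != 0) {
      value = final_label.at({rank, value});
    }
  }
}
```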
We could gather graph information on one MPI processor, traverse the graph to compute final labels, then scatter final labels. It is not clear if this would be better, though.
The current implementation uses

- `MPI_Allgather()` to gather the number of graph edges per sub-domain (send 4 bytes, receive 4 bytes per sub-domain),
- `MPI_Allgatherv()` to gather edges to all participating sub-domains (send 8 bytes per local edge, receive 8-16 bytes per edge); see the sketch after this list.

An alternative implementation could use

- `MPI_Gather()` to gather the number of graph edges per sub-domain to one of the sub-domains (each sub-domain sends 4 bytes; one sub-domain receives 4 bytes per sub-domain),
- `MPI_Gatherv()` to gather edges from all participating sub-domains (all sub-domains send 8 bytes per local edge; one sub-domain receives 8 bytes per edge in the whole graph),
- `MPI_Bcast()` to scatter the mapping from old labels to new labels (8 bytes per local sub-domain).

It is not clear which way is better. We need to run benchmarks!
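For reference, a rough sketch of the current all-gather pattern, assuming each graph edge is packed into a fixed number of `int`s (the actual encoding and types in PISM may differ):

```c++
#include <mpi.h>
#include <vector>

// Sketch of the "gather edges everywhere" pattern. Here each edge is assumed
// to be encoded as two 32-bit integers (8 bytes per edge).
std::vector<int> gather_edges(MPI_Comm comm,
                              const std::vector<int> &local_edges) {
  int size = 0;
  MPI_Comm_size(comm, &size);

  // 1. Everyone learns how many ints each rank will contribute
  //    (4 bytes sent, 4 bytes received per rank).
  int local_count = (int)local_edges.size();
  std::vector<int> counts(size);
  MPI_Allgather(&local_count, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);

  // 2. Compute displacements and gather all edges on all ranks.
  std::vector<int> displacements(size, 0);
  int total = 0;
  for (int r = 0; r < size; ++r) {
    displacements[r] = total;
    total += counts[r];
  }

  std::vector<int> all_edges(total);
  MPI_Allgatherv(local_edges.data(), local_count, MPI_INT,
                 all_edges.data(), counts.data(), displacements.data(),
                 MPI_INT, comm);
  return all_edges;
}
```

The `MPI_Gather()`/`MPI_Gatherv()`/`MPI_Bcast()` alternative would replace the two calls above with their rooted counterparts plus a broadcast of the old-to-new label mapping.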
This code works as it should, but we need to run a few simulations to compare it to the old implementation.
Checklist

- [ ] `CHANGES.rst`
Footnotes
[^1]: This implementation uses `MPI_Comm_split()` to create a sub-communicator containing "participating sub-domains", i.e. sub-domains that contain at least one foreground pixel. It is not clear if this is a good idea. We could traverse the graph on all sub-domains (including empty ones) instead. This may or may not be a good idea depending on the cost of the `MPI_Comm_split()` and `MPI_Allgatherv()` calls mentioned above.
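For illustration, a minimal sketch of the `MPI_Comm_split()` idea from this footnote (not the PISM code; names are made up): ranks that own at least one foreground pixel share a color and end up in the sub-communicator, while the rest pass `MPI_UNDEFINED` and get `MPI_COMM_NULL` back.

```c++
#include <mpi.h>

// Sketch: split off a communicator of "participating" ranks, i.e. ranks
// whose sub-domain contains at least one foreground pixel.
MPI_Comm participating_ranks(MPI_Comm comm, bool have_foreground) {
  int rank = 0;
  MPI_Comm_rank(comm, &rank);

  // Ranks with the same color end up in the same sub-communicator;
  // MPI_UNDEFINED excludes a rank (it receives MPI_COMM_NULL).
  int color = have_foreground ? 0 : MPI_UNDEFINED;

  MPI_Comm subcomm = MPI_COMM_NULL;
  MPI_Comm_split(comm, color, rank, &subcomm);
  return subcomm;
}
```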