Connected Components Labeling
- Connected Components Labeling is an algorithm that partitions domain elements (e.g., pixels, vertices) into maximal connected subsets based on specified adjacency relations.
- It employs various methods—such as union–find, label propagation, and GPU-accelerated kernels—to efficiently identify and merge connected regions.
- Practical implementations yield significant speedups in applications like image segmentation, graph analytics, and computational geometry.
Connected components labeling (CCL) is the algorithmic process of partitioning the elements of a domain—such as the vertices of a graph, the pixels of an image, or the cells of a triangulation—into maximal subsets in which each pair of elements is connected according to a specified adjacency relation. The CCL task arises across graph analytics, statistical physics, image analysis, computational geometry, and finite element modeling. Computational strategies for CCL span a spectrum from classical union–find schemes and label-propagation to specialized GPU kernels and recursive subdivision for implicitly defined domains.
1. Formal Definition and Scope
Given an undirected graph $G = (V, E)$, CCL seeks a labeling function $\ell : V \to \mathbb{N}$ such that for all $u, v \in V$,
$$\ell(u) = \ell(v) \iff u \text{ and } v \text{ lie in the same connected component of } G.$$
The result is a partition of $V$ into disjoint subsets, each corresponding to one component. Let $n = |V|$, $m = |E|$, and let $d$ denote the maximal component diameter.
This concept extends to domains such as digital images (binary or grayscale), where the adjacency structure is induced by pixel neighborhoods, and to geometric domains defined by algebraic constraints (e.g., $\{x : p(x) > 0\}$ for some polynomial $p$), where “connectivity” becomes pathwise topological connectivity in the domain subset.
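As a concrete baseline, the labeling function $\ell$ defined above can be realized sequentially by breadth-first search, assigning each component the minimum vertex id it contains. This is an illustrative sketch only, not one of the parallel methods surveyed below:

```python
from collections import deque

def ccl(vertices, edges):
    """Label each vertex with the smallest vertex id in its component,
    so two vertices share a label iff they are in the same component."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = {}
    for s in sorted(vertices):      # visit in id order: roots get minimal ids
        if s in label:
            continue
        label[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in label:
                    label[w] = s
                    q.append(w)
    return label

# Two components: {0, 1, 2} and {3, 4}.
print(ccl([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)]))
# {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Any labels satisfying the defining equivalence would do; using the component minimum mirrors the union-by-minimum and minimum-mapping conventions of the algorithms discussed next.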
2. Algorithmic Foundations and Parallel Methods
Contemporary CCL implementations are grounded in several algorithmic paradigms:
- Union–Find (Disjoint-Set Forests): The classical two-pass schema incrementally merges provisional labels using union-by-minimum with path compression, often optimized with local pre-labeling, block-wise processing, and limited atomic operations (Chen et al., 2017, Komura, 2016, Chen et al., 2017). On GPUs, union–find enables high degrees of parallelism but can become a bottleneck if contention arises on root labels, especially for irregular or random graph topologies (Komura, 2016, Weigel, 2011).
- Minimum-Mapping/Label Propagation: These approaches maintain tentative labels for each vertex or pixel and iteratively propagate values across adjacent pairs, typically via minimum assignment. The Contour algorithm (Du et al., 2023) is exemplary; each edge applies a minimum-mapping operator per iteration, and high-order operators (order $k \ge 2$) accelerate tree-flattening by chasing pointers up to $k$ levels (i.e., applying the label map $k$ times), achieving convergence in $O(\log d)$ iterations with $O(m)$ work per round (Du et al., 2023). Simpler label-propagation methods in PRAM, stream, and MapReduce models similarly assign each label the minimal value reachable via neighbors (Burkhardt, 2018).
- Run-based and Region-Growing CCL: For images with long homogeneous runs, run-length encoding combined with transitive closure unification reduces working set size and enables efficient one-pass algorithms, with the additional benefit of on-the-fly computation of features (e.g., area, bounding boxes, moments) and topological invariants (Euler number) (Lemaitre et al., 2020).
- Blockwise Coarse-to-Fine Approaches: Parallel strategies for 2D images partition the input into tiles or blocks, assign provisional labels using shared memory, and resolve cross-block equivalence via targeted boundary analysis and merging. This drastically reduces both atomic operations and memory traffic while retaining scalability (Chen et al., 2017, Chen et al., 2017, Weigel, 2011).
- Recursive Subdivision for Implicit Domains: For sets defined by polynomial constraints, recursive partitioning into hyperrectangles (e.g., via quadtree/octree) is combined with Bernstein-basis topology tests to determine simple-connectedness within subcells. Adjacency graphs built over leaf cells are analyzed by union–find, yielding pathwise correct labeling, subject to resolution limits controlled by subdivision depth (Saye, 2022).
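To make the union–find paradigm concrete, here is a minimal serial sketch of the classical two-pass schema with union-by-minimum and path compression on a binary image (4-connectivity). The GPU variants cited above parallelize exactly these merge and flatten steps:

```python
def find(parent, x):
    """Find the root of x, with full path compression."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(parent, a, b):
    """Union-by-minimum: the smaller root becomes the representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def two_pass_ccl(img):
    """Classical two-pass CCL on a binary image, 4-connectivity."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}
    nxt = 1
    # Pass 1: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0      # 0 marks background
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = min(up, left)
                union(parent, up, left)
            elif up or left:
                labels[y][x] = up or left
            else:
                parent[nxt] = nxt
                labels[y][x] = nxt
                nxt += 1
    # Pass 2: flatten each provisional label to its root.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(parent, labels[y][x])
    return labels

img = [[1, 0, 1],
       [1, 1, 1]]
print(two_pass_ccl(img))   # the two row-0 runs merge into one component
```

The block-wise GPU schemes run pass 1 independently per tile in shared memory and restrict the global union phase to tile boundaries, which is where the savings in atomic operations come from.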
3. Analysis of Algorithmic Complexity and Convergence
CCL algorithmic performance and theoretical bounds are shaped by factors such as graph diameter, degree, image dimensions, and GPU/CPU architecture:
- Contour Algorithm Complexity: Per iteration, the edge-parallel minimum-mapping ensures $O(m)$ work; the key result is that an order-2 mapping compresses any path of length $L$ in $O(\log L)$ steps. Thus, in general, the total iteration count is $O(\log d)$, leading to $O(m \log d)$ overall work and highly predictable convergence, with monotonic label decreases and provable correctness (Du et al., 2023).
- GPU Union–Find and Region Strategies: Amortized time per component is reduced to near-linear by coarse labeling that pre-compresses equivalence chains within blocks, limiting global atomic operations to block boundaries and final link phases. Space complexity is mainly governed by label and neighbor data, producing total work of $O(N\alpha(N))$ for images of $N$ pixels ($\alpha$ = inverse Ackermann function) (Chen et al., 2017, Chen et al., 2017, Komura, 2016).
- Label Propagation in PRAM/Distributed Models: Each round processes all $m$ edges; experimental evidence and proven results on path graphs indicate convergence in $O(\log n)$ steps, although a full proof for arbitrary graphs is open (Burkhardt, 2018). Deterministic concurrent algorithms can match this in the combining CRCW PRAM or distributed MPC model—the RA and R algorithms exhibit $O(m \log n)$ work and $O(\log n)$ step bounds (Liu et al., 2018).
- Recursive Subdivision Complexity: For a maximum tree depth $k$ in $d$ dimensions, the number of leaf cells is at most $O(2^{dk})$, with union–find performed over these cells (for both positive and negative sign regions). In nonsingular or low-dimensional singularity cases, the actual number of leaves is far smaller (Saye, 2022).
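The minimum-mapping idea with pointer chasing can be illustrated by a small sequential sketch: snapshot-based edge relaxation followed by pointer jumping. Note the in-place jumping here compresses more aggressively than one parallel order-2 step, so iteration counts differ from the parallel bounds above; the real Contour algorithm is edge-parallel.

```python
def contour_style_ccl(n, edges):
    """Label-propagation CCL: minimum-mapping over edges from the previous
    round's snapshot, then pointer jumping (label[v] <- label[label[v]]).
    Labels decrease monotonically until a fixed point is reached."""
    label = list(range(n))
    iters = 0
    while True:
        iters += 1
        new = label[:]
        # Minimum-mapping: every edge hooks both endpoints' labels to the
        # smaller of the two.
        for u, v in edges:
            m = min(label[u], label[v])
            if m < new[u]:
                new[u] = m
            if m < new[v]:
                new[v] = m
        # Pointer jumping flattens the label trees.
        for v in range(n):
            while new[new[v]] < new[v]:
                new[v] = new[new[v]]
        if new == label:        # fixed point reached (one confirming round)
            return label, iters
        label = new

# Path graph 0-1-...-15: all labels collapse to 0 in far fewer
# than diameter-many iterations.
labels, iters = contour_style_ccl(16, [(i, i + 1) for i in range(15)])
print(labels, iters)
```

Without the pointer-jumping step, a path graph would need one round per unit of diameter; with it, the label trees flatten rapidly, which is the effect the $O(\log d)$ bound formalizes.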
4. Practical Implementations, Parallelization, and Architectural Adaptation
CCL algorithms are extensively optimized for architectures ranging from multicore CPU to NVIDIA GPUs:
- Arachne/Arkouda Framework: The Contour algorithm is implemented within Arachne, providing a parallel Python API atop Chapel, with support for early convergence detection, mapping-order tuning, and elimination of atomic compare-and-swap loops where possible (Du et al., 2023).
- GPU-Guided Designs: Techniques leverage CUDA thread blocks, shared memory for block-local merges, and global boundary-only union–find passes to accelerate processing and reduce memory transactions (Chen et al., 2017, Chen et al., 2017, Weigel, 2011). Adaptive block sizes balance occupancy and memory constraints.
- Run-Based 1-Pass Labeling: On CPUs, run-based methods combine RLE, disjoint-set maintenance, and adjacency trees to efficiently process holes (background inclusion) and component features without full-pixel scans (Lemaitre et al., 2020).
- Geometric and Domain-Specific Methods: For implicitly defined domains, recursive polynomial evaluations and Bernstein range analysis enable robust identification even in the presence of singularities or near-degenerate components, with tunable resolution control (Saye, 2022).
- Image Segmentation Beyond Binary: Extensions of CCL to non-binary/grayscale domains apply iterative thresholding, followed by per-threshold binary labeling with adaptive growing/merging policies based on overlap and containment, crucial for applications such as disparity segmentation in automotive ADAS systems (Mukha et al., 2018).
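A minimal sketch of the run-based idea: rows are run-length encoded and union–find operates on runs rather than pixels. The on-the-fly feature accumulation (areas, bounding boxes, moments) and hole handling of the actual algorithm (Lemaitre et al., 2020) are omitted, and the per-row merge scan is kept naive for clarity:

```python
def runs(row):
    """Run-length encode one binary row as half-open (start, end) spans."""
    out, x, w = [], 0, len(row)
    while x < w:
        if row[x]:
            s = x
            while x < w and row[x]:
                x += 1
            out.append((s, x))
        else:
            x += 1
    return out

def run_based_ccl(img):
    """One-pass run-based CCL: union-find operates on runs, not pixels."""
    parent = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    rows, prev = [], []   # prev holds (start, end, run_id) for the row above
    for row in img:
        cur = []
        for s, e in runs(row):
            rid = len(parent)
            parent.append(rid)
            # 4-connectivity: merge with previous-row runs sharing a column.
            for ps, pe, pid in prev:
                if ps < e and s < pe:
                    ra, rb = find(rid), find(pid)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
            cur.append((s, e, rid))
        rows.append(cur)
        prev = cur
    # Resolve each run to its root; roots are the final component ids.
    return [[(s, e, find(rid)) for s, e, rid in row] for row in rows]

img = [[1, 1, 0, 1],
       [0, 1, 1, 1]]
print(run_based_ccl(img))
# [[(0, 2, 0), (3, 4, 0)], [(1, 4, 0)]]
```

The working set is proportional to the number of runs rather than the number of pixels, which is where the advantage on images with long homogeneous runs comes from.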
5. Experimental Results, Application Domains, and Limitations
The empirical performance and domain-specific adaptation of CCL algorithms are reported across a variety of hardware and workloads:
- Performance Benchmarks: On a 20-core shared-memory system, Contour achieves a 7.3× speedup over FastSV and 1.4× over ConnectIt, with linear scaling up to 20 cores and rapid convergence (2–5 iterations) even for large-diameter graphs (Du et al., 2023). GPU-tailored algorithms reach speedups of 20–50× over tuned CPU code on large lattices (Komura, 2016, Weigel, 2011).
- Image Analysis and Feature Extraction: For high-resolution images, optimized union–find and coarse-to-fine block approaches cut atomic operations and bandwidth by orders of magnitude, with speedups up to 3–4× over prior GPU techniques (Chen et al., 2017, Chen et al., 2017). Run-based CCL with on-the-fly feature computation achieves costs as low as 2.8–15.3 cycles per pixel (with hole filling and complex feature tracking), outperforming traditional two-pass pixel scans (Lemaitre et al., 2020).
- Implicit Domains and High-Complexity Geometry: Recursive cell subdivision identifies components up to specified resolution, with theoretical guarantees on correctness absent degeneracy, and practical merging (“gluing”) only in cases of near-touching boundaries or imposed subdivision limits (Saye, 2022).
- Limitations: Practical constraints include shared-memory size and block occupancy (on GPUs), atomic-operation contention (for random or irregular graphs), and inherent memory consumption in 3D cubulation-based approaches (e.g., TriCCo for triangular grids) (Voigt et al., 2021). For implicitly defined domains, label ambiguity arises when the cell size is coarse compared to the separation of components.
6. Algorithmic Variants, Tradeoffs, and Best Practices
Algorithmic selection and tuning are highly dependent on instance characteristics:
| Algorithm/Variant | Key Features | Iteration/Work Bounds |
|---|---|---|
| Contour (order-2) | Minimum-mapping, PRAM | $O(\log d)$ iterations, $O(m)$/iter (Du et al., 2023) |
| Union–Find (optimized) | Block merge, path compression | $O(N\alpha(N))$ (Chen et al., 2017) |
| Label Propagation | CRCW PRAM, MapReduce | $O(m \log n)$ work; $O(\log n)$ steps (empirical) (Burkhardt, 2018) |
| Classic Two-Pass (CPU) | Pixel scan, union–find | $O(N)$; high memory traffic (Lemaitre et al., 2020) |
| Run-Based (RLE) | On-the-fly features, holes | $O(r)$ for $r$ runs, $r \ll N$ (Lemaitre et al., 2020) |
| Recursive Subdivision | Implicit sets, Bernstein topology | $O(2^{dk})$ leaves, user-tunable depth $k$ (Saye, 2022) |
The Contour authors recommend higher mapping orders or hybrid strategies for graphs with very large or skewed diameters, asynchronous updates and early-exit conditions in the general case, and lowering atomic-write overhead whenever possible (Du et al., 2023). Coarse pre-labeling and blockwise processing remain universally beneficial in high-parallelism environments (Chen et al., 2017, Chen et al., 2017). Edge cases, especially in algebraic/implicit sets, are best handled by conservative gluing and explicit tuning of resolution, ensuring that the topological labeling never artificially splits genuine components (Saye, 2022).
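The blockwise coarse-to-fine recipe—local labeling per tile, then equivalence resolution restricted to tile boundaries—can be sketched sequentially as follows (vertical strips for simplicity; the GPU designs cited above run the local phase per thread block in shared memory):

```python
def blockwise_ccl(img, tile_w):
    """Coarse-to-fine sketch: label vertical strips independently (local
    phase), then resolve equivalences only along strip seams (merge phase)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    nxt = 1
    # Local phase: raster-scan labeling inside each strip; label ranges of
    # different strips are disjoint, so strips could run in parallel.
    for x0 in range(0, w, tile_w):
        for y in range(h):
            for x in range(x0, min(x0 + tile_w, w)):
                if not img[y][x]:
                    continue
                up = labels[y - 1][x] if y else 0
                left = labels[y][x - 1] if x > x0 else 0  # stay in strip
                if up or left:
                    labels[y][x] = up or left
                    if up and left:
                        union(up, left)
                else:
                    parent[nxt] = nxt
                    labels[y][x] = nxt
                    nxt += 1
    # Merge phase: unions are restricted to pixels on strip boundaries,
    # which is what keeps global (atomic) traffic low on GPUs.
    for x0 in range(tile_w, w, tile_w):
        for y in range(h):
            a, b = labels[y][x0 - 1], labels[y][x0]
            if a and b:
                union(a, b)
    # Flatten every pixel label to its root.
    return [[find(l) if l else 0 for l in row] for row in labels]

img = [[1, 1, 1, 1],
       [1, 0, 0, 1]]
print(blockwise_ccl(img, 2))
```

The ring-shaped component above spans both strips and is stitched together entirely by the boundary unions, illustrating why only seam pixels need globally visible merges.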
7. Applications Across Domains
CCL is foundational in domains including:
- Graph analytics and large-scale data processing: Interactive graph analytic frameworks such as Arachne/Arkouda (Du et al., 2023).
- Image segmentation and analysis: CCL forms the basis of region extraction, object tracking, and morphology in both 2D and 3D imaging (Chen et al., 2017, Mukha et al., 2018).
- Statistical physics and percolation modeling: Cluster identification is critical in spin model simulations, percolation studies, and the analysis of stochastic geometric structures (Weigel, 2011, Komura, 2016).
- Computational geometry and mesh processing: Recursive subdivision/Cubulation approaches support robust topological analysis of polynomial or triangulation-defined domains (Saye, 2022, Voigt et al., 2021).
The combinatorial and algorithmic diversity of the CCL family, coupled with extensive parallelization opportunities, positions it as a core subroutine with continuing research activity around architectural adaptation, complexity scaling, and robustness under challenging domain definitions.
References:
- "Contour Algorithm for Connectivity" (Du et al., 2023)
- "A generalized GPU-based connected component labeling algorithm" (Komura, 2016)
- "An Optimized Union-Find Algorithm for Connected Components Labeling Using GPUs" (Chen et al., 2017)
- "Graph connectivity in log steps using label propagation" (Burkhardt, 2018)
- "Simple Concurrent Labeling Algorithms for Connected Components" (Liu et al., 2018)
- "Disparity Image Segmentation For ADAS" (Mukha et al., 2018)
- "A Connected Component Labeling Algorithm for Implicitly-Defined Domains" (Saye, 2022)
- "Efficient Parallel Connected Components Labeling with a Coarse-to-fine Strategy" (Chen et al., 2017)
- "A New Run-based Connected Component Labeling for Efficiently Analyzing and Processing Holes" (Lemaitre et al., 2020)
- "TriCCo -- a cubulation-based method for computing connected components on triangular grids" (Voigt et al., 2021)
- "Connected component identification and cluster update on GPU" (Weigel, 2011)