Minimum Consistent Subset Problem
- The Minimum Consistent Subset (MCS) is defined as the smallest set of vertices ensuring every vertex has a nearest neighbor of the same color within the set.
- It plays a crucial role in computational geometry and graph theory, linking clustering, label propagation, and structural covering problems.
- Recent research has developed approximation and fixed-parameter tractable algorithms, clarifying complexity and tractability frontiers in diverse graph classes.
A minimum consistent subset (MCS) of a colored metric space, or more generally a vertex-colored graph, is a smallest subset of points (vertices) satisfying the property that every point (vertex) in the ground set has a closest representative of its own color in the subset. The MCS problem seeks to compute such a subset of minimum cardinality. It is a foundational problem in computational geometry, graph theory, clustering, and related areas, linking instance selection, label propagation, and structural covering. Recent research has resolved key complexity aspects of MCS across geometric and graph classes, and developed both approximation and parameterized algorithms, clarifying the frontiers of tractability and inapproximability.
1. Formal Definition and Model Variations
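Let $G = (V, E)$ be a connected undirected graph with a coloring function $C : V \to \{1, \dots, c\}$, where $c$ is the number of colors. For any $u, v \in V$, let $d(u, v)$ denote the graph (shortest-path) distance, and for $S \subseteq V$ define the nearest neighbors of $v$ in $S$ as $NN(v, S) = \{ s \in S : d(v, s) = \min_{s' \in S} d(v, s') \}$. A node $u \in S$ covers $v$ if $u \in NN(v, S)$ and $C(u) = C(v)$. The set $\mathrm{Cov}(v, S)$ collects all covering nearest neighbors of $v$ in $S$. A subset $S \subseteq V$ is consistent if every $v \in V$ satisfies $\mathrm{Cov}(v, S) \neq \emptyset$.
The Minimum Consistent Subset (MCS) problem is to find a consistent subset $S \subseteq V$ of minimum cardinality.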
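The following is a minimal Python sketch of the definition above. It checks whether a given subset $S$ is consistent. It assumes a connected graph stored as an adjacency dictionary and a non-empty $S$. The function names (`bfs_distances`, `covering_set`, `is_consistent`) are illustrative and do not come from the cited papers. The sketch runs one breadth-first search per vertex, so it favors clarity over speed.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph (hop) distances d(source, .) computed by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def covering_set(adj, colors, S, v):
    """Cov(v, S): nearest neighbors of v in S that share v's color."""
    dist = bfs_distances(adj, v)
    best = min(dist[s] for s in S)  # assumes G is connected and S is non-empty
    return {s for s in S if dist[s] == best and colors[s] == colors[v]}


def is_consistent(adj, colors, S):
    """S is consistent iff Cov(v, S) is non-empty for every vertex v."""
    return all(covering_set(adj, colors, S, v) for v in adj)


# Path 0-1-2-3 colored red, red, blue, blue.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colors = {0: "red", 1: "red", 2: "blue", 3: "blue"}
print(is_consistent(adj, colors, {0, 3}))  # True
print(is_consistent(adj, colors, {1}))     # False
```

Under ties, vertex 1 is equidistant from 0 and 2 when $S = \{0, 2\}$. It is still covered, because one of those nearest neighbors (vertex 0) shares its color.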
Variants arise in:
- Metric spaces: Closest-point distances (e.g., Euclidean) replace graph distance (Biniaz et al., 2018).
- Additional constraints: e.g., coverage within certain radii, or robustness to noisy labels.
2. Structural and Algorithmic Properties
2.1 Basic Observations and Coverage
Any consistent subset must contain at least one vertex of each color, since a vertex can only be covered by a representative of its own color. Hence a minimum consistent subset has cardinality at least $c$. The consistency property links MCS to label propagation: every point must have a nearest selected representative that carries its own label.
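The lower bound of $c$ tells an exhaustive search where to start. The sketch below assumes a small connected graph in adjacency-dictionary form. It computes an exact minimum consistent subset by trying subsets in order of increasing size, beginning at $c$. The search is exponential and is meant only for checking tiny examples. `brute_force_mcs` is a hypothetical helper, not an algorithm from the cited work.

```python
from collections import deque
from itertools import combinations


def all_pairs_distances(adj):
    """Hop distances between every pair of vertices (one BFS per vertex)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return dist


def is_consistent(dist, colors, S):
    """Every vertex needs a same-colored vertex among its nearest in S."""
    for v in dist:
        best = min(dist[v][s] for s in S)
        if not any(dist[v][s] == best and colors[s] == colors[v] for s in S):
            return False
    return True


def brute_force_mcs(adj, colors):
    """Exact MCS by exhaustive search (assumes a connected graph)."""
    dist = all_pairs_distances(adj)
    vertices = list(adj)
    c = len(set(colors.values()))
    # Each color needs a representative, so sizes below c cannot succeed.
    for size in range(c, len(vertices) + 1):
        for S in combinations(vertices, size):
            if is_consistent(dist, colors, S):
                return set(S)
    return set(vertices)  # not reached: V itself is always consistent
```

For the path 0-1-2 colored red, blue, red, no 2-subset is consistent, so the search returns all three vertices. The optimum can therefore exceed the lower bound $c$.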
2.2 Geometric Interpretation
In the plane, a subset $S$ is consistent exactly when the Voronoi diagram of $S$ partitions the plane so that every cell contains only input points of the same color as its site. A point on a cell boundary needs at least one equidistant site of its own color. The problem generalizes to higher-dimensional and discrete metric spaces (Biniaz et al., 2018).
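The Voronoi condition can be tested without building the diagram: for each input point, find its nearest sites in $S$ and check that at least one has the point's color. The sketch below does this by brute force in $O(n \cdot |S|)$ time, assuming Euclidean distance. `is_consistent_points` and the tie tolerance `eps` are illustrative choices.

```python
import math


def is_consistent_points(points, colors, S, eps=1e-9):
    """Check consistency of S (indices into points) under Euclidean distance.

    Equivalent to the Voronoi condition: every point must lie in the cell of
    a same-colored site. Points within eps of a tie count as on a boundary
    and may use any equidistant site.
    """
    for i, p in enumerate(points):
        dists = {j: math.dist(p, points[j]) for j in S}
        best = min(dists.values())
        if not any(d <= best + eps and colors[j] == colors[i]
                   for j, d in dists.items()):
            return False
    return True
```

For example, with points `[(0, 0), (1, 0), (4, 0)]` colored red, blue, blue, the subset `{0, 2}` fails. Point `(1, 0)` lies in the red site's Voronoi cell.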
2.3 Connections and Distinctions
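MCS is distinct from dominating set and set cover. A consistent subset need not dominate all points. Instead, it guarantees that each point's nearest selected neighbor (or one of them, under ties) has the correct color. Because nearest neighbors are defined by metric or path-dependent distances, adding or removing a single representative can change the nearest neighbors of many points at once, which gives the problem subtle global effects.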
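Two tiny graphs show that neither property implies the other. On a monochromatic path with five vertices, one endpoint is consistent but does not dominate the middle vertex. On the path 0-1-2 colored red, blue, red, the middle vertex dominates everything but is not consistent. The helper names below are illustrative and assume connected graphs given as adjacency dictionaries.

```python
from collections import deque


def hop_distances(adj, source):
    """BFS distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_consistent(adj, colors, S):
    """Every vertex has a same-colored vertex among its nearest in S."""
    for v in adj:
        d = hop_distances(adj, v)
        best = min(d[s] for s in S)
        if not any(d[s] == best and colors[s] == colors[v] for s in S):
            return False
    return True


def is_dominating(adj, S):
    """Every vertex is in S or adjacent to a vertex of S."""
    return all(v in S or any(w in S for w in adj[v]) for v in adj)


# Monochromatic path 0-1-2-3-4: {0} is consistent but not dominating.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
mono = {v: "red" for v in path5}
# Path 0-1-2 colored red, blue, red: {1} is dominating but not consistent.
path3 = {0: [1], 1: [0, 2], 2: [1]}
alt = {0: "red", 1: "blue", 2: "red"}
```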
3. Computational Complexity Landscape
3.1 General Graphs and Planar Instances
MCS is NP-complete for general graphs and remains so on planar graphs, even with two colors (Manna, 23 May 2024, Banik et al., 23 Apr 2024, Manna et al., 2023, Biniaz et al., 2018). These results follow from reductions from variants of Planar 3-SAT and related NP-complete problems. The problem is also NP-hard for geometric instances (colored points in the plane).
3.2 Hardness on Restricted Graph Classes
Trees
For trees, MCS is NP-complete when the number of colors $c$ is part of the input (Banik et al., 23 Apr 2024, Manna et al., 2023). The hardness proof is a reduction from MAX-2SAT that uses tree gadgets encoding Boolean variables and clause satisfaction.
Interval and Circle Graphs
MCS is NP-complete on interval graphs, via a reduction from vertex cover on cubic graphs. It is APX-hard on circle graphs, via a gap-preserving reduction from dominating set on circle graphs (Manna, 23 May 2024, Banik et al., 23 Apr 2024). In particular, MCS on circle graphs admits no PTAS unless P = NP.
3.3 Tractable and FPT Cases
The problem admits polynomial-time algorithms on colored trees for any fixed number of colors (Arimura et al., 2023). It is also fixed-parameter tractable (FPT) on trees parameterized by the number of colors $c$ (Banik et al., 23 Apr 2024). For $c$-chromatic spider trees (trees consisting of paths, or legs, joined at a single central vertex), there are explicit polynomial-time procedures for constant $c$ (Manna et al., 2023).
For graphs of bounded vertex cover number $vc$ or bounded neighborhood diversity $nd$, MCS is FPT parameterized by $vc$ or by $nd$, respectively. The algorithms run in time of the form $f(vc) \cdot n^{O(1)}$ and $f(nd) \cdot n^{O(1)}$ for computable functions $f$, regardless of the number of colors (Banik et al., 14 Dec 2025).
Table: Overview of Complexity by Graph Class
| Graph Class | Complexity | Remarks |
|---|---|---|
| General/Planar | NP-complete | Even for $c = 2$ |
| Trees ($c$ part of input) | NP-complete | (Banik et al., 23 Apr 2024, Manna et al., 2023) |
| Trees (fixed $c$) | Poly. time | (Arimura et al., 2023), improved FPT (Banik et al., 23 Apr 2024) |
| Interval graphs | NP-complete | (Banik et al., 23 Apr 2024), (4+2)-approx (Manna, 23 May 2024) |
| Circle graphs | APX-hard | No PTAS unless P=NP (Manna, 23 May 2024) |
| Bounded $vc$ / $nd$ | FPT | FPT in $vc$ or $nd$, independent of $c$ (Banik et al., 14 Dec 2025) |
4. Approximation Algorithms
For interval graphs, Manna (23 May 2024) gives an approximation algorithm whose ratio is stated in terms of the number of colors $c$. The algorithm decomposes the interval graph into leaf bar covers (interval partitions) and constructs small color covers for each part. It uses dynamic programming to compute optimal covers and aggregates a bounded number of representatives per bar.
The algorithm proceeds as follows:
- Use DP to find a leaf bar cover minimizing partitions between consecutive sample points.
- For each bar, select a bounded number of intervals that covers all colors needed in that bar.
- Take the union of the per-bar selections to obtain the final consistent subset.
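MCS on an interval graph measures distance in hops in the intersection graph, not between interval endpoints. The sketch below illustrates only that input model; it does not reproduce the leaf-bar-cover dynamic program. It builds the interval graph from closed intervals given as `(lo, hi)` pairs and computes hop distances by breadth-first search. `interval_graph` and `hop_distances` are illustrative names, and the pairwise construction is quadratic for simplicity.

```python
from collections import deque


def interval_graph(intervals):
    """Adjacency lists of the intersection graph of closed intervals (lo, hi)."""
    n = len(intervals)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (a_lo, a_hi), (b_lo, b_hi) = intervals[i], intervals[j]
            if a_lo <= b_hi and b_lo <= a_hi:  # closed intervals overlap
                adj[i].append(j)
                adj[j].append(i)
    return adj


def hop_distances(adj, source):
    """Graph distances from source, as used in the MCS definition."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist
```

For `[(0, 2), (1, 4), (3, 6), (5, 8)]` the result is a path. Intervals 0 and 3 are therefore at distance 3, even though their endpoints are only 3 units apart on the line.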
For geometric instances (colored points in the plane), the best known exact algorithm runs in time subexponential in the optimal solution size (Biniaz et al., 2018).
No constant-factor approximation is currently known for circle graphs, and MCS is APX-hard in this setting (Manna, 23 May 2024).
5. Parameterized and Specialized Algorithms
5.1 Trees and Fixed-Parameter Tractability
For trees, the fastest known FPT algorithm parameterized by $c$ uses a detailed dynamic program over the tree, with states summarizing distance and color information from subtrees (Banik et al., 23 Apr 2024). When the number of colors $c$ is fixed, a polynomial-time dynamic programming solution is available (Arimura et al., 2023). For trees with restricted topologies (e.g., spiders), algorithms exploit color-run structure for efficiency (Manna et al., 2023).
5.2 Bounded Structural Parameter Graphs
The problem is fixed-parameter tractable parameterized by the vertex cover number $vc$. The algorithm builds on the observation that bounded $vc$ limits pairwise distances, which makes it possible to guess distance profiles and minimal hitting sets for color coverage (Banik et al., 14 Dec 2025). For neighborhood diversity $nd$, the algorithm partitions the vertices into at most $nd$ types and uses color-coding-style arguments to solve subproblems independently per label.
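The type partition behind neighborhood diversity is easy to compute. Two vertices $u, v$ have the same type when $N(u) \setminus \{v\} = N(v) \setminus \{u\}$. This relation is an equivalence relation, so comparing each vertex against one representative per class suffices, and $nd$ is the number of classes. The sketch below assumes an adjacency-dictionary input; `neighborhood_types` is an illustrative name, and the code does not implement the cited FPT algorithm itself.

```python
def neighborhood_types(adj):
    """Partition vertices into types: u ~ v iff N(u) - {v} == N(v) - {u}."""
    nbrs = {v: set(adj[v]) for v in adj}
    types = []
    for v in adj:
        for group in types:
            u = group[0]  # one representative suffices (equivalence relation)
            if nbrs[u] - {v} == nbrs[v] - {u}:
                group.append(v)
                break
        else:
            types.append([v])  # v starts a new type
    return types
```

A star $K_{1,3}$ has two types: the center alone and the three leaves together. A triangle has a single type.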
5.3 Geometric and Special-Case Algorithms
Linear-time algorithms exist for collinear points. Efficient algorithms also exist for certain other special geometric layouts (e.g., points on two lines), with running times that depend on the configuration (Biniaz et al., 2018).
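Consistency decomposes cleanly for collinear points. After sorting, any point lying between two consecutive selected points has its nearest selected points among those two. Points before the first selected point or after the last are served by that endpoint. The sketch below exploits this with a simple $O(n^3)$ dynamic program, assuming distinct coordinates. It is an illustration written for this example, not the linear-time algorithm of Biniaz et al. (2018), and `mcs_collinear` is a hypothetical name.

```python
def mcs_collinear(xs, colors):
    """Exact MCS size for points with distinct coordinates on a line."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    x = [xs[k] for k in order]
    col = [colors[k] for k in order]
    n = len(x)
    INF = float("inf")

    def gap_ok(i, j):
        # Points strictly between consecutive selected i < j need a
        # same-colored point among their nearest of {i, j}.
        for k in range(i + 1, j):
            di, dj = x[k] - x[i], x[j] - x[k]
            if di < dj and col[k] != col[i]:
                return False
            if dj < di and col[k] != col[j]:
                return False
            if di == dj and col[k] not in (col[i], col[j]):
                return False
        return True

    # best[j]: fewest selected points, ending at j, that serve all points <= x[j].
    best = [INF] * n
    for j in range(n):
        # j as the first selected point: everything to its left shares its color.
        if all(col[k] == col[j] for k in range(j)):
            best[j] = 1
        for i in range(j):
            if best[i] + 1 < best[j] and gap_ok(i, j):
                best[j] = best[i] + 1
    # The last selected point must also match every point to its right.
    return min(best[j] for j in range(n)
               if all(col[k] == col[j] for k in range(j + 1, n)))
```

For equally spaced points colored red, blue, red, the answer is 3. Selecting only the two red points leaves the blue point tied between two red sites.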
6. Hardness of Approximation and Lower Bounds
MCS is APX-hard in circle graphs (Manna, 23 May 2024). No constant-factor approximation is known in the general planar or geometric setting. In the related setting of systems of linear equations, minimizing the number of unsatisfied equations is UGC-hard to approximate within any constant factor (Dabrowski et al., 2022). For general colored metric spaces, and for graph classes beyond those discussed above, inapproximability bounds remain open.
7. Open Problems and Research Directions
Key open problems include:
- Improving the approximation ratio for interval graphs, or developing a PTAS for a fixed number of colors (Manna, 23 May 2024).
- Obtaining constant-factor approximations for circle graphs or establishing stronger inapproximability results (Manna, 23 May 2024).
- Tightening the complexity bounds for trees with a fixed number of colors, and settling the complexity of broader classes such as graphs of bounded treewidth (Arimura et al., 2023, Manna et al., 2023).
- Fixed-parameter tractability in geometric metrics remains unresolved; current solutions are subexponential in optimal size but not FPT (Biniaz et al., 2018).
- Exploring dynamic or streaming versions of MCS, and potential applications in learning-theoretic settings such as supervised clustering (Banik et al., 14 Dec 2025).
A plausible implication is that further advances may come from refined structural decompositions of instance graphs, from more powerful color-aggregation strategies in dynamic programming, or from new hardness proofs that exploit geometric or structural obstructions.
References
- "Minimum Consistent Subset in Interval Graphs and Circle Graphs" (Manna, 23 May 2024)
- "Minimum Consistent Subset in Trees and Interval Graphs" (Banik et al., 23 Apr 2024)
- "Some results on Minimum Consistent Subsets of Trees" (Manna et al., 2023)
- "Minimum Consistent Subset for Trees Revisited" (Arimura et al., 2023)
- "Learning with Structure: Computing Consistent Subsets on Structurally-Regular Graphs" (Banik et al., 14 Dec 2025)
- "On the Minimum Consistent Subset Problem" (Biniaz et al., 2018)
- "Almost Consistent Systems of Linear Equations" (Dabrowski et al., 2022)