Minimum Consistent Subset Problem

Updated 21 December 2025
  • The Minimum Consistent Subset (MCS) is defined as the smallest set of vertices such that every vertex has, among its nearest neighbors in the set, at least one of its own color.
  • It plays a crucial role in computational geometry and graph theory, linking clustering, label propagation, and structural covering problems.
  • Recent research has developed approximation and fixed-parameter tractable algorithms, clarifying complexity and tractability frontiers in diverse graph classes.

A minimum consistent subset (MCS) of a colored metric space, or more generally a vertex-colored graph, is a smallest subset of points (vertices) satisfying the property that every point (vertex) in the ground set has a closest representative of its own color in the subset. The MCS problem seeks to compute such a subset of minimum cardinality. It is a foundational problem in computational geometry, graph theory, clustering, and related areas, linking instance selection, label propagation, and structural covering. Recent research has resolved key complexity aspects of MCS across geometric and graph classes, and developed both approximation and parameterized algorithms, clarifying the frontiers of tractability and inapproximability.

1. Formal Definition and Model Variations

Let $G=(V,E)$ be a connected undirected graph with a coloring function $c : V \rightarrow \{1,2,\dots,\alpha\}$, where $\alpha$ is the number of colors. Let $d(u,v)$ denote the shortest-path distance between $u$ and $v$ in $G$. For any $S\subseteq V$, define $d(v,S)=\min_{u\in S} d(v,u)$ and the set of nearest neighbors $NN(v,S)=\{u\in S : d(u,v)=d(v,S)\}$. A vertex $u\in S$ covers $v$ if $u\in NN(v,S)$ and $c(u)=c(v)$. The set $COV(v,S)=\{u\in NN(v,S) : c(u)=c(v)\}$ collects all covering nearest neighbors of $v$ in $S$. A subset $S$ is consistent if every $v\notin S$ satisfies $COV(v,S)\neq\emptyset$ (every vertex of $S$ trivially covers itself).

The Minimum Consistent Subset (MCS) Problem is to find a consistent $S\subseteq V$ of minimum cardinality.
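The definition translates directly into a consistency checker. The sketch below is a minimal Python illustration, assuming an unweighted connected graph given as an adjacency dict; the helpers `bfs_distances` and `is_consistent` are hypothetical names, not code from any of the cited papers. It runs one breadth-first search per vertex of $S$, so it costs $O(|S|(|V|+|E|))$ time.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_consistent(adj, color, subset):
    """True iff COV(v, S) is nonempty for every vertex v.

    Vertices in S cover themselves (distance 0), so checking all of V is
    equivalent to checking only v not in S. Assumes G is connected.
    """
    if not subset:  # a nonempty graph needs a nonempty S
        return False
    dist_from = {u: bfs_distances(adj, u) for u in subset}
    for v in adj:
        d_vS = min(dist_from[u][v] for u in subset)            # d(v, S)
        nn = [u for u in subset if dist_from[u][v] == d_vS]    # NN(v, S)
        if not any(color[u] == color[v] for u in nn):          # COV(v, S) empty
            return False
    return True


# Path 0-1-2-3 colored r, r, b, b.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
color = {0: "r", 1: "r", 2: "b", 3: "b"}
print(is_consistent(adj, color, {0, 3}))  # True
print(is_consistent(adj, color, {1}))     # False: vertex 2 (b) is nearest to 1 (r)
```

Ties are handled as in the definition: a vertex equidistant from several members of $S$ only needs one of them to share its color.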

Variants arise in:

  • Metric spaces: distances in a metric space (e.g., Euclidean distance between points in the plane) replace graph distance (Biniaz et al., 2018).
  • Additional constraints: e.g., coverage within certain radii, or robustness to noisy labels.

2. Structural and Algorithmic Properties

2.1 Basic Observations and Coverage

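Any consistent subset must contain at least one vertex of each color that occurs: a vertex $v$ of color $i$ either lies in $S$ or needs a covering nearest neighbor of color $i$ in $S$. Thus the minimum MCS cardinality is at least $\alpha$ whenever all $\alpha$ colors are used. The consistency property links to label propagation and nearest-neighbor classification: every point must have a nearest sample of its own label among the selected representatives.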
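For very small graphs, this lower bound can seed an exhaustive search. The brute-force sketch below (hypothetical helper `brute_force_mcs`; exponential time, for illustration only, assuming a connected graph as an adjacency dict) enumerates subsets starting at size equal to the number of colors present and returns the first consistent one, which is therefore of minimum size.

```python
from collections import deque
from itertools import combinations


def all_pairs_hop_distances(adj):
    """BFS from every vertex; dist[s][v] is the hop distance from s to v."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return dist


def brute_force_mcs(adj, color):
    """Exhaustive minimum consistent subset for tiny connected graphs."""
    dist = all_pairs_hop_distances(adj)
    vertices = sorted(adj)

    def consistent(S):
        for v in vertices:
            d_vS = min(dist[v][u] for u in S)
            # COV(v, S) must contain a nearest neighbor of v's own color.
            if not any(dist[v][u] == d_vS and color[u] == color[v] for u in S):
                return False
        return True

    lower_bound = len({color[v] for v in vertices})  # one vertex per color present
    for size in range(lower_bound, len(vertices) + 1):
        for S in combinations(vertices, size):
            if consistent(S):
                return set(S)  # first hit at the smallest size is optimal
    return set(vertices)  # not reached: V itself is always consistent


path = {0: [1], 1: [0, 2], 2: [1]}
print(brute_force_mcs(path, {0: "r", 1: "b", 2: "r"}))  # {0, 1, 2}
```

The example shows the bound need not be tight: on the path colored r, b, r every two-vertex subset fails (the middle vertex is equidistant from both red ends), so the optimum has size $3 > \alpha = 2$.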

2.2 Geometric Interpretation

In the plane, consistency means that every point lies in the closed Voronoi cell of some site of $S$ with the same color; when there are no ties, this says that every Voronoi cell of $S$ contains only points of its site's color. The problem generalizes to higher-dimensional and discrete metric spaces (Biniaz et al., 2018).
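The Voronoi condition can be tested without building the diagram, by checking each point's nearest selected sites directly. A minimal sketch, assuming Euclidean distance, points given as coordinate tuples, and ties detected with `math.isclose`; `is_consistent_points` is a hypothetical name. It takes $O(n|S|)$ time.

```python
import math


def is_consistent_points(points, colors, subset):
    """Euclidean consistency check for colored points in the plane.

    points: list of (x, y) tuples; colors: list aligned with points;
    subset: indices of the selected sites S. Equivalent to requiring that
    every point lies in the closed Voronoi cell of a site of its own color.
    """
    if not subset:
        return False
    for i, p in enumerate(points):
        dists = {j: math.dist(p, points[j]) for j in subset}
        best = min(dists.values())
        # Sites tied at the minimum distance (tolerance from math.isclose).
        nearest = [j for j, d in dists.items() if math.isclose(d, best)]
        if not any(colors[j] == colors[i] for j in nearest):
            return False
    return True


pts = [(0, 0), (1, 0), (3, 0), (4, 0)]
cols = ["r", "r", "b", "b"]
print(is_consistent_points(pts, cols, [0, 3]))  # True
print(is_consistent_points(pts, cols, [0]))     # False
```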

2.3 Connections and Distinctions

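MCS is distinct from dominating set and set cover: a consistent subset need not dominate all points; instead, it must guarantee that each point's nearest selected representatives include one of its own color. Because this condition depends on distances to the entire subset, adding or removing a single representative can change which representatives are nearest for many points, including distant ones, and these global, distance-dependent effects distinguish MCS from local covering problems.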

3. Computational Complexity Landscape

3.1 General Graphs and Planar Instances

MCS is NP-complete for general graphs and remains so on planar graphs, even with two colors (Manna, 23 May 2024, Banik et al., 23 Apr 2024, Manna et al., 2023, Biniaz et al., 2018). These results follow from reductions from variants of Planar 3-SAT and related NP-complete problems. Hardness persists in the geometric setting as well (minimum consistent subset of colored points in the plane).

3.2 Hardness on Restricted Graph Classes

Trees

For trees, MCS is NP-complete when the number of colors is part of the input (Banik et al., 23 Apr 2024, Manna et al., 2023). Hardness is shown by reductions from MAX-2SAT that use gadgets encoding Boolean variables and clause satisfaction within tree structures.

Interval and Circle Graphs

MCS is NP-complete on interval graphs, via reduction from vertex cover on cubic graphs, and APX-hard on circle graphs via gap-preserving reductions from the dominating set problem on the same graph class (Manna, 23 May 2024, Banik et al., 23 Apr 2024). Specifically, circle graphs exhibit hardness of approximation: there is no PTAS unless P=NP.

3.3 Tractable and FPT Cases

The problem admits polynomial-time algorithms on colored trees for a fixed number of colors $k$ (Arimura et al., 2023) and is fixed-parameter tractable (FPT) on trees, parameterized by the number of colors $c$, with running time $O(2^{6c} n^6)$ (Banik et al., 23 Apr 2024). (Here $k$ and $c$, like $\alpha$, denote the number of colors, following the respective papers.) For $k$-chromatic spider trees (trees with a central vertex and $k_1$ legs), there are explicit polynomial-time procedures for constant $k$ (Manna et al., 2023).

For graphs of bounded vertex cover number ($vc$) or neighborhood diversity ($nd$), MCS is FPT parameterized by $vc$ or $nd$, with running times $vc^{O(vc)} \cdot \mathrm{poly}(n,c)$ and $nd^{O(nd)} \cdot \mathrm{poly}(n,c)$ respectively, regardless of the number of colors (Banik et al., 14 Dec 2025).

Table: Overview of Complexity by Graph Class

| Graph class | Complexity | Remarks |
|---|---|---|
| General / planar | NP-complete | Even for $\alpha=2$ |
| Trees ($c$ part of the input) | NP-complete | (Banik et al., 23 Apr 2024, Manna et al., 2023) |
| Trees (fixed $k$) | Polynomial time | $O(2^{4k} n^{2k+3})$ (Arimura et al., 2023); improved FPT algorithm (Banik et al., 23 Apr 2024) |
| Interval graphs | NP-complete | (Banik et al., 23 Apr 2024); $(4\alpha+2)$-approximation (Manna, 23 May 2024) |
| Circle graphs | APX-hard | No PTAS unless P=NP (Manna, 23 May 2024) |
| Bounded $vc$ / $nd$ | FPT | $vc^{O(vc)} \cdot \mathrm{poly}(n,c)$ and $nd^{O(nd)} \cdot \mathrm{poly}(n,c)$ (Banik et al., 14 Dec 2025) |

4. Approximation Algorithms

For interval graphs, a $(4\alpha+2)$-approximation algorithm computes a consistent subset of size at most $(4\alpha+2)$ times the optimum, where $\alpha$ is the number of colors (Manna, 23 May 2024). The algorithm decomposes the interval graph into leaf bar covers (interval partitions), uses dynamic programming to compute optimal covers, constructs small color covers for each part, and aggregates at most $2\alpha$ representatives per bar.

The algorithm proceeds as follows:

  1. Use DP to find a leaf bar cover minimizing partitions between consecutive sample points.
  2. For each bar, select at most $2\alpha$ intervals to cover all necessary colors.
  3. Unify the solutions to obtain the total cover.

For geometric instances (colored points in the plane), the best known exact algorithm runs in $n^{O(\sqrt{k})}$ time, where $k$ is the size of an optimal solution (Biniaz et al., 2018).

No constant-factor approximation is currently known for circle graphs, and MCS is APX-hard in this setting (Manna, 23 May 2024).

5. Parameterized and Specialized Algorithms

5.1 Trees and Fixed-Parameter Tractability

For trees, the most advanced FPT algorithm runs in $O(2^{6c} n^6)$ time, using dynamic programming over tree decompositions whose states summarize distance and color information from subtrees (Banik et al., 23 Apr 2024). When $k$ (the number of colors) is fixed, a polynomial-time $O(2^{4k} n^{2k+3})$ dynamic programming algorithm is available (Arimura et al., 2023). For trees with fixed topologies (e.g., spiders), algorithms exploit the structure of color runs for efficiency (Manna et al., 2023).
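To see how the two tree bounds compare, one can evaluate them with every hidden constant set to 1. This is only indicative, since the $O(\cdot)$ notation hides constants, and the function names are hypothetical. The polynomial degree $2k+3$ of the fixed-$k$ algorithm exceeds the degree $6$ of the FPT algorithm once $k \ge 2$, so for two or more colors the FPT bound is smaller for sufficiently large $n$ (for $k=2$, once $n > 16$).

```python
def fpt_tree_bound(c: int, n: int) -> int:
    """2^{6c} * n^6: the O(2^{6c} n^6) bound with its constant taken as 1."""
    return 2 ** (6 * c) * n ** 6


def fixed_k_tree_bound(k: int, n: int) -> int:
    """2^{4k} * n^{2k+3}: the O(2^{4k} n^{2k+3}) bound with its constant taken as 1."""
    return 2 ** (4 * k) * n ** (2 * k + 3)


for colors in (1, 2, 3, 4):
    for n in (10, 100, 1000):
        fpt = fpt_tree_bound(colors, n)
        fixed = fixed_k_tree_bound(colors, n)
        smaller = "FPT" if fpt < fixed else "fixed-k"
        print(f"colors={colors} n={n:>4}: FPT={fpt:.2e} fixed-k={fixed:.2e} smaller: {smaller}")
```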

5.2 Bounded Structural Parameter Graphs

The problem is fixed-parameter tractable parameterized by the vertex cover number $vc$, building on the observation that bounded $vc$ limits pairwise distances and enables guessing distance profiles and minimal hitting sets for color coverage (Banik et al., 14 Dec 2025). For neighborhood diversity $nd$, the solution leverages partitioning into types and color-coding-style arguments to independently solve subproblems per label.
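The type partition behind the neighborhood-diversity parameter is straightforward to compute: two vertices $u$ and $v$ have the same type iff $N(u)\setminus\{v\} = N(v)\setminus\{u\}$, each type is a clique or an independent set, and $nd$ is the number of types. The sketch below (hypothetical helper `neighborhood_types`, adjacency-dict input) computes only this standard partition; it does not reproduce the per-type reasoning of Banik et al.

```python
def neighborhood_types(adj):
    """Partition V into neighborhood-diversity types.

    u and v have the same type iff N(u) - {v} == N(v) - {u}. This relation is
    an equivalence, so comparing v with one representative per class suffices.
    The number of returned classes is the neighborhood diversity nd.
    """
    nbrs = {v: set(adj[v]) for v in adj}
    types = []  # each class is a list whose first element is its representative
    for v in adj:
        for cls in types:
            r = cls[0]
            if nbrs[v] - {r} == nbrs[r] - {v}:
                cls.append(v)
                break
        else:
            types.append([v])
    return types


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(neighborhood_types(star))  # [[0], [1, 2, 3]], so nd = 2
```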

5.3 Geometric and Special-Case Algorithms

Polynomial-time algorithms exist for collinear points and certain special geometric layouts (e.g., points on two lines), with running times ranging from $O(n)$ to $O(n^6)$ depending on the configuration (Biniaz et al., 2018).
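For collinear points, consistency decomposes along the line: a point strictly between two consecutive selected points can only be served by one of those two, and points beyond the extreme selected points are served by the nearest extreme one. The sketch below exploits this with a simple $O(n^3)$ dynamic program over the rightmost selected point. It is an illustrative alternative, not the $O(n)$ algorithm of Biniaz et al. (2018); it assumes distinct coordinates and compares distances exactly (safe for integers), and `mcs_collinear` is a hypothetical name.

```python
def mcs_collinear(xs, colors):
    """Minimum consistent subset size for colored points on a line.

    Assumes distinct coordinates. O(n^3) dynamic program, for illustration;
    not the linear-time algorithm of Biniaz et al. (2018).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    p = [xs[i] for i in order]
    c = [colors[i] for i in order]
    n = len(p)
    if n == 0:
        return 0

    def gap_ok(i, j):
        # Points strictly between consecutive selected points i < j can only
        # be served by i or j; a tie is fine if either endpoint matches.
        for m in range(i + 1, j):
            di, dj = p[m] - p[i], p[j] - p[m]
            if di < dj and c[m] != c[i]:
                return False
            if dj < di and c[m] != c[j]:
                return False
            if di == dj and c[m] not in (c[i], c[j]):
                return False
        return True

    INF = float("inf")
    # best[j]: fewest selected points if p[j] is the rightmost one chosen so far
    # and every point left of p[j] is already served.
    best = [INF] * n
    for j in range(n):
        if all(c[m] == c[j] for m in range(j)):  # j as the leftmost selected point
            best[j] = 1
        for i in range(j):
            if best[i] + 1 < best[j] and gap_ok(i, j):
                best[j] = best[i] + 1
    # The last selected point must also serve every point to its right.
    return min(best[j] for j in range(n) if all(c[m] == c[j] for m in range(j + 1, n)))


print(mcs_collinear([0, 1, 3, 4], ["r", "r", "b", "b"]))  # 2
print(mcs_collinear([0, 1, 2], ["r", "b", "r"]))          # 3
```

In the second call the middle point is equidistant from both red points, so it must itself be selected, and each red point then needs its own representative.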

6. Hardness of Approximation and Lower Bounds

MCS is APX-hard in circle graphs (Manna, 23 May 2024). No constant-factor approximation is known in the general planar or geometric setting. In the systems-of-equations setting, the problem of minimizing the number of unsatisfied equations is UGC-hard to approximate within any constant (Dabrowski et al., 2022). For general colored metric spaces, inapproximability results beyond these specific geometric and graph classes remain open.

7. Open Problems and Research Directions

Key open problems include:

  • Improving the $(4\alpha+2)$ approximation ratio for interval graphs, for example to a factor independent of $\alpha$, or developing a PTAS for a fixed number of colors (Manna, 23 May 2024).
  • Obtaining constant-factor approximations for circle graphs or establishing stronger inapproximability results (Manna, 23 May 2024).
  • Closing the gap on MCS complexity in trees with a fixed number $k\ge3$ of colors, or for broader classes such as bounded treewidth (Arimura et al., 2023, Manna et al., 2023).
  • Fixed-parameter tractability in geometric metrics remains unresolved; the current $n^{O(\sqrt{k})}$ algorithm is subexponential in the optimal size $k$ but not FPT (Biniaz et al., 2018).
  • Exploring dynamic or streaming versions of MCS, and potential applications in learning-theoretic settings such as supervised clustering (Banik et al., 14 Dec 2025).

A plausible implication is that further advances may come from refined structural decompositions of instance graphs, more powerful color-aggregation strategies in dynamic programming, or new hardness proofs that exploit geometric or structural obstructions.

References
