Constrained Subset Search (CSS)

Updated 4 July 2026

Constrained Subset Search (CSS) is a class of optimization problems that selects item subsets under explicit feasibility constraints to optimize a task-specific objective.
Its applications include subgroup discovery, matrix reconstruction, sparse regression, and graph consistency, where it serves as a common template for selecting data items, features, columns, or vertices.
Algorithmic approaches for CSS range from exact solvers and greedy heuristics to randomized sampling and calibrated screening, each addressing the combinatorial difficulty of the underlying search.

Constrained Subset Search (CSS) denotes a family of optimization problems in which one selects a subset of items subject to explicit feasibility constraints and optimizes a task-specific objective. In the cited literature, CSS appears both as a generic paradigm and as the name of specific methods. Its instances include sparse subgroup discovery over feature conditions, column subset selection for matrix reconstruction, partition-constrained set-function maximization, calibrated screening, sparse regression, graph-metric consistency problems, and hashing-based set similarity search (Bach, 2024, Shitov, 2017, Zhang et al., 23 Mar 2026, Wang et al., 2022, Pia et al., 2018, Manna, 2024, Ahle et al., 2019). This suggests that CSS is best understood not as a single canonical problem, but as a recurrent optimization pattern in which combinatorial subset choice is coupled to structural, statistical, or geometric constraints.

1. Problem family and formal scope

Across the literature, CSS takes the form of optimizing an objective over subsets drawn from a feasibility family. In the partition-constrained formulation, the ground set $V$ is partitioned into blocks $V=\bigcup_{k=1}^{K}V_k$ with capacities $B_k$, and the feasible family is

$\mathcal{F}\triangleq \left\{S\subseteq V:\ |S\cap V_k|\le B_k,\ \forall k\in[K]\right\},$

with objective

$\max_{S\subseteq V} f(S)\quad\text{s.t.}\quad S\in\mathcal{F}.$
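As a minimal illustration of the partition-constrained formulation, the sketch below checks the constraint $|S\cap V_k|\le B_k$ and maximizes a toy coverage function by exhaustive enumeration. The coverage objective, the data, and the names `is_feasible` and `brute_force_max` are hypothetical choices for this example. Enumeration is exponential, so this only makes the feasibility family concrete and is not a practical solver.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Set, Tuple

def is_feasible(subset: FrozenSet[str],
                blocks: Sequence[FrozenSet[str]],
                caps: Sequence[int]) -> bool:
    """Partition constraint: at most caps[k] items from each block k."""
    return all(len(subset & block) <= cap for block, cap in zip(blocks, caps))

def brute_force_max(blocks: Sequence[FrozenSet[str]],
                    caps: Sequence[int],
                    f: Callable[[FrozenSet[str]], float]) -> Tuple[FrozenSet[str], float]:
    """Enumerate every feasible subset and keep the first one with the largest f."""
    ground = sorted(frozenset().union(*blocks))
    best_set: FrozenSet[str] = frozenset()
    best_val = f(best_set)
    for r in range(1, len(ground) + 1):
        for combo in combinations(ground, r):
            s = frozenset(combo)
            if is_feasible(s, blocks, caps) and f(s) > best_val:
                best_set, best_val = s, f(s)
    return best_set, best_val

# Hypothetical coverage objective: number of distinct elements covered by the chosen items.
covers: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3, 4}, "c": {4, 5}, "d": {1, 6}}

def coverage(s: FrozenSet[str]) -> float:
    return float(len(set().union(*(covers[i] for i in s))))

blocks = [frozenset({"a", "b"}), frozenset({"c", "d"})]
caps = [1, 1]  # at most one item per block
print(brute_force_max(blocks, caps, coverage))
```

Here the best feasible choice is one item from each block, `{"b", "d"}`, covering five elements. Swapping in another set function leaves the feasibility check unchanged.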

In subgroup discovery, the selected subset is an axis-aligned or categorical rule over data rows, obtained by choosing which feature conditions to activate and which bounds to impose (Zhang et al., 23 Mar 2026, Bach, 2024).

In numerical linear algebra, CSS specializes to selecting exactly $k$ columns of a matrix $M$ so as to minimize Frobenius reconstruction error,

$\min_{S,A}\|M-SA\|_F,$

where $S$ is an $r\times k$ submatrix of $M$ formed by $k$ of its columns and the coefficient matrix $A$ is unconstrained. In screening, the subset is a shortlist chosen from a scored pool, and the constraint is an expected qualified-count requirement rather than a deterministic combinatorial rule. In graph consistency problems, the subset is a set of vertices constrained by nearest-neighbor color conditions in the graph metric (Shitov, 2017, Wang et al., 2022, Manna, 2024).

Setting	Objective	Constraint family
Subgroup discovery	maximize subgroup quality (e.g., WRAcc)	sparsity, overlap, alternatives
Column subset selection	minimize $\|M-SA\|_F$	exactly $k$ columns
Partition-constrained set selection	maximize $f(S)$	$|S\cap V_k|\le B_k,\ \forall k\in[K]$
Calibrated screening	minimize expected shortlist size	expected qualified count $\ge k$
Sparse regression	minimize least-squares error	at most $k$ nonzero coefficients
Consistent subsets in graphs	minimize $|S|$	nearest-neighbor color consistency

The unifying feature is that the subset itself is the principal decision variable. Thresholds, coefficients, bounds, or auxiliary variables are then optimized conditional on that subset, or are jointly optimized with it.

2. Core mathematical formulations

In subgroup discovery, the dataset is $X\in\mathbb{R}^{m\times n}$ with binary target $y\in\{0,1\}^{m}$. A subgroup description is a conjunction of per-feature conditions, and row membership is

$b_i=\bigwedge_{j=1}^{n}\left(\mathrm{lb}_j\le X_{ij}\le \mathrm{ub}_j\right),\quad i=1,\dots,m.$

CSS enters through the need to choose a sparse subset of features and corresponding bounds while satisfying constraints such as $\sum_{j=1}^{n}s_j\le k$, where $s_j\in\{0,1\}$ indicates whether feature $j$ is selected (Bach, 2024).
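A short sketch of how membership and feature selection interact. A row is covered when each feature value lies in its interval $[\mathrm{lb}_j,\mathrm{ub}_j]$. The rule that a feature counts as selected only when its interval excludes at least one observed value is an assumption made for this illustration. The names `membership` and `selected_features` are hypothetical.

```python
from typing import List, Sequence

def membership(X: Sequence[Sequence[float]],
               lb: Sequence[float], ub: Sequence[float]) -> List[int]:
    """b_i = 1 iff lb[j] <= X[i][j] <= ub[j] holds for every feature j."""
    return [int(all(lo <= x <= hi for x, lo, hi in zip(row, lb, ub))) for row in X]

def selected_features(X: Sequence[Sequence[float]],
                      lb: Sequence[float], ub: Sequence[float]) -> List[int]:
    """s_j = 1 iff the interval on feature j excludes at least one observed value.

    Assumption: a feature whose bounds span its full observed range is deselected."""
    s = []
    for j in range(len(lb)):
        col = [row[j] for row in X]
        s.append(int(lb[j] > min(col) or ub[j] < max(col)))
    return s

X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
lb, ub = [2.0, 1.0], [3.0, 5.0]   # restrict feature 0 only
b = membership(X, lb, ub)          # rows 1 and 2 are covered
s = selected_features(X, lb, ub)   # only feature 0 counts as selected
k = 1                              # sparsity budget
print(b, s, sum(s) <= k)           # → [0, 1, 1, 0] [1, 0] True
```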

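The quality measure emphasized in that setting is Weighted Relative Accuracy (WRAcc). With $m_b=\sum_{i}b_i$ covered rows, $m^+=\sum_{i}y_i$ positive rows, and $m_b^+=\sum_{i}b_i y_i$ covered positive rows,

$\mathrm{WRAcc}=\frac{m_b}{m}\left(\frac{m_b^+}{m_b}-\frac{m^+}{m}\right)=\frac{m_b^+}{m}-\frac{m_b\,m^+}{m^2}.$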

Its normalized form,

$\mathrm{nWRAcc}=\frac{\mathrm{WRAcc}}{\frac{m^+}{m}\left(1-\frac{m^+}{m}\right)}=\frac{m\,m_b^+-m_b\,m^+}{m^+\,(m-m^+)},$

is used for cross-dataset comparison. Its denominator is the WRAcc of a perfect subgroup, which covers exactly the positive rows. In this formulation, the objective is linear in the binary membership variables $b_i$, which is important for solver-based encodings (Bach, 2024).
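WRAcc can be computed directly from the counts. This sketch uses the standard definition $\mathrm{WRAcc}=\frac{m_b}{m}\big(\frac{m_b^+}{m_b}-\frac{m^+}{m}\big)$ in the form $\frac{m_b^+}{m}-\frac{m_b m^+}{m^2}$, which avoids dividing by $m_b$. As an assumption about the normalization, it divides by $\frac{m^+}{m}\big(1-\frac{m^+}{m}\big)$, the value a perfect subgroup attains. The function names are hypothetical.

```python
from typing import Sequence

def wracc(y: Sequence[int], b: Sequence[int]) -> float:
    """Weighted Relative Accuracy as m_b+/m - m_b*m+/m^2 (no division by m_b)."""
    m = len(y)
    m_pos = sum(y)                                    # positives in the dataset
    m_b = sum(b)                                      # rows covered by the subgroup
    m_b_pos = sum(yi * bi for yi, bi in zip(y, b))    # covered positives
    return m_b_pos / m - (m_b * m_pos) / (m * m)

def nwracc(y: Sequence[int], b: Sequence[int]) -> float:
    """WRAcc divided by its maximum p(1-p), with p = m+/m (requires 0 < p < 1)."""
    p = sum(y) / len(y)
    if not 0.0 < p < 1.0:
        raise ValueError("target must contain both classes")
    return wracc(y, b) / (p * (1.0 - p))

y = [1, 1, 0, 0]
print(wracc(y, [1, 1, 0, 0]), nwracc(y, [1, 1, 0, 0]))  # perfect subgroup → 0.25 1.0
print(wracc(y, [1, 0, 1, 0]))                          # same positive rate as data → 0.0
```

Both $m_b$ and $m_b^+$ are sums over the membership bits $b_i$ with fixed coefficients. This is why the objective stays linear in the membership variables.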

In column subset selection, fixing the selected columns $S$ makes the optimal coefficient matrix the Moore–Penrose pseudoinverse solution $A=S^{+}M$, yielding the projection form

$\min_{S}\|M-SS^{+}M\|_F,$

where $SS^{+}$ is the orthogonal projector onto the column space of $S$. This turns CSS into the search for a subspace spanned by $k$ actual columns of $M$ rather than by arbitrary singular vectors (Shitov, 2017).
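A brute-force sketch of the projection form. For each candidate set of $k$ column indices, it builds an orthonormal basis of their span with modified Gram-Schmidt and sums the squared residuals of all columns, which gives $\|M-SS^{+}M\|_F^2$. The function names are hypothetical, and enumerating all $\binom{n}{k}$ subsets is only practical for tiny matrices.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def _col(M: Sequence[Sequence[float]], j: int) -> List[float]:
    return [row[j] for row in M]

def orthonormal_basis(cols: Sequence[Sequence[float]], tol: float = 1e-12) -> List[List[float]]:
    """Modified Gram-Schmidt; numerically dependent columns are skipped."""
    basis: List[List[float]] = []
    for c in cols:
        v = list(c)
        for q in basis:
            coef = _dot(q, v)
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        norm = _dot(v, v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis

def projection_error(M: Sequence[Sequence[float]], cols: Sequence[int]) -> float:
    """||M - S S^+ M||_F^2, where S holds the chosen columns of M."""
    Q = orthonormal_basis([_col(M, j) for j in cols])
    err = 0.0
    for j in range(len(M[0])):
        v = _col(M, j)
        for q in Q:                      # subtract the projection onto span(S)
            coef = _dot(q, v)
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        err += _dot(v, v)
    return err

def best_k_columns(M: Sequence[Sequence[float]], k: int) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive column subset selection over every k-subset of column indices."""
    n = len(M[0])
    return min((projection_error(M, S), S) for S in combinations(range(n), k))

M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
err, S = best_k_columns(M, 1)
print(S, round(err, 6))                  # → (2,) 1.0
```

The returned error is squared. Answering the decision version for a threshold $t$ on $\|M-SA\|_F$ therefore amounts to comparing it with $t^2$.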

In sparse regression, the canonical formulation is

$\min_{\beta\in\mathbb{R}^{d}}\|y-A\beta\|_2^2\quad\text{s.t.}\quad\|\beta\|_0\le k,$

where $\|\beta\|_0$ counts the nonzero coefficients, so the selected subset is the support of $\beta$. Pia et al. (2018) study this problem for sparse, block-structured design matrices $A$, where tractability is controlled by the block structure of $A$.
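For very small problems, best-subset regression can be solved by enumerating supports. For a fixed support, the minimal least-squares error equals the squared distance from $y$ to the span of the selected columns. The sketch computes this distance with Gram-Schmidt instead of solving normal equations. It is a brute-force baseline with hypothetical names, not the block-structured algorithm of Pia et al. Adding columns can only enlarge the span and never increases that distance, so only supports of size exactly $\min(k,d)$ are enumerated.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def residual_sq(cols: Sequence[Sequence[float]], y: Sequence[float], tol: float = 1e-12) -> float:
    """||y - P y||^2 with P the orthogonal projector onto span(cols)."""
    r = list(y)
    basis: List[List[float]] = []
    for c in cols:
        v = list(c)
        for q in basis:                       # modified Gram-Schmidt
            coef = _dot(q, v)
            v = [vi - coef * qi for vi, qi in zip(v, q)]
        norm = _dot(v, v) ** 0.5
        if norm > tol:
            q = [vi / norm for vi in v]
            basis.append(q)
            coef = _dot(q, r)                 # remove y's component along q
            r = [ri - coef * qi for ri, qi in zip(r, q)]
    return _dot(r, r)

def best_subset(A: Sequence[Sequence[float]], y: Sequence[float],
                k: int) -> Tuple[float, Tuple[int, ...]]:
    """Minimize ||y - A beta||^2 subject to ||beta||_0 <= k by enumerating supports."""
    d = len(A[0])
    columns = [[row[j] for row in A] for j in range(d)]
    return min((residual_sq([columns[j] for j in S], y), S)
               for S in combinations(range(d), min(k, d)))

A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
y = [2.0, 0.0, 0.0, 2.0]                      # y = 2 * (column 0)
rss, support = best_subset(A, y, 1)
print(support, round(rss, 6))                 # → (0,) 0.0
```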

In calibrated screening, the subset is a shortlist $\mathcal{S}$ chosen from a pool of candidates $X_1,\dots,X_n$ with scores $f(X_i)$. The optimization target is

$\min_{\mathcal{S}}\ \mathbb{E}\big[|\mathcal{S}|\big]\quad\text{s.t.}\quad\mathbb{E}\Big[\sum_{i\in\mathcal{S}}Y_i\Big]\ge k,$

where $Y_i\in\{0,1\}$ indicates whether candidate $i$ is qualified. CSS thus becomes a constrained stochastic selection problem over pools of candidates rather than over matrix columns or feature rules (Wang et al., 2022).

3. Constraint types and structural variations

The most common CSS constraint is feature or item cardinality. In subgroup discovery this appears as feature sparsity, implemented by indicators $s_j\in\{0,1\}$ and the budget $\sum_{j=1}^{n}s_j\le k$. Numeric features are handled through intervals $[\mathrm{lb}_j,\mathrm{ub}_j]$, and bounds spanning the feature's full observed range, $[\min_i X_{ij},\max_i X_{ij}]$, deselect a feature; categorical features are handled either by ordinal encoding or by one-hot encoding with per-category binary bounds (Bach, 2024).

A second class of constraints is structural rather than purely cardinal. The subgroup-discovery literature introduces alternative subgroup descriptions: given an original subgroup with feature-selection vector $s^{*}$ and membership vector $b^{*}$, one seeks a new subgroup maximizing similarity in covered instances while enforcing feature-level dissimilarity. The similarity objective is normalized Hamming similarity,

$\mathrm{sim}_{\mathrm{nHamm}}(b^{*},b)=\frac{1}{m}\sum_{i=1}^{m}\left[b_i^{*}b_i+(1-b_i^{*})(1-b_i)\right],$

and the asymmetric deselection dissimilarity is

$\mathrm{dis}_{\mathrm{des}}(s^{*},s)=\sum_{j=1}^{n}s_j^{*}\,(1-s_j).$

The constraint

$\mathrm{dis}_{\mathrm{des}}(s^{*},s)\ge\min\Big(\tau,\ \sum_{j=1}^{n}s_j^{*}\Big)$

forces the alternative to drop at least $\tau$ previously used features, or all of them when $\tau\ge\sum_{j}s_j^{*}$ (Bach, 2024).
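The similarity and dissimilarity measures reduce to simple counts. This sketch assumes the constraint works as follows: the alternative must drop at least $\tau$ originally selected features, or all of them when fewer than $\tau$ were selected. The function names are hypothetical.

```python
from typing import Sequence

def hamming_similarity(b_orig: Sequence[int], b_new: Sequence[int]) -> float:
    """Fraction of rows whose subgroup membership is unchanged."""
    return sum(int(x == z) for x, z in zip(b_orig, b_new)) / len(b_orig)

def deselection_dissimilarity(s_orig: Sequence[int], s_new: Sequence[int]) -> int:
    """Number of originally selected features that the alternative no longer uses."""
    return sum(so * (1 - sn) for so, sn in zip(s_orig, s_new))

def is_valid_alternative(s_orig: Sequence[int], s_new: Sequence[int], tau: int) -> bool:
    """Require dropping tau original features, or all of them if fewer than tau exist."""
    return deselection_dissimilarity(s_orig, s_new) >= min(tau, sum(s_orig))

s_orig = [1, 1, 0, 0]
print(hamming_similarity([1, 1, 0, 0], [1, 0, 0, 0]))   # → 0.75
print(is_valid_alternative(s_orig, [0, 1, 1, 0], 1))     # drops feature 0 → True
print(is_valid_alternative(s_orig, [0, 1, 1, 0], 2))     # needs two drops → False
print(is_valid_alternative(s_orig, [0, 0, 1, 1], 3))     # threshold capped at 2 → True
```

Newly added features (here features 2 and 3) do not count toward the dissimilarity. This asymmetry is what makes the measure a deselection measure.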

Partition constraints form another major CSS variant. Here feasibility is determined blockwise by capacities $B_k$, producing a partition matroid. The Multinoulli Extension parameterizes each block $V_k$ by its own vector $\mathbf{x}_k$, which parameterizes a multinoulli (categorical) distribution used to draw items from that block. This parameterization is designed so that the continuous relaxation already respects the partition structure (Zhang et al., 23 Mar 2026).

Screening constraints are expectation-based and can be groupwise. Given per-group targets $k_g$, CSS-D applies calibration independently within each group $g$. It guarantees that the expected number of qualified shortlisted candidates from each group $g$ is at least $k_g$, with simultaneous control across groups obtained by Bonferroni splitting of the confidence level (Wang et al., 2022).

Graph-based CSS introduces metric-nearest-neighbor constraints. For a graph $G=(V,E)$ with colored vertices, a consistent subset $S\subseteq V$ requires that every vertex has at least one nearest selected vertex of the same color, while a strict consistent subset requires that all nearest selected vertices have that color. The strict variant is therefore a stronger universal constraint than the existential condition used in ordinary consistency (Manna, 2024).
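Both conditions can be checked with breadth-first search, since nearest is measured in the unweighted graph metric. This is a minimal sketch with hypothetical names. It runs one BFS per vertex, which is fine for small graphs, and it treats a vertex that cannot reach any selected vertex as a violation, an assumption for disconnected inputs.

```python
from collections import deque
from typing import Dict, List, Set

def bfs_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Unweighted shortest-path distances from source (the graph metric)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def nearest_selected(adj: Dict[int, List[int]], v: int, subset: Set[int]) -> Set[int]:
    """All selected vertices at minimum graph distance from v."""
    dist = bfs_distances(adj, v)
    reachable = [(dist[u], u) for u in subset if u in dist]
    if not reachable:
        return set()
    d_min = min(d for d, _ in reachable)
    return {u for d, u in reachable if d == d_min}

def is_consistent(adj: Dict[int, List[int]], color: Dict[int, str],
                  subset: Set[int], strict: bool = False) -> bool:
    """Existential (default) or strict nearest-neighbor color consistency."""
    for v in adj:
        nearest = nearest_selected(adj, v, subset)
        if not nearest:            # assumption: unreachable selection counts as a violation
            return False
        same = [color[u] == color[v] for u in nearest]
        if not (all(same) if strict else any(same)):
            return False
    return True

# Path 0 - 1 - 2 with colors R, R, B; vertex 1 is equidistant from 0 and 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
color = {0: "R", 1: "R", 2: "B"}
print(is_consistent(adj, color, {0, 2}))               # → True
print(is_consistent(adj, color, {0, 2}, strict=True))  # → False
```

The example shows the gap between the two notions. Vertex 1 has two nearest selected vertices of different colors, which satisfies the existential condition but not the strict one.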

4. Complexity landscape

The complexity picture is predominantly hard. In subgroup discovery, exhaustive search over interval bounds ranges over $O(m^{2n})$ bound combinations without sparsity and $O(n^{k}m^{2k})$ under a $k$-feature sparsity budget. The problem lies in XP with parameter $n$, or with parameter $k$ under sparsity, but several constrained variants are NP-complete. The cited results show NP-completeness for perfect-subgroup discovery with feature-cardinality constraints, for general subgroup discovery under quality measures such as WRAcc whose maximal value is attained only by perfect subgroups, and for alternative-subgroup formulations with dissimilarity thresholds that forbid feature reuse (Bach, 2024).
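The exhaustive-search counts can be checked with a little arithmetic. Assume each numeric feature takes at most $m$ distinct values. Then each restricted feature has at most $m^2$ (lower, upper) bound pairs, giving $m^{2n}$ combinations without sparsity and $\binom{n}{k}m^{2k}$ when exactly $k$ features are restricted. For fixed $n$ (or fixed $k$) both counts are polynomial in $m$, which is the sense of membership in XP. The function name is hypothetical.

```python
from math import comb
from typing import Optional

def bound_combinations(m: int, n: int, k: Optional[int] = None) -> int:
    """Upper bound on interval-bound combinations with at most m distinct values per feature.

    Each restricted feature has at most m choices for each of its two bounds (m**2 pairs).
    Without a sparsity budget all n features may be restricted; with one, exactly k are."""
    if k is None:
        return m ** (2 * n)
    return comb(n, k) * m ** (2 * k)

print(bound_combinations(100, 10))      # equals 10**40
print(bound_combinations(100, 10, 2))   # 45 * 100**4 = 4500000000
```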

For column subset selection, the decision version asks whether, for a given threshold $t$, there exists a $k$-column submatrix $S$ of $M$ with

$\min_{A}\|M-SA\|_F\le t.$

This problem is NP-complete over rational matrices under the Frobenius norm. The reduction is from Graph 3-Coloring: the hardness proof constructs a matrix $M$ whose near-optimal $k$-column selections correspond exactly to valid 3-colorings of the source graph (Shitov, 2017).

Sparse linear-regression subset selection is NP-hard in general, but the sparse-matrix paper identifies polynomially solvable regimes. Suppose the design matrix consists of a block-diagonal part together with a number of global (dense) columns. If all block widths are fixed and the number of global columns is fixed, exact optimization becomes polynomial-time solvable. Both the diagonal case and the block-diagonal case reduce to solving a polynomial number of least-squares problems, with the exact count depending on the block widths and the number of global columns (Pia et al., 2018).

In graph consistency, the Minimum Consistent Subset problem is NP-complete on general graphs and on planar graphs, while the Minimum Strict Consistent Subset problem is NP-hard via reduction from Dominating Set. At the same time, exact polynomial-time algorithms exist for trees, paths, spiders, and combs (Manna, 2024).

Problem family	Hardness result	Tractable regime
Subgroup discovery with sparsity/alternatives	NP-complete	heuristics and SMT on finite instances
Column subset selection	NP-complete	no general exact polytime algorithm stated
Sparse regression subset selection	NP-hard in general	fixed block widths and a fixed number of global columns
Graph consistency subsets	NP-complete / NP-hard	trees, paths, spiders, combs

A recurring implication is that CSS hardness typically comes from coupling subset choice with either threshold optimization, projection geometry, or inter-subset dissimilarity constraints.

5. Algorithmic approaches

The algorithmic literature divides into exact declarative methods, greedy or heuristic search, randomized subset samplers, and calibration-based procedures.

For subgroup discovery, an SMT formulation over Linear Real Arithmetic plus Booleans uses decision variables for membership $b_i$, bounds $\mathrm{lb}_j$ and $\mathrm{ub}_j$, and selection indicators $s_j$, with Z3 as the solver and OptSMT/OMT for optimization. Optional sparsity and alternative-description constraints are expressed directly, yielding a white-box optimizer. The paper also integrates these constraints into heuristic methods such as PRIM, Beam Search, a Best Interval (BI) variant, MORS, and Random Search. On 27 binary-classification datasets, the heuristics often produced high-quality subgroups with much shorter runtimes than SMT. Beam Search and BI reached train-set quality similar to SMT and often exceeded it on test data due to lower overfitting (Bach, 2024).

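In column subset selection, the centralized greedy algorithm of the large-scale CSS paper exploits a recursive decomposition of the projection error. Let $P^{(\mathcal{S})}$ denote the orthogonal projector onto the span of the columns of $M$ indexed by $\mathcal{S}$. Suppose $\mathcal{S}=\mathcal{P}\cup\mathcal{R}$ with $\mathcal{P}\cap\mathcal{R}=\emptyset$, $E=M-P^{(\mathcal{P})}M$ is the residual after projecting onto the columns in $\mathcal{P}$, and $E_{\mathcal{R}}$ collects the columns of $E$ indexed by $\mathcal{R}$. Then

$\|M-P^{(\mathcal{S})}M\|_F^2=\|E\|_F^2-\|P^{(E_{\mathcal{R}})}E\|_F^2,$

where $P^{(E_{\mathcal{R}})}$ projects onto the span of $E_{\mathcal{R}}$. For a single candidate column $l$, the decrease is $\|E^{\top}E_{:l}\|_2^2/\|E_{:l}\|_2^2$, so each greedy step reduces to maximizing the marginal decrease in reconstruction error. The same work gives rank-1 residual and Gram-matrix updates. It extends the method to a MapReduce setting by first building a concise representation of the data matrix through random projection and then solving generalized CSS subproblems locally and globally (Farahat et al., 2013).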
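A compact, single-machine sketch of the greedy rule. At each step it selects the column $e_l$ of the current residual $E$ that maximizes $\|E^{\top}e_l\|_2^2/\|e_l\|_2^2$, which is the decrease in $\|E\|_F^2$ from projecting $E$ off $e_l$. It then deflates $E\leftarrow E-e_l e_l^{\top}E/\|e_l\|_2^2$. This version recomputes everything densely and omits the rank-1 Gram-matrix updates and the MapReduce layer described by Farahat et al. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def greedy_css(M: Sequence[Sequence[float]], k: int,
               tol: float = 1e-12) -> Tuple[List[int], float]:
    """Greedy column subset selection by residual deflation.

    Returns the selected column indices and the final ||M - P_S M||_F^2."""
    r, n = len(M), len(M[0])
    E = [list(map(float, row)) for row in M]    # residual matrix, row-major r x n
    selected: List[int] = []
    for _ in range(min(k, n)):
        cols = [[E[i][j] for i in range(r)] for j in range(n)]
        sq_norms = [sum(x * x for x in c) for c in cols]
        best_j, best_gain, best_g = -1, -1.0, []
        for j in range(n):
            if sq_norms[j] <= tol:              # residual column is ~0 (e.g. already selected)
                continue
            g = [sum(a * b for a, b in zip(cols[j], cols[t])) for t in range(n)]  # E^T e_j
            gain = sum(x * x for x in g) / sq_norms[j]
            if gain > best_gain:
                best_j, best_gain, best_g = j, gain, g
        if best_j < 0:                          # residual is numerically zero
            break
        e, nrm = cols[best_j], sq_norms[best_j]
        for i in range(r):                      # E <- E - e (e^T E) / ||e||^2
            for t in range(n):
                E[i][t] -= e[i] * best_g[t] / nrm
        selected.append(best_j)
    residual = sum(x * x for row in E for x in row)
    return selected, residual

M = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
print(greedy_css(M, 1))   # → ([2], 1.0)
print(greedy_css(M, 2))   # → ([2, 0], 0.0)
```

On this toy matrix the first greedy pick (column 2) matches the exhaustive optimum for $k=1$. Greedy selection carries no such guarantee in general.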

Randomized exact-$k$ methods include projection DPP sampling for CSSP. With the SVD $M=U\Sigma\mathbf{V}^{\top}$ and $\mathbf{V}_k$ the matrix of the first $k$ right singular vectors, the kernel

$K=\mathbf{V}_k\mathbf{V}_k^{\top}$

defines a projection DPP whose samples have size exactly $k$. This distribution favors subsets whose coordinate subspace aligns with the PCA subspace while also enforcing repulsion among selected columns. The resulting expected-error bounds improve over volume sampling when $k$-leverage scores are sparse or effectively sparse and the tail singular spectrum is sufficiently flat (Belhadji et al., 2018).
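A projection DPP with kernel $K=\mathbf{V}_k\mathbf{V}_k^{\top}$ can be sampled exactly by the standard sequential (chain-rule) procedure for projection kernels. Each item is picked with probability proportional to $K_{ii}-K_{i\mathcal{S}}K_{\mathcal{S}\mathcal{S}}^{-1}K_{\mathcal{S}i}$, the squared norm of its row of $\mathbf{V}_k$ after removing the directions already chosen. This sketch assumes $\mathbf{V}_k$ (orthonormal columns) is precomputed, for example by an SVD routine. It is not the authors' code, and the function name is hypothetical.

```python
import random
from typing import List, Sequence

def sample_projection_dpp(V: Sequence[Sequence[float]], rng: random.Random) -> List[int]:
    """Draw one sample from the projection DPP with kernel K = V V^T.

    V is n x k with orthonormal columns; every sample has exactly k items."""
    rows = [list(map(float, r)) for r in V]
    k = len(rows[0])
    sample: List[int] = []
    for _ in range(k):
        # Conditional weights: squared norms of the deflated rows (selected items get 0).
        weights = [0.0 if i in sample else sum(x * x for x in r) for i, r in enumerate(rows)]
        i = rng.choices(range(len(rows)), weights=weights, k=1)[0]
        sample.append(i)
        norm = weights[i] ** 0.5
        e = [x / norm for x in rows[i]]
        for r in rows:                      # project every row off direction e
            c = sum(a * b for a, b in zip(r, e))
            for t in range(k):
                r[t] -= c * e[t]
    return sorted(sample)

s = 2 ** -0.5
V = [[s, 0.0], [s, 0.0], [0.0, 1.0]]        # orthonormal columns, k = 2
print(sample_projection_dpp(V, random.Random(0)))  # contains item 2, plus item 0 or 1
```

Items 0 and 1 share the same direction, so repulsion prevents them from being sampled together. Item 2 carries the second principal direction alone and is always included.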

For partition-constrained maximization, the Multinoulli Extension produces a continuous objective: the expected value of $f$ on a random set whose elements are drawn i.i.d., within each block, from the multinoulli distribution parameterized by that block's vector. The key algorithmic contribution is Multinoulli-SCG, together with online variants Multinoulli-OSCG and Multinoulli-OSGA. A distinctive property is lossless rounding without replacement. For monotone $f$, the rounded set is feasible, $S\in\mathcal{F}$, and its expected value is at least the continuous objective at the fractional solution (Zhang et al., 23 Mar 2026).

In calibrated screening, the algorithm is threshold-based rather than a combinatorial search in the narrow sense. Using calibration data, one estimates how many qualified candidates a pool is expected to contain above each score threshold. One then selects the largest threshold whose estimate still meets the target and shortlists every candidate scoring at or above it. This yields a distribution-free marginal guarantee that the expected number of qualified shortlisted candidates is at least $k$ (Wang et al., 2022).
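The threshold mechanics can be sketched with a plug-in estimate. This is a simplified, hypothetical version. It estimates the expected number of qualified candidates scoring at least $t$ in a pool of size $n$ as $n\cdot\widehat{P}(Y=1,\ f(X)\ge t)$ from calibration pairs, assuming the pool is drawn from the same distribution. It then picks the largest $t$ meeting the target $k$. It omits any finite-sample (confidence) correction, so it does not carry the guarantee stated for the actual method.

```python
from typing import List, Sequence

def select_threshold(cal_scores: Sequence[float], cal_labels: Sequence[int],
                     pool_size: int, k: float) -> float:
    """Largest threshold t with pool_size * P_hat(Y = 1, score >= t) >= k (plug-in only)."""
    m = len(cal_scores)
    for t in sorted(set(cal_scores), reverse=True):      # high thresholds first
        hits = sum(1 for s, y in zip(cal_scores, cal_labels) if s >= t and y == 1)
        if pool_size * hits / m >= k:
            return t
    return float("-inf")   # target unreachable: shortlist the whole pool

def shortlist(pool_scores: Sequence[float], t: float) -> List[int]:
    """Indices of pool candidates scoring at or above the threshold."""
    return [i for i, s in enumerate(pool_scores) if s >= t]

cal_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
cal_labels = [1, 1, 0, 1, 0, 1, 0, 0]
t = select_threshold(cal_scores, cal_labels, pool_size=8, k=3)
print(t, shortlist([0.95, 0.1, 0.65, 0.6, 0.3], t))   # → 0.6 [0, 2, 3]
```

Scanning thresholds from high to low returns the smallest shortlist that meets the estimated target. For a group-wise variant, the same routine would be run separately on each group's calibration data with its own target $k_g$.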

Approximate subset and superset search in large set systems is handled by a distinct hashing-based CSS framework. There, sets are bucketed by size and searched using asymmetric supermajority thresholds on queries and stored sets. Performance is governed by a query exponent $\rho_q$ and an update exponent $\rho_u$, which determine query and update costs under the GapSS model (Ahle et al., 2019).

6. Applications, terminology, and limits

CSS supports markedly different scientific aims depending on the objective. In subgroup discovery, it is used to produce sparse and alternative descriptions of interesting regions in labeled data, with controlled interpretability and explicit feature dissimilarity between alternative explanations (Bach, 2024). In numerical linear algebra, it supports interpretable low-rank approximation, CUR-style decompositions, feature selection, and distributed summarization of large matrices (Shitov, 2017, Farahat et al., 2013, Belhadji et al., 2018). In screening, it turns arbitrary score functions into shortlists with finite-sample guarantees on expected quality and can enforce per-group guarantees through separate calibration (Wang et al., 2022). In graph settings, it formalizes nearest-neighbor color consistency and strict consistency as subset-minimization problems on paths, trees, spiders, and combs (Manna, 2024). In partition-constrained machine learning, it underlies applications such as video summarization with temporal windows, Bayesian A-optimal design under group quotas, and online subset selection for multi-target tracking with UAVs (Zhang et al., 23 Mar 2026).

Terminology is not uniform. In some papers, “CSS” is a generic abbreviation for constrained subset search; in one paper it specifically denotes Calibrated Subset Selection; and in numerical linear algebra the standard acronym is usually CSSP, for Column Subset Selection Problem (Shitov, 2017, Wang et al., 2022). A precise reading therefore depends on the surrounding objective, constraint class, and data model.

Several limitations recur across the literature. Exact solver-based subgroup discovery has substantial overhead and frequent timeouts on medium or large datasets, with most gains arriving early and longer timeouts showing diminishing returns (Bach, 2024). The Multinoulli Extension is tailored to partition matroids and monotone objectives; extending it to general matroids is described as nontrivial, and non-monotone objectives are not the focus (Zhang et al., 23 Mar 2026). Calibrated screening provides marginal guarantees over random pools, but the cited impossibility result shows that distribution-free individual-pool guarantees are unattainable unless the classifier is omniscient (Wang et al., 2022). Sparse-regression tractability depends critically on fixed block widths and a fixed number of global columns; if these parameters vary with input size, NP-hardness reappears (Pia et al., 2018). In graph consistency, the model is based on nearest selected neighbors in the graph metric rather than only immediate graph neighbors, which changes both the feasible subsets and the algorithmic structure (Manna, 2024).

Taken together, these results define CSS as a technically broad but structurally coherent class of constrained combinatorial optimization problems. The common thread is the search for a subset that is not merely small or sparse, but admissible under explicit domain constraints and optimal relative to a problem-dependent criterion.