Constrained Clustering Algorithms
- Constrained clustering is a method that integrates feasibility and domain-specific constraints (e.g., must-link, cannot-link) into traditional clustering algorithms.
- These algorithms extend classical techniques such as k-means and spectral clustering by using modified heuristics, penalty-based relaxations, and optimization frameworks.
- Empirical evaluations show that even sparse constraints can improve clustering metrics by 10–20% and enhance scalability via parallel and relaxation strategies.
A constrained clustering algorithm is any clustering procedure in which the admissible cluster assignments are restricted by a set of additional feasibility, background-knowledge, or domain-imposed requirements. The canonical constraints are pairwise supervision: must-link (ML, two items must be in the same cluster) and cannot-link (CL, two items must be in different clusters); broader categories include group-wise, cardinality, balance, and fairness requirements. Constrained clustering arises in semi-supervised learning, computational biology, fairness-aware data mining, and other high-impact domains where purely unsupervised partitionings do not capture domain-specific structure.
1. Formal Models of Constrained Clustering
The constrained clustering paradigm extends classical objectives (e.g., k-means, k-median, k-center, spectral clustering, correlation clustering) by appending explicit feasibility constraints to the clustering assignment variables:
- Pairwise constraints (must-link/cannot-link): a must-link pair $(i,j)$ requires $y_i = y_j$, and a cannot-link pair requires $y_i \neq y_j$, where $y_i$ denotes the cluster label of item $i$.
- Cardinality constraints: e.g., prescribed lower/upper bounds on cluster sizes or cluster label proportions.
- Fairness/balancing/groupwise constraints: requirements that clusters satisfy certain group memberships or demographic mixes.
The feasibility region thus comprises all clusterings (labelings, assignment matrices, etc.) that satisfy the specified constraints. For example, the constrained k-means problem is typically formulated as:

$$\min_{Y,\,C}\ \|X - YC\|_F^2 \quad \text{subject to } Y \in \mathcal{Y},$$

where the assignment matrix $Y \in \{0,1\}^{n \times k}$ must satisfy one-hot, ML/CL, and possibly size or balance constraints (together defining the feasible set $\mathcal{Y}$), and $C \in \mathbb{R}^{k \times d}$ is the matrix of centroids (Le et al., 2018, Bibi et al., 2019).
Some models relax hard constraints into soft penalties added to the objective with tunable weights (as in (Baumann et al., 2022, Jia et al., 16 Jan 2026)), or encode a "degree of confidence" in each feasibility requirement (Baumann et al., 2022).
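As a concrete illustration of the soft-penalty relaxation, the following minimal sketch (function and variable names are illustrative, not taken from any cited paper) evaluates a penalized k-means objective that charges a weight `w` for every violated pair:

```python
import numpy as np

def penalized_kmeans_objective(X, labels, centroids, ml_pairs, cl_pairs, w=1.0):
    """Soft-constrained k-means objective: within-cluster sum of squared
    errors plus a penalty of w for each violated must-link / cannot-link
    pair. Illustrative sketch; w trades off fit vs. constraint satisfaction."""
    sse = np.sum((X - centroids[labels]) ** 2)
    ml_viol = sum(labels[i] != labels[j] for i, j in ml_pairs)  # ML broken
    cl_viol = sum(labels[i] == labels[j] for i, j in cl_pairs)  # CL broken
    return sse + w * (ml_viol + cl_viol)
```

Setting `w` very large recovers (approximately) hard-constrained behavior, while small `w` tolerates violations when the geometry strongly disagrees with the side information.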
2. Algorithmic Approaches
Algorithmic strategies for constrained clustering vary depending on the type and quantity of constraints, computational scale, and whether statistical or worst-case guarantees are sought:
- Direct extension of classical heuristics:
- Modification of k-means/k-medoids/Lloyd's algorithm to enforce constraints per iteration (e.g., COP-KMeans, BCKM) (Le et al., 2018).
- Integer or binary programming formulations to enforce feasibility (e.g., linear/binary optimization with assignment variables and ML/CL constraints) (Bibi et al., 2019, Baumann et al., 2022, Chumpitaz-Flores et al., 26 Oct 2025).
- Spectral methods for graph-based constraints/quadratic encodings:
- Generalized or constrained spectral clustering via Laplacians or generalized eigenvalue problems; degree-of-belief encoded via quadratic constraint matrices (Wang et al., 2012, Cucuringu et al., 2016).
- Penalty-based and relaxation methods:
- Constraint violation penalties in the objective with tunable weights, allowing for soft constraint satisfaction (Baumann et al., 2022, Jia et al., 16 Jan 2026).
- Continuous relaxations and augmented Lagrangian/ADMM for large-scale or nonconvex models (Bibi et al., 2019).
- Combinatorial search and branch-and-bound:
- Globally optimal methods via (mixed-integer) branch-and-bound with aggressive pruning, Lagrangian relaxation, and geometric elimination (Chumpitaz-Flores et al., 26 Oct 2025).
- PTAS/FPTAS for special cases (e.g., constant-factor approximations for constrained k-center or k-median) (Guo et al., 2024, Jaiswal et al., 2023, Ding et al., 2018).
- Ensemble/selective methods:
- Constraint-based selection among a diverse pool of clusterings precomputed by unsupervised algorithms (COBS) (Craenendonck et al., 2016).
Deep learning methods generalize the above by designing differentiable loss functions encoding constraints and training neural embeddings end-to-end (Zhang et al., 2021, Zhang et al., 2019, Manduchi et al., 2021). In these models, constraints are mapped to (possibly soft) penalty terms, and the full objective is optimized via stochastic minibatch gradient methods.
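The classical hard-constraint heuristic above can be sketched as a COP-KMeans-style assignment pass: each point takes its nearest centroid whose choice does not violate any ML/CL constraint against already-assigned points (simplified sketch; helper names are illustrative, and real implementations iterate this with centroid updates):

```python
import numpy as np

def feasible(i, c, labels, ml, cl):
    """True iff assigning point i to cluster c violates no constraint
    against points already assigned (label -1 means unassigned)."""
    for a, b in ml:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] not in (-1, c):
            return False
    for a, b in cl:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return False
    return True

def cop_kmeans_assign(X, centroids, ml, cl):
    """One COP-KMeans-style assignment pass: nearest feasible centroid.
    Returns None when some point has no feasible cluster (the classical
    failure mode of hard-constrained assignment)."""
    labels = np.full(len(X), -1)
    for i in range(len(X)):
        dists = np.sum((centroids - X[i]) ** 2, axis=1)
        for c in np.argsort(dists):
            if feasible(i, int(c), labels, ml, cl):
                labels[i] = int(c)
                break
        else:
            return None
    return labels
```

Note the greedy, order-dependent nature of the pass: it is exactly this possibility of infeasibility that motivates the penalty-based and integer-programming alternatives listed above.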
3. Constraint Types and Their Incorporation
Recent literature systematizes a broad taxonomy of constraints:
- Pairwise must-link/cannot-link: Enforced via assignment equalities/inequalities, quadratic terms, or soft penalties.
- Group/setwise (e.g., must-link over a set of instances): Modeled as group equalities or via representative centroids (Jia et al., 16 Jan 2026).
- Cardinality/balance (cluster sizes, demographic attributes): Linear constraints or squared-deviation penalties on cluster sizes or attribute proportions (Bibi et al., 2019, Zhang et al., 2021).
- Triplet and higher-order constraints: Margin-based penalty terms (triplets: an anchor $a$ should be closer to the positive $p$ than to the negative $n$ in assignment/embedding space) (Zhang et al., 2021, Zhang et al., 2019).
- Continuous-valued and domain-informed constraints: Instance difficulty (confidence/uncertainty per point label); distributional priors; fairness or protected-attribute cardinalities; must-link confidence weights (Zhang et al., 2021, Baumann et al., 2022).
The precise mechanism (hard constraints, Lagrangian penalty, expectation in probabilistic model) varies by method, but the consensus is that proper encoding of constraint strength improves clustering quality and allows for flexible integration of side information.
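A margin-based triplet penalty of the kind used in the deep approaches above can be sketched as follows (names and the default margin value are illustrative):

```python
import numpy as np

def triplet_penalty(z_a, z_p, z_n, margin=1.0):
    """Margin-based triplet penalty (sketch): zero when the anchor
    embedding z_a is closer to the positive z_p than to the negative
    z_n by at least `margin`; grows linearly with the violation."""
    d_pos = np.sum((z_a - z_p) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((z_a - z_n) ** 2)  # squared distance anchor-negative
    return max(0.0, d_pos - d_neg + margin)
```

In an end-to-end model this term is summed over supplied triplets and added to the clustering loss, so gradient steps pull constrained points together or apart in the learned embedding.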
4. Notable Algorithms and Theoretical Guarantees
Several contemporary contributions exemplify state-of-the-art algorithms and bounds:
| Algorithm | Core Technique | Key Properties or Results | Reference |
|---|---|---|---|
| COBS | Constraint-based selection | Outperforms semi-supervised baselines across datasets, highly parallelizable | (Craenendonck et al., 2016) |
| PCCC | Integer programming + soft/hard constraints | Handles up to 60K objects and millions of pairs, balances feasibility, runtime and ARI | (Baumann et al., 2022) |
| Deep constrained clustering (DEC+constraints) | End-to-end differentiable loss | Handles pairwise, triplet, cardinality, and ontology constraints; 10–20% accuracy boosts | (Zhang et al., 2021, Zhang et al., 2019, Manduchi et al., 2021) |
| SDC-GBB | Global branch-and-bound | Deterministic, scalable to large point sets, certified optimality gap | (Chumpitaz-Flores et al., 26 Oct 2025) |
| Peeling-and-Enclosing (PnE) | Enumeration + geometric search | (1+ε)-approximation in near-linear time for constrained k-means/k-median | (Ding et al., 2018) |
| Constrained k-center with Reverse Dominating Set | Matching/LP-based, ML+CL | First 2-approximation, polynomial time, robust to noisy constraints | (Guo et al., 2024) |
| Constraint-based deep GMM | VAE with constraint Potts prior | Integrates constraint matrix in prior+ELBO, empirically robust/noise-tolerant | (Manduchi et al., 2021) |
| Optimized text clustering (LLM-sets) | Set-based ML/CL from LLM, penalized k-means | Reduced query count and improved ARI over prior LLM-based methods | (Jia et al., 16 Jan 2026) |
| Multi-view propagation + co-EM | EM with cross-view constraint transfer | Outperforms single-view and direct mapping, robust under incomplete mapping | (Eaton et al., 2012) |
These methods range from globally optimal search (intractable on large instances without search-space reduction), to approximation algorithms with explicit guarantees (e.g., (1+ε)-approximate for constrained k-means/k-median, factor-2 for k-center), to scalable heuristics effective in practice.
5. Empirical Evaluation and Practical Guidance
Large-scale benchmarking demonstrates:
- Constraint effectiveness: Even sparse constraints deliver substantial ARI/NMI gains (10–20%) over unsupervised baselines (Zhang et al., 2021, Zhang et al., 2019, Craenendonck et al., 2016).
- Diminishing returns/noisy supervision: Beyond a threshold, additional or noisy constraints yield modest further improvement; down-weighting or separating high/low confidence is beneficial (Zhang et al., 2021, Baumann et al., 2022, Jia et al., 16 Jan 2026).
- Algorithm selection via constraints: Using constraints to select among heterogeneous clustering models/hyperparameters is often more effective than adapting a single algorithm (Craenendonck et al., 2016).
- Scalability: Techniques such as ML contraction, nearest-neighbor assignment restriction, parallel branch-and-bound, and stochastic minibatch training enable strong performance on large datasets (Chumpitaz-Flores et al., 26 Oct 2025, Baumann et al., 2022, Schesch et al., 2024).
- Practical recommendations: Choice of penalty weights (soft-constraint vs. hard-constraint), constraint propagation threshold, and initialization strategies significantly affect practical solution quality (Baumann et al., 2022, Zhang et al., 2021).
Deep frameworks that properly aggregate constraints into a unified loss mitigate classical failure modes such as performance degradation under arbitrary constraint sets ("negative-ratio" effects), and enable the incorporation of multi-modal or ontology-based supervision (Zhang et al., 2021, Zhang et al., 2019).
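When tuning penalty weights in practice, a useful diagnostic alongside ARI/NMI is the fraction of supplied constraints a solution actually satisfies; a minimal sketch (illustrative names):

```python
def constraint_satisfaction(labels, ml_pairs, cl_pairs):
    """Fraction of supplied must-link / cannot-link pairs that a
    clustering satisfies -- a simple diagnostic to report alongside
    ARI/NMI when choosing soft-penalty weights (sketch)."""
    sat = sum(labels[i] == labels[j] for i, j in ml_pairs)   # ML honored
    sat += sum(labels[i] != labels[j] for i, j in cl_pairs)  # CL honored
    total = len(ml_pairs) + len(cl_pairs)
    return sat / total if total else 1.0
```

Tracking this fraction against clustering quality as the penalty weight varies makes the feasibility/quality trade-off discussed above directly visible.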
6. Open Problems and Future Directions
- Constraint generality: Powerful frameworks accommodate arbitrary group- or cardinality-based primitives, and methods continue to extend the space of feasible constraints (label diversity, fairness, triplets, etc.) (Bibi et al., 2019, Zhang et al., 2021).
- Global optimality at scale: Deterministic global ε-optimality remains computationally intensive; parallelism and ML contraction are effective, but inherent hardness remains (k-means with ML/CL constraints is NP-hard) (Chumpitaz-Flores et al., 26 Oct 2025).
- Active learning and constraint selection: Active query strategies leveraging geometric or statistical structure sharply reduce required supervision and human effort (Lipor et al., 2016).
- Constraint propagation and multi-view learning: Propagating constraints via model-aware similarity and cross-view mappings substantially improves generalization in multi-modal settings (Eaton et al., 2012).
- Automated or LLM-generated constraints: Integrating automatically generated, noisy, or high-level constraints and optimizing robustness/penalty schemes is an emerging direction with demonstrated impact (Jia et al., 16 Jan 2026).
- Theoretical gaps: Constant-factor guarantees for more general constraints (beyond k-center or k-median), especially with group-wise or intersecting constraints, are active areas of research (Guo et al., 2024, Ding et al., 2018).
The development of constrained clustering algorithms continues to be driven by advances in optimization (e.g., continuous reformulations, penalty methods, global search), statistical learning (deep architectures, kernel learning), and the incorporation of increasingly rich and structured forms of side information (Craenendonck et al., 2016, Bibi et al., 2019, Chumpitaz-Flores et al., 26 Oct 2025, Zhang et al., 2021, Guo et al., 2024).