Pairwise-Constrained Clustering
- Pairwise-Constrained Clustering is a technique that incorporates binary must-link and cannot-link constraints into traditional clustering methods to guide group assignments and improve accuracy.
- It employs diverse strategies such as mixed-integer programming, penalty models, and graph-based approaches to effectively integrate side information and fine-tune clustering objectives.
- These methods have proven practical in semi-supervised learning, active querying, and fairness applications, offering theoretical guarantees and improved scalability for high-dimensional and large-scale datasets.
Pairwise-constrained clustering is a class of algorithms that incorporate side information in the form of explicit binary relations over data pairs—typically must-link (ML: "should co-cluster") and cannot-link (CL: "should not co-cluster") constraints—within the clustering process. These constraints can be hard (enforced without violation), soft (violatable with penalty or probabilistic weight), or even confidence-weighted, and arise naturally in semi-supervised learning, active querying, recommendation systems, and crowdsourced supervision. The injection of such constraints modifies the feasible set or objective of classic clustering formulations (e.g., k-means, spectral, subspace, matrix factorization, or neural embedding-based methods), yielding both theoretical and practical advances in performance and interpretability across numerous domains.
1. Formal Models and Integrations of Pairwise Constraints
Pairwise constraints are most commonly formalized as a set of ML pairs $\mathcal{M}$, with $(i,j) \in \mathcal{M}$ requiring $y_i = y_j$, and CL pairs $\mathcal{C}$, with $(i,j) \in \mathcal{C}$ requiring $y_i \neq y_j$, where $y_i$ denotes the latent cluster assignment of point $x_i$. These constraints are injected into clustering objectives or assignment spaces by several systematic strategies:
- Mixed-integer programming: Direct inclusion at the assignment-variable level, e.g., for k-means with binary assignments $z_{ic}$: $z_{ic} = z_{jc}$ for all $c$ whenever $(i,j) \in \mathcal{M}$, and $z_{ic} + z_{jc} \le 1$ for all $c$ whenever $(i,j) \in \mathcal{C}$ (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026, Baumann et al., 2022, Bibi et al., 2019).
- Penalty or probabilistic modeling: Augmenting the loss/objective with penalties, log-likelihoods, or probabilistic priors over pairwise (dis)agreements, for example negative log-likelihoods of constraint satisfaction, or pairwise KL and Euclidean margin losses (Hsu et al., 2018, Manduchi et al., 2021, Hsu et al., 2015, Fogel et al., 2018); a minimal sketch of this strategy appears after this list.
- Constraint matrices or graphs: Representing pairwise constraints in graphs/matrices for efficient propagation (e.g., transitive closure, triplet-consistency in CRFs/Spectral/SDP settings) (Shi et al., 2017, Behera et al., 2024, Kumar et al., 2015).
- Active/interactive querying: Strategically selecting which pairs to query in order to maximally improve clustering per unit cost, often via uncertainty, information gain, or margin-based selection (Craenendonck et al., 2018, Deng et al., 2024, Lipor et al., 2016).
Generalizations include confidence-weighted or stochastic constraints (soft ML/CL with varying penalty or probability), and compositional constraints (e.g., transitive triplets, relative orderings) (Baumann et al., 2022, Brubach et al., 2021, Jiang et al., 2018).
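As a concrete illustration of the penalty-based strategy above, the following is a minimal sketch of a soft-constrained assignment step in the spirit of PCKMeans-style heuristics; the penalty weight `w`, the sequential greedy assignment order, and the function name are illustrative assumptions rather than any single published algorithm.

```python
import numpy as np

def penalized_assignment(X, centers, ml_pairs, cl_pairs, w=1.0):
    """One soft-constrained assignment pass: each point picks the cluster
    minimizing squared distance plus w times the number of pairwise
    violations against points already assigned earlier in the pass."""
    n, k = X.shape[0], centers.shape[0]
    labels = np.full(n, -1)
    # Index each point's constrained partners for fast lookup.
    ml = {i: [] for i in range(n)}
    cl = {i: [] for i in range(n)}
    for i, j in ml_pairs:
        ml[i].append(j); ml[j].append(i)
    for i, j in cl_pairs:
        cl[i].append(j); cl[j].append(i)
    for i in range(n):
        cost = ((centers - X[i]) ** 2).sum(axis=1)  # squared distances
        for j in ml[i]:
            if labels[j] != -1:  # ML partner placed: penalize other clusters
                cost = cost + w * (np.arange(k) != labels[j])
        for j in cl[i]:
            if labels[j] != -1:  # CL partner placed: penalize its cluster
                cost = cost + w * (np.arange(k) == labels[j])
        labels[i] = int(np.argmin(cost))
    return labels
```

Alternating this step with the usual centroid update yields a soft-constrained Lloyd iteration; letting `w` grow large approximates hard enforcement of the constraints.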
2. Algorithmic Strategies Across Major Paradigms
Pairwise constraints have been integrated into virtually every major clustering paradigm. The table summarizes the main categories and archetypes:
| Base Paradigm | Notable Pairwise-Constrained Variants | Reference |
|---|---|---|
| k-Means/min-sum-of-squares | MIP/ADMM exact solvers, PASS, PCCC, SDC-GBB | (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026, Baumann et al., 2022, Bibi et al., 2019) |
| Spectral/SDP Clustering | SDP with linear or quadratic constraint forms; CRF/Belief-prop | (Shi et al., 2017, Behera et al., 2024, Kumar et al., 2015) |
| Kernel/Self-tuning Clustering | Constraint satisfaction optimization over kernel families | (Boecking et al., 2022) |
| Matrix Factorization/NMF | Pairwise/triplet constraints on latent factors (RPR-NMF) | (Jiang et al., 2018) |
| Subspace Clustering | Active querying/subspace-based selection (SUPERPAC) | (Lipor et al., 2016) |
| Deep Embedding/Autoencoder | Pairwise loss terms (Siamese/contrastive/likelihood), ADMM, SpherePair | (Fogel et al., 2018, Hsu et al., 2015, Zhang et al., 8 Oct 2025) |
| Probabilistic/Generative | Likelihoods/Potts priors over constraints (DC-GMM, CCL) | (Manduchi et al., 2021, Hsu et al., 2018) |
Contemporary advances include scalable ambiguity-driven subset selection (PASS) (Chumpitaz-Flores et al., 28 Jan 2026), confidence-driven mixed-integer formulations (PCCC) (Baumann et al., 2022), angular-geometry deep embeddings (SpherePair) (Zhang et al., 8 Oct 2025), automated active/semi-supervised querying strategies (A3S, COBRA) (Deng et al., 2024, Craenendonck et al., 2018), and relaxation-free kernelization that maximizes raw constraint satisfaction (KernelCSC) (Boecking et al., 2022).
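As one concrete instance of the deep-embedding row above, the sketch below implements a pairwise loss on softmax cluster posteriors in the spirit of the similarity-prediction objectives of (Hsu et al., 2015, Hsu et al., 2018); treating the probability of co-clustering as the inner product of the two cluster posteriors is the modeling assumption here, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def pairwise_cluster_loss(logits_i, logits_j, same):
    """Binary cross-entropy on P(same cluster) for a batch of pairs.
    logits_*: (B, k) cluster logits for the two sides of each pair.
    same: (B,) float tensor, 1.0 for must-link, 0.0 for cannot-link."""
    p_i = F.softmax(logits_i, dim=1)
    p_j = F.softmax(logits_j, dim=1)
    # Inner product of posteriors = probability both land in one cluster.
    p_same = (p_i * p_j).sum(dim=1).clamp(1e-7, 1 - 1e-7)
    return F.binary_cross_entropy(p_same, same)
```

Backpropagating this loss through a shared encoder trains cluster assignments from ML/CL labels alone, with no per-point class supervision.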
3. Theoretical Guarantees and Complexity
Approximation, convergence, and feasibility guarantees are available for several regimes:
- Exact and approximate optimization: MIP-based global solvers (SDC-GBB, PCCC, PASS) guarantee globally optimal solutions or explicit optimality gaps for the mixed-integer constrained k-means objective; these exploit ML collapsing and geometric/assignment pruning to scale to substantially larger instances (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026, Baumann et al., 2022, Bibi et al., 2019).
- Spectral and SDP relaxations: Convex relaxations (e.g., semidefinite programs with constraint matrices) yield global optima of the relaxed problem; feasibility and rounding schemes for discrete partition assignment are well-developed (Behera et al., 2024).
- Probabilistic and likelihood-based models: Negative log-likelihood minimization (e.g., CCL) and generative modeling with a Potts prior (DC-GMM) are convex in parameters except for label assignment; stochastic variational bounds enable scalable inference (Manduchi et al., 2021, Hsu et al., 2018).
- Approximation in stochastic/fairness settings: Two-step LP+KT-rounding frameworks admit provable constant-factor approximations for $k$-center, $k$-median, and $k$-means under general stochastic pairwise constraints, including fairness and semi-supervised settings (Brubach et al., 2021).
Complexity remains a challenge: solving the full MIP is NP-hard in $n$, $k$, and constraint density, but modern subset-selection, group-based decompositions, and scalable message-passing dramatically extend practical limits (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026, Baumann et al., 2022, Lipor et al., 2016).
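For concreteness, the constrained minimum-sum-of-squares problem these solvers target can be stated as the following mixed-integer program; this is a standard formulation matching the constraint forms of Section 1, and the cited solvers differ in how they linearize and bound it:

$$
\begin{aligned}
\min_{z,\mu}\quad & \sum_{i=1}^{n}\sum_{c=1}^{k} z_{ic}\,\lVert x_i - \mu_c\rVert_2^2 \\
\text{s.t.}\quad & \sum_{c=1}^{k} z_{ic} = 1 \quad \forall i, \\
& z_{ic} = z_{jc} \quad \forall (i,j)\in\mathcal{M},\ \forall c, \\
& z_{ic} + z_{jc} \le 1 \quad \forall (i,j)\in\mathcal{C},\ \forall c, \\
& z_{ic} \in \{0,1\} \quad \forall i,\ \forall c.
\end{aligned}
$$

The product of binary assignments $z_{ic}$ with centroid-dependent distances makes the program nonconvex, which is why global solvers rely on branch-and-bound combined with the geometric and assignment pruning noted above.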
4. Practical Algorithms and Computational Strategies
Contemporary methods achieve efficient, scalable execution through several mechanisms:
- ML collapse and pseudo-point reduction: Exploit transitivity within ML components to contract the assignment space, preserving global optima and reducing variable counts (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026); see the union-find sketch after this list.
- Subproblem-centric optimization: PASS and similar frameworks focus combinatorial search on ambiguity- or violation-concentrated core subsets, solving small ILPs or QUBOs while fixing peripheral labels (Chumpitaz-Flores et al., 28 Jan 2026).
- Confidence handling and constraint softening: PCCC and related methods directly encode hard/soft constraint confidence levels as explicit penalties or violation variables in the objective, supporting large constraint sets and variable trust (Baumann et al., 2022).
- Active/interactive querying: Strategic selection of ML/CL queries (e.g., based on normalized mutual-information gain in A3S or assignment margin in SUPERPAC) yields rapid accuracy gains with minimal supervision budget (Deng et al., 2024, Lipor et al., 2016).
- Distributed/parallel B&B and group-lifted optimization: Advanced global solvers apply grouping and parallel Lagrangian decomposition to achieve strong relaxations and practical scalability (Chumpitaz-Flores et al., 26 Oct 2025).
- Neural and geometric embedding architectures: Deep frameworks implement pairwise losses (contrastive, angular, or likelihood-based) inside autoencoder or probabilistic networks, decoupling representation learning from clustering and supporting automatic model-order inference (e.g., automatic selection of the cluster number $k$ in SpherePair) (Fogel et al., 2018, Zhang et al., 8 Oct 2025, Hsu et al., 2015).
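The ML-collapse mechanism referenced at the top of this list amounts to computing connected components of the ML graph. A minimal union-find sketch, with illustrative names, is:

```python
def collapse_must_link(n, ml_pairs):
    """Contract must-link components into super-points via union-find.
    Returns a representative id per point; points sharing a representative
    must share a cluster, so clustering can proceed over representatives."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in ml_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    return [find(i) for i in range(n)]
```

CL constraints are then rewritten over the representatives; a CL pair whose endpoints collapse to the same representative certifies infeasibility, and carrying component sizes as weights keeps the collapsed k-means objective equal to the original one.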
5. Applications and Empirical Evidence
Pairwise-constrained clustering has been validated across diverse settings, including:
- Semi-supervised learning: Minimal supervision (often a small fraction of all possible pairs) suffices to dramatically raise clustering accuracy, as shown for images (MNIST, CIFAR), text (Reuters), faces (LFW, IJB-B), recommendation data (MovieLens), and time series (Hsu et al., 2018, Manduchi et al., 2021, Baumann et al., 2022, Shi et al., 2017, Jiang et al., 2018).
- Active learning/crowdsourcing: Methods such as COBRA, A3S, and SUPERPAC attain rapid ARI/NMI increases with few queries by maximizing the informational value per pair, outperforming random or greedy selection by large margins (Craenendonck et al., 2018, Deng et al., 2024, Lipor et al., 2016); a margin-based selection sketch appears after this list.
- Fairness and individual consistency: The SPC framework subsumes individual fairness constraints, admitting meaningful probabilistic or soft pairwise bounds and yielding algorithms that minimize violations at near-vanishing cost increase (Brubach et al., 2021).
- Deep generative discovery and transfer learning: Deep autoencoder/likelihood frameworks (CCL, DC-GMM, SpherePair, CPAC) match or surpass state-of-the-art baseline metrics (accuracy, NMI, ARI) on complex discovery and transfer tasks, often without requiring the cluster number $k$ to be specified explicitly (Fogel et al., 2018, Manduchi et al., 2021, Zhang et al., 8 Oct 2025).
- Quantum and hybrid solvers: Subproblem-focused reductions (PASS) enable near-term quantum algorithms to address otherwise intractable MIP instances for constrained clustering (Chumpitaz-Flores et al., 28 Jan 2026).
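To illustrate the margin-based selection used in the active-learning setting above, the following sketch scores candidate pairs by the sum of the two points' assignment margins (gap between closest and second-closest centroid) and queries the most ambiguous one; this mirrors the spirit of SUPERPAC's margin criterion, but the random candidate sampling and all names are illustrative assumptions.

```python
import numpy as np

def most_ambiguous_pair(X, centers, n_candidates=1000, seed=0):
    """Pick a pair whose cluster relationship is most uncertain:
    small per-point assignment margins mean the ML/CL answer is
    least predictable from the current clustering."""
    rng = np.random.default_rng(seed)
    # Squared distances from every point to every center: (n, k).
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    two_smallest = np.partition(d, 1, axis=1)[:, :2]
    margin = two_smallest[:, 1] - two_smallest[:, 0]  # per-point margin
    n = len(X)
    best, best_score = None, np.inf
    for _ in range(n_candidates):
        i, j = rng.integers(n), rng.integers(n)
        if i == j:
            continue
        score = margin[i] + margin[j]
        if score < best_score:
            best, best_score = (int(i), int(j)), score
    return best
```

The queried answer becomes a new ML or CL constraint and the clustering is re-run; repeating this loop concentrates the supervision budget on the most informative pairs.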
6. Contemporary Challenges and Future Directions
- Scalability in high constraint-density or ultra-large $k$ regimes: Current global solvers struggle with highly dense CL graphs or extremely large cluster counts without subset selection (Chumpitaz-Flores et al., 26 Oct 2025, Baumann et al., 2022).
- Flexible confidence and noise modeling: Real-world settings demand robust handling of uncertain or noisy pairwise supervision; weighted and probabilistic frameworks (e.g., PCCC, DC-GMM) offer partial solutions, but tighter integration with flexible acquisition and learning strategies remains an active direction.
- Cluster-number agnosticism and automatic model selection: Algorithms capable of robust clustering with unknown or varying $k$ (COBRA, SpherePair) are increasingly important, especially in mixed real-world and crowdsourcing contexts (Craenendonck et al., 2018, Zhang et al., 8 Oct 2025).
- Stronger generalization and fairness guarantees: Extending current theoretical analyses from worst-case to typical-case, and from expectation to high-probability, especially under soft or stochastic constraints (Brubach et al., 2021).
- Integration with representation learning and non-Euclidean domains: Deep angular, kernel, and probabilistic embedding methods (SpherePair, KernelCSC) open the door to robust constraint satisfaction in non-vectorial domains and with weak supervision (Zhang et al., 8 Oct 2025, Boecking et al., 2022).
- Hybrid classical/quantum workflows: Subsetting and ambiguity-guided reductions are expected to play a major role in enabling NISQ-era quantum optimization for constrained clustering at practical scales (Chumpitaz-Flores et al., 28 Jan 2026).
7. Summary Table of Key Methods and Results
| Algorithm/Framework | Constraint Type | Key Principle | Empirical Highlights | References |
|---|---|---|---|---|
| SDC-GBB, PASS, PCCC | Hard/Soft ML + CL | ML collapse, B&B, ambiguity subset | Large instances feasible, optimality gaps near 3% | (Chumpitaz-Flores et al., 26 Oct 2025, Chumpitaz-Flores et al., 28 Jan 2026, Baumann et al., 2022) |
| A3S, COBRA, SUPERPAC | Active ML/CL | Info-gain, transitivity, subspace | 5–10× fewer queries needed | (Deng et al., 2024, Craenendonck et al., 2018, Lipor et al., 2016) |
| DC-GMM, CCL, CPAC, SpherePair | Probabilistic, embedding | Pairwise likelihood, contrastive/angular | SOTA NMI/ACC/ARI, robust, $k$-agnostic | (Manduchi et al., 2021, Hsu et al., 2018, Fogel et al., 2018, Zhang et al., 8 Oct 2025) |
| KernelCSC, CSDSC | ML/CL, soft-hardened | Constraint-sat. kernel SDP/eigen | Best generalization across 146 datasets | (Boecking et al., 2022, Behera et al., 2024) |
Pairwise-constrained clustering constitutes a broad and rapidly advancing research field, spanning exact optimization, convex relaxations, active and semi-supervised strategies, and deep probabilistic modeling. Empirical and theoretical work demonstrates that the strategic use of pairwise constraints, in both hard and soft forms and even under incomplete or noisy supervision, consistently yields superior clustering outcomes, scaling from classic datasets to modern large-scale and high-dimensional problems.