
Conditional Pairwise Clustering (ConPaC)

Updated 23 February 2026
  • Conditional Pairwise Clustering (ConPaC) is a clustering approach that defines clusters solely through pairwise relationships while enforcing global transitivity for cohesive grouping.
  • It integrates data-derived unary potentials with higher-order constraints and user-provided must-link and cannot-link inputs using a factor-graph formulation.
  • ConPaC employs various inference strategies—including message passing, EM, and neural network optimization—and scales to large datasets via k-NN approximations.

Conditional Pairwise Clustering (ConPaC) refers to a family of clustering techniques that define cluster assignments exclusively through pairwise relationships under the constraint of global consistency—typically, transitivity—on the induced adjacency structure. ConPaC methods are agnostic to the cluster count, operate directly on pairwise similarities or constraints, and can incorporate user-provided must-link and cannot-link information. They are united by a factor-graph (Conditional Random Field or CRF) formulation with unary data terms (derived from data or learned), higher-order constraints enforcing global validity of clustering, and inference strategies ranging from max-sum belief propagation to probabilistic EM.

1. Theoretical Foundations

At the mathematical core, ConPaC replaces the traditional assignment of instances to clusters via explicit labels or centroids with a determination of the binary adjacency variables $Y_{ij}$, where $Y_{ij}=1$ indicates that instances $i$ and $j$ share a cluster. The admissibility of any assignment $Y=\{Y_{ij}\}$ is subject to a global transitivity constraint: for all triples $(i,j,k)$, the adjacency configuration must correspond to an equivalence relation. More formally, the triple $(Y_{ij}, Y_{ik}, Y_{jk})$ is valid only if it is transitive (i.e., clusters are cohesive). The global objective combines unary potentials from the observed data or model and hard constraints via higher-order factors:

$$P(Y \mid X) \propto \prod_{i<j} \psi_u(Y_{ij}) \prod_{i<j<k} \psi_t(Y_{ij}, Y_{ik}, Y_{jk}),$$

where $\psi_u(Y_{ij})$ is derived from similarity or model-based likelihoods, and $\psi_t$ enforces transitivity, penalizing non-transitive (V-shaped) triples with an energy $\alpha \to \infty$ (Kumar et al., 2015, Shi et al., 2017).

This approach generalizes hard clustering (recovered by finding the adjacency graph), semi-supervised clustering (via must-link/cannot-link potential overrides), and probabilistic mixture clustering with pairwise constraints.
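The objects above are small enough to sketch directly. A minimal illustration of the transitivity check and the induced energy, assuming a simple similarity-to-probability mapping for the unary term (not any paper's exact potential):

```python
import itertools
import math

import numpy as np

def is_transitive(Y):
    """Check that a binary adjacency matrix encodes an equivalence relation:
    no 'V-shaped' triple where two edges are present but the closing edge is absent."""
    n = Y.shape[0]
    for i, j, k in itertools.combinations(range(n), 3):
        if Y[i, j] + Y[i, k] + Y[j, k] == 2:  # exactly one edge missing
            return False
    return True

def energy(Y, S, alpha=float("inf")):
    """Unnormalized energy: unary costs from pairwise similarities S (treated
    here as link probabilities) plus the transitivity penalty; alpha -> inf
    makes non-transitive configurations infeasible."""
    n = Y.shape[0]
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = min(max(S[i, j], 1e-9), 1 - 1e-9)
            e += -math.log(p) if Y[i, j] else -math.log(1 - p)
    if not is_transitive(Y):
        e += alpha  # hard higher-order factor
    return e
```

Any assignment violating transitivity receives infinite energy, so only valid partitions survive MAP inference.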

2. Representative ConPaC Algorithms and Formulations

2.1 CRF-based Adjacency Optimization

In "Face Clustering: Representation and Pairwise Constraints" (Shi et al., 2017), ConPaC is formulated as a CRF over all pairwise adjacency variables $Y_{ij}$. The unary potential $D(Y_{ij}) = -\log \psi_u(Y_{ij})$ is derived from a monotonic transformation of the pairwise similarity score (e.g., cosine similarity of deep $\ell_2$-normalized ResNet features). The triplet potential penalizes inconsistent adjacency triples, ensuring that the resulting binary adjacency matrix corresponds to a transitive clustering. No cluster count $K$ is required in advance; the number of clusters is induced by the partitioning of the adjacency graph.
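As a sketch of how such unary costs might be derived from deep features, assuming $\ell_2$-normalized rows; the mapping $p = (s+1)/2$ from cosine similarity $s$ to a pseudo-probability is an illustrative monotonic transformation, not the one used in the paper:

```python
import numpy as np

def unary_costs(features, eps=1e-6):
    """Map pairwise cosine similarities of l2-normalized features to unary
    costs D(Y_ij) = -log psi_u(Y_ij) for both states of Y_ij."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)  # l2-normalize
    S = X @ X.T                        # cosine similarities in [-1, 1]
    P = np.clip((S + 1.0) / 2.0, eps, 1 - eps)  # monotonic map to (0, 1)
    D1 = -np.log(P)                    # cost of linking (Y_ij = 1)
    D0 = -np.log(1.0 - P)              # cost of not linking (Y_ij = 0)
    return D0, D1
```

Similar pairs get a low linking cost; dissimilar pairs get a low non-linking cost, which is all the CRF's unary factors require.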

2.2 Message Passing and MAP Inference

MAP inference in this high-order CRF is accomplished via loopy belief propagation in the min-sum variant. At each iteration, messages (essentially cost accumulators) are passed among pairwise variables and triplet factors. The costly $O(N^3)$ complexity is mitigated in large-scale settings by restricting to k-NN neighborhoods; all higher-order potentials and updates remain unchanged except for the restricted variable scope. The optimal clustering is defined by the connected components post-inference, with a final transitive closure to enforce cluster validity (Kumar et al., 2015, Shi et al., 2017).
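Extracting clusters after inference reduces to taking connected components of the inferred adjacency edges, which is itself the transitive closure of the edge set. A minimal union-find sketch:

```python
def connected_components(n, edges):
    """Label n nodes by connected component of the given edge list using
    union-find; components are the clusters induced by the adjacency graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # union the two components

    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

The number of clusters falls out of the data; no $K$ is supplied anywhere.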

2.3 Generative and EM-based Approaches

A generative take on ConPaC is found in (Yu et al., 2018), where Gaussian Mixture Models (GMMs) are augmented to probabilistically respect must-link and cannot-link constraints at the latent variable level. The complete-data likelihood is composed of (i) standard sample-wise terms, (ii) joint latent assignments for must-link pairs, and (iii) exclusion of shared labels for cannot-link pairs. The EM algorithm is extended to jointly update cluster parameters and mixture weights, introducing closed-form responsibilities for both unconstrained and constrained data pairs, and employing a low-dimensional convex optimization for mixture proportions. This full probabilistic treatment allows flexible modeling of both parametric and nonparametric class-conditional densities while respecting pairwise input (Yu et al., 2018).
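A minimal sketch of the constrained E-step for a spherical GMM, under the illustrative assumptions of unit variance and a simplified update form (not the paper's exact formulas): a must-link pair shares one latent component, while a cannot-link pair takes a joint distribution over distinct components.

```python
import numpy as np

def pair_responsibilities(xi, xj, means, weights, var=1.0, must_link=True):
    """E-step responsibilities for a constrained pair under a spherical GMM.
    Must-link: shared component, r_k ∝ w_k N(xi|k) N(xj|k).
    Cannot-link: joint over distinct components, r_kl ∝ w_k N(xi|k) w_l N(xj|l), k != l."""
    def gauss(x, mu):
        d = x - mu
        return np.exp(-0.5 * np.dot(d, d) / var) / (2 * np.pi * var) ** (len(x) / 2)

    K = len(weights)
    li = np.array([gauss(xi, means[k]) for k in range(K)])
    lj = np.array([gauss(xj, means[k]) for k in range(K)])
    if must_link:
        r = weights * li * lj              # both points tied to one component
        return r / r.sum()
    R = np.outer(weights * li, weights * lj)
    np.fill_diagonal(R, 0.0)               # forbid sharing a component
    return R / R.sum()
```

The M-step then accumulates these pair responsibilities alongside the usual per-sample ones when re-estimating means and mixture weights.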

2.4 Neural Network–based Pairwise Objective

"Neural network-based clustering using pairwise constraints" (Hsu et al., 2015) situates ConPaC in an end-to-end differentiable learning scenario, using a convolutional network to output soft cluster assignment probabilities $P(x) \in \mathbb{R}^k$ without explicit centroids. The objective is a symmetric, pairwise, contrastive KL loss:

$$L(P, Q) = \ell(P \parallel Q) + \ell(Q \parallel P),$$

where

$$\ell(P \parallel Q) = I_s \,\mathrm{KL}(P \parallel Q) + I_{ds} \max\{0,\; m - \mathrm{KL}(P \parallel Q)\},$$

and $I_s$/$I_{ds}$ are must-link/cannot-link indicators. This procedure trains feature extraction and cluster assignment jointly, requiring only a partial set of pairwise constraints. Hard cluster assignments arise from the maximum probability index at the output. The number of clusters need not be specified; the network automatically uses only the required number of output units (Hsu et al., 2015).
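The loss can be sketched in NumPy on fixed probability vectors; the margin value $m = 2$ is an illustrative hyperparameter:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions, clipped for stability."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def pairwise_kl_loss(P, Q, same, margin=2.0):
    """Symmetric contrastive KL loss on two softmax outputs P, Q.
    same=True (must-link) pulls the distributions together;
    same=False (cannot-link) pushes them at least `margin` apart in KL."""
    def one_side(p, q):
        if same:
            return kl(p, q)
        return max(0.0, margin - kl(p, q))
    return one_side(P, Q) + one_side(Q, P)
```

In training, `P` and `Q` would be the network's softmax outputs for the two instances of a constrained pair, with gradients flowing back through both.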

3. Handling Pairwise Constraints and Transitivity

All ConPaC variants accommodate supervision as pairwise must-link $(i, j)$ or cannot-link $(a, b)$ information. Enforcement is accomplished by setting the corresponding unary potentials to force agreement/disagreement—i.e., infinite energy for violations—propagating these conditions through all triplet factors to ensure global consistency (Shi et al., 2017, Hsu et al., 2015). In the generative EM setting, must-links are encoded as shared latent assignments, and cannot-links as joint assignments with zero probability for the same cluster (Yu et al., 2018).

Transitivity is universally enforced by explicit triple factors (or their algorithmic analogs) that render any locally inconsistent adjacency pattern infeasible, thereby guaranteeing that all connected components produced post-inference correspond to valid clusters (Kumar et al., 2015, Shi et al., 2017).
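In the CRF formulation, hard constraint enforcement amounts to overriding the two unary cost tables; the function name and cost-table layout below are illustrative:

```python
import numpy as np

def apply_constraints(D0, D1, must_links, cannot_links):
    """Turn pairwise supervision into hard unary overrides: a must-link (i, j)
    gives Y_ij = 0 infinite cost, a cannot-link gives Y_ij = 1 infinite cost.
    D0/D1 are N x N cost matrices for the two states of each Y_ij."""
    D0, D1 = D0.copy(), D1.copy()
    for i, j in must_links:
        D0[i, j] = D0[j, i] = np.inf   # forbid Y_ij = 0
    for i, j in cannot_links:
        D1[i, j] = D1[j, i] = np.inf   # forbid Y_ij = 1
    return D0, D1
```

Triplet factors then propagate these local decisions: forcing $Y_{ij}=1$ and $Y_{jk}=1$ makes $Y_{ik}=0$ infeasible as well.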

4. Computational Strategies and Scalability

Canonical MAP or marginal inference in fully connected CRFs with transitivity constraints incurs $O(N^3)$ time and space complexity, as all $O(N^3)$ triples must be considered. For large-scale applications (e.g., face clustering in LFW and IJB-B with $N \sim 10^4$–$10^5$), ConPaC leverages approximate k-NN graphs, so only a sparse subset of adjacency variables and their neighboring triples is instantiated. Per-iteration complexity thus reduces to $O(Nk^2)$, where $k$ is the neighborhood size, yielding near-identical clustering performance to the dense version on typical visual data (Shi et al., 2017).
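A brute-force sketch of the sparsification step, instantiating adjacency variables only for k-NN pairs; a real approximate-nearest-neighbor index would replace the exact distance computation:

```python
import numpy as np

def knn_edges(X, k):
    """Return the set of (i, j) pairs (i < j) for which adjacency variables
    are instantiated, shrinking the variable set from O(N^2) to O(N k)."""
    n = len(X)
    # exact pairwise Euclidean distances (an ANN index would go here)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude self-pairs
    edges = set()
    for i in range(n):
        for j in np.argsort(D[i])[:k]:
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges
```

Only triplet factors whose three edges all fall inside this sparse set are instantiated, which is what brings the per-iteration cost down to $O(Nk^2)$.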

Neural network–based ConPaC variants sidestep explicit enumeration of adjacency variables via mini-batch stochastic optimization. EM-based implementations parallel their unconstrained clustering counterparts except for additional loops over constraint sets, with computational cost remaining subdominant given moderate numbers of pairwise constraints (Hsu et al., 2015, Yu et al., 2018).

5. Empirical Performance and Properties

ConPaC frameworks consistently outperform traditional clustering schemes—such as k-means, spectral clustering, and rank-order clustering—especially in complex, high-dimensional or poorly separated scenarios, and when only sparse pairwise constraints are available. Key findings include:

  • On face clustering (LFW), ConPaC achieves high pairwise F-scores (0.965 unsupervised, 0.975 with semi-supervised constraints), outperforming agglomerative (0.962) and rank-order (0.861) baselines. The method remains robust under the addition of $10^6$ distractor faces (Shi et al., 2017).
  • EM-based ConPaC on UCI and manifold datasets makes superior use of limited constraint input, achieving higher purity than previous probabilistic and spectral methods with significantly fewer links (Yu et al., 2018).
  • Neural network–based ConPaC achieves $>0.9$ purity and $>0.8$ NMI on MNIST with as few as $0.067\%$ of all possible constraints, and copes with over-specified cluster counts by self-selecting the requisite number of clusters (Hsu et al., 2015).
  • All approaches show notable robustness to noisy labels (e.g., up to 10% constraint noise) and unknown $k$ (“overclustering”) (Hsu et al., 2015, Shi et al., 2017).

Runtime and memory requirements remain tractable for $N \sim 10^4$–$10^5$ with k-NN sparsification, with further gains from batch-based or parallelized neural network and EM algorithms.

6. Generalizations and Extensions

ConPaC extends seamlessly to:

  • Uncertainty quantification: sum-product inference yields marginal edge probabilities, supporting thresholded or soft clustering outputs (Kumar et al., 2015).
  • Semi-supervised and interactive clustering: easy incorporation of user constraints as hard potentials (Shi et al., 2017).
  • Generative modeling of class substructure: mixture-of-Gaussian expansions for within-cluster heterogeneity with minimal changes to update formulae (Yu et al., 2018).
  • Multiscale or ultrametric clustering: extension of triple-factor constraints to enforce hierarchical or more general relational structures (Kumar et al., 2015).
  • End-to-end deep learning: unified architectures learning cluster assignment and feature transformations from raw data (Hsu et al., 2015).

A plausible implication is that ConPaC’s principled factor-graph formulation—linking local data-derived similarity with global clustering structure—renders it adaptable across domains, scalable to large data, and robust to partial supervision and noise.

7. Connections to Broader Clustering Literature

ConPaC contrasts with classical clustering by (i) strictly defining clusters via pairwise relations and their transitive closures, (ii) avoiding prespecification of cluster count, and (iii) uniformly treating hard constraints, soft constraints, and unlabeled data within a single global optimization. Its message-passing and CRF formulations are direct transpositions of graphical-model inference, yet can incorporate feature learning, metric learning, and generative density estimation in a modular fashion (Shi et al., 2017, Hsu et al., 2015, Yu et al., 2018). Its efficacy and flexibility position it as a unifying perspective on constraint-based clustering in modern machine learning.
