Constrained Clustering Algorithms
- Constrained clustering algorithms are methods that integrate additional supervision, such as pairwise, cardinality, and fairness constraints, into traditional clustering objectives.
- They employ optimization strategies like mixed-integer programming and ADMM to manage NP-hard problems and enhance solution robustness.
- These algorithms are applied in domains requiring balanced, spatially coherent, and fair cluster formations, offering improved interpretability over unconstrained methods.
Constrained clustering algorithms are a class of methods that incorporate additional supervision, such as pairwise must-link/cannot-link relationships, capacity, balance, fairness, or spatial/temporal structure, into unsupervised partitioning objectives (e.g., $k$-means, $k$-median, $k$-center, mixture models, spectral clustering). Such supervision arises naturally in applied domains where instance relationships, cluster sizes, or geometry are known or required a priori. These constraints fundamentally alter the computational landscape and statistical properties of clustering, often rendering the problem NP-hard but also enabling more interpretable, robust, or fair solutions that unconstrained approaches cannot obtain.
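To make the idea of folding pairwise supervision into a $k$-means objective concrete, here is a minimal sketch in the style of COP-KMeans, a classical greedy approach that is not among the works cited in this article. It assumes hard must-link/cannot-link constraints given as index pairs, NumPy arrays, and random initial centers; the names `cop_kmeans` and `violates` are illustrative. Each point goes to the nearest center that does not break a constraint with points already placed in the current pass, and the run reports failure if no such center exists. Unlike fuller implementations, it does not precompute the transitive closure of the constraints.

```python
import numpy as np


def violates(i, c, labels, must_link, cannot_link):
    """True if placing point i in cluster c breaks a hard pairwise constraint
    with a point already assigned in the current pass (labels[j] >= 0)."""
    for a, b in must_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] >= 0 and labels[j] != c:
            return True
    for a, b in cannot_link:
        j = b if a == i else (a if b == i else None)
        if j is not None and labels[j] == c:
            return True
    return False


def cop_kmeans(X, k, must_link=(), cannot_link=(), n_iter=20, seed=0):
    """Greedy constrained k-means; returns (labels, centers) or None on failure."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels = np.full(len(X), -1)  # -1 = not yet assigned in this pass
        for i in range(len(X)):
            # Try clusters from nearest to farthest center.
            order = np.argsort(((centers - X[i]) ** 2).sum(axis=1))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            else:
                return None  # no feasible cluster for point i
        # Standard k-means center update; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

For example, with two well-separated pairs of points, `cop_kmeans(X, 2, must_link=[(0, 1)], cannot_link=[(0, 2)])` keeps points 0 and 1 together and points 0 and 2 apart, while contradictory constraints on the same pair return `None`. Because the pass is greedy and order-dependent, it can also fail on instances that are in fact feasible; even deciding whether a set of cannot-link constraints is satisfiable with $k$ clusters is hard in general.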
1. Fundamental Constraint Types and Problem Classes
Constrained clustering problems augment standard objectives with sets of requirements, most commonly classified as:
- Pairwise constraints: Must-link pairs $(i, j) \in \mathcal{ML}$ (points $i$ and $j$ must share a cluster) and cannot-link pairs $(i, j) \in \mathcal{CL}$ (points $i$ and $j$ must lie in different clusters), either hard (must be satisfied) or soft (penalized if violated) (Baumann et al., 2022, Chumpitaz-Flores et al., 26 Oct 2025).
- Cardinality constraints: Explicit requirements on cluster sizes, e.g., $|C_j| = n_j$ for cluster $C_j$, to induce balance or satisfy physical or application-specific requirements (Bibi et al., 2019).
- Attribute/fairness/diversity constraints: Limitations on the composition of clusters with respect to protected or categorical attributes, e.g., $\ell$-diversity or chromatic constraints (Schmidt et al., 2021, Ding et al., 2018).
- Instance-level and spatial constraints: Incorporation of spatial/temporal proximity, region contiguity, or autocorrelation structure (Wang et al., 2024, Yuan et al., 2019, Guo et al., 2024).
- Outlier constraints: Permission to leave at most $z$ points unassigned as outliers, as in outlier-aware variants of $k$-means, $k$-median, or $k$-center (Jaiswal et al., 2023).
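Most of the constraint families listed above can be phrased as checks on a finished labeling. The sketch below is a hypothetical helper, `check_constraints`, written under these assumptions: labels are integers with `-1` marking an outlier, cardinality means an exact target size per cluster, and attribute diversity is modeled by capping the share of any single attribute value in a cluster at `max_fraction`, a simple stand-in rather than the exact definition used by any cited paper. Spatial and contiguity constraints are omitted because they require geometry beyond the labels.

```python
from collections import Counter


def check_constraints(labels, must_link=(), cannot_link=(), sizes=None,
                      attrs=None, max_fraction=None, max_outliers=None):
    """Return a list of violation messages; an empty list means feasible.

    labels[i] == -1 marks point i as an unassigned outlier.
    """
    problems = []
    # Pairwise constraints (an outlier paired with an assigned point counts as split).
    for a, b in must_link:
        if labels[a] != labels[b]:
            problems.append(f"must-link ({a}, {b}) split")
    for a, b in cannot_link:
        if labels[a] == labels[b] != -1:
            problems.append(f"cannot-link ({a}, {b}) merged")
    counts = Counter(lab for lab in labels if lab != -1)
    # Cardinality: exact target size per cluster.
    if sizes is not None:
        for c, target in sizes.items():
            if counts.get(c, 0) != target:
                problems.append(f"cluster {c} has {counts.get(c, 0)} points, expected {target}")
    # Attribute diversity: no single value may exceed max_fraction of a cluster.
    if attrs is not None and max_fraction is not None:
        for c, size in counts.items():
            values = Counter(attrs[i] for i, lab in enumerate(labels) if lab == c)
            value, freq = values.most_common(1)[0]
            if freq > max_fraction * size:
                problems.append(f"cluster {c}: value {value!r} fills {freq}/{size}")
    # Outlier budget.
    if max_outliers is not None:
        n_out = sum(1 for lab in labels if lab == -1)
        if n_out > max_outliers:
            problems.append(f"{n_out} outliers exceed budget {max_outliers}")
    return problems
```

Returning a list of messages rather than a boolean makes it easy to see which constraint family a candidate solution breaks, which is useful when comparing hard and penalized formulations.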
Constraints may be encoded as strict feasibility requirements (integer assignment variables with linear or matching-type inequalities) or relaxed into penalty terms in the cost function, with different implications for algorithmic complexity and solution quality.
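The hard-versus-soft distinction can be illustrated with a penalized objective: the $k$-means sum of squared errors plus a fixed weight for every violated pairwise constraint. The function `penalized_cost` below is an illustrative assumption (uniform weight, squared Euclidean cost), not the formulation of any cited paper; letting the weight grow without bound recovers hard constraints.

```python
import numpy as np


def penalized_cost(X, labels, centers, must_link=(), cannot_link=(), weight=1.0):
    """k-means SSE plus `weight` for every violated pairwise constraint."""
    labels = np.asarray(labels)
    sse = float(((X - centers[labels]) ** 2).sum())
    # Count violations: split must-links and merged cannot-links.
    violations = sum(int(labels[a] != labels[b]) for a, b in must_link)
    violations += sum(int(labels[a] == labels[b]) for a, b in cannot_link)
    return sse + weight * violations
```

With two points one unit apart and a must-link between them, splitting them (each point as its own center) costs exactly `weight`, while merging them costs 0.5 in squared error. The soft constraint is therefore violated at the optimum only when `weight` is below 0.5.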
2. Algorithmic Methodologies and Optimization Frameworks
A wide spectrum of algorithmic paradigms has emerged, reflecting the diversity of constraints and target objectives:
Integer and Continuous Formulations
- Mixed-integer programming (MIP, IP): Classical $k$-means or $k$-center is cast as an IP augmented with binary assignment variables, indicator variables for constraints, and explicit balance, pairwise, or capacity constraints. Handling both cardinality and pairwise constraints yields high-dimensional binary programs, which are often made tractable by continuous relaxations, variable elimination, or ADMM-based continuous reformulation (Bibi et al., 2019).
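When only cardinality constraints are present, the assignment step with fixed cluster sizes has an exact polynomial-time solution: replicate center $j$ into $n_j$ slots and solve a linear assignment problem between points and slots. This standard reduction is swapped in here purely for illustration; it is not the ADMM method of Bibi et al. (2019) and does not handle pairwise constraints. The sketch assumes SciPy's `scipy.optimize.linear_sum_assignment`, squared Euclidean distances, positive sizes summing to $n$, and caller-supplied initial centers; `assign_with_sizes` and `size_constrained_kmeans` are illustrative names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_with_sizes(X, centers, sizes):
    """Minimum-SSE assignment with exactly sizes[j] points in cluster j."""
    sizes = np.asarray(sizes)
    if np.any(sizes <= 0) or sizes.sum() != len(X):
        raise ValueError("sizes must be positive and sum to the number of points")
    # Squared distances from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Replicate center j into sizes[j] "slots"; each slot takes exactly one point.
    slot_cluster = np.repeat(np.arange(len(centers)), sizes)
    rows, cols = linear_sum_assignment(d2[:, slot_cluster])
    labels = np.empty(len(X), dtype=int)
    labels[rows] = slot_cluster[cols]
    return labels


def size_constrained_kmeans(X, centers, sizes, n_iter=10):
    """Alternate exact size-constrained assignment with mean updates."""
    centers = np.asarray(centers, dtype=float)
    labels = None
    for _ in range(n_iter):
        new_labels = assign_with_sizes(X, centers, sizes)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment is stable; centers already match it
        labels = new_labels
        centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
    return labels, centers
```

Replicating centers builds an $n \times n$ cost matrix, so this sketch suits only moderate $n$; a min-cost-flow formulation of the same assignment avoids the replication.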
- Alternating direction method of multipliers (ADMM): Sophisticated ADMM block-splitting enables the conversion of nonconvex binary/quadr