Constrained Biclustering with Pairwise Constraints

Updated 8 August 2025

The paper introduces a framework that integrates must-link and cannot-link constraints to guide biclustering.
It formulates the problem as a quadratically constrained quadratic program solved via SDP branch-and-cut and low-rank heuristic methods.
Empirical evaluations show enhanced clustering quality and scalability in applications such as gene expression and document mining.

Constrained biclustering with pairwise constraints refers to the simultaneous partitioning of the rows and columns of a data matrix, subject to must-link and cannot-link relationships that encode problem-specific knowledge about which objects or features must, or must not, be grouped together. This paradigm generalizes semi-supervised clustering, extending its benefits to co-clustering and biclustering in mathematical optimization and machine learning. The constraints serve to enhance interpretability, integrate prior information, and align cluster structures with domain insights. Recent research formalizes the model as a constrained version of a combinatorial biclique problem, introduces relaxation and rounding schemes, and develops scalable heuristics applicable to large-scale data.

1. Mathematical Formulation and Model Problem

Constrained biclustering is formally modeled as a constrained version of the k-densest disjoint biclique (k-DDB) problem in a weighted bipartite graph, where nodes correspond to rows (U) and columns (V) of a matrix and edge weights are the matrix entries (Sudoso, 7 Aug 2025). A feasible solution consists of k disjoint bicliques, that is, pairs of row and column subsets that form complete bipartite graphs; the goal is to maximize total density while satisfying the pairwise must-link and cannot-link constraints. The unconstrained problem is typically expressed as the following quadratically constrained quadratic program (QCQP):

$\begin{aligned} &\underset{Y_U, Y_V}{\text{maximize}} && \operatorname{tr}(Y_U^T A Y_V) \\ &\text{subject to} && Y_U^T Y_U = I_k, \;\; Y_V^T Y_V = I_k, \\ &&& Y_U Y_U^T \mathbf{1}_n = \mathbf{1}_n, \;\; Y_V Y_V^T \mathbf{1}_m = \mathbf{1}_m, \\ &&& Y_U, Y_V \geq 0, \end{aligned}$

where $Y_U \in \mathbb{R}^{n \times k}$ and $Y_V \in \mathbb{R}^{m \times k}$ are clustering matrices for rows and columns, respectively, and $A$ is the data matrix. Pairwise constraints are imposed as algebraic equalities or inequalities on these matrices.
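As a minimal sketch of how the QCQP variables look for an integral solution, the following NumPy snippet builds normalized cluster-indicator matrices, checks the three constraint families, and evaluates the objective. The helper names (`indicator`, `is_feasible`, `density`) and the toy matrix are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def indicator(labels, k):
    """Normalized indicator: entry (i, l) is 1/sqrt(|cluster l|) if i is in cluster l."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, k))
    Y[np.arange(labels.size), labels] = 1.0
    return Y / np.sqrt(Y.sum(axis=0))  # assumes every cluster is non-empty

def is_feasible(Y, tol=1e-9):
    """Check Y^T Y = I_k, Y Y^T 1 = 1 and Y >= 0."""
    n, k = Y.shape
    return (np.allclose(Y.T @ Y, np.eye(k), atol=tol)
            and np.allclose(Y @ Y.T @ np.ones(n), np.ones(n), atol=tol)
            and bool((Y >= 0).all()))

def density(A, Y_U, Y_V):
    """Objective tr(Y_U^T A Y_V)."""
    return float(np.trace(Y_U.T @ A @ Y_V))

# Toy 4 x 4 matrix with two planted blocks (illustrative data).
A = np.array([[5., 5., 0., 0.],
              [5., 5., 0., 0.],
              [0., 0., 3., 3.],
              [0., 0., 3., 3.]])
Y_U = indicator([0, 0, 1, 1], k=2)
Y_V = indicator([0, 0, 1, 1], k=2)
value = density(A, Y_U, Y_V)                              # planted biclusters
value_mixed = density(A, Y_U, indicator([0, 1, 0, 1], k=2))  # columns shuffled
```

With this normalization each diagonal entry of $Y_U^T A Y_V$ is the sum of a bicluster's entries divided by $\sqrt{|U_\ell||V_\ell|}$, so the planted assignment scores higher than the shuffled one.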

Must-link: For each pair $(u_i, u_j) \in \text{ML}_U$, enforce $(Y_U)_{i\ell} - (Y_U)_{j\ell} = 0$ for all $\ell$.
Cannot-link: For each pair $(u_i, u_j) \in \text{CL}_U$, enforce $(Y_U Y_U^T)_{ij} = 0$.

Similar constraints apply to columns.
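The algebraic encodings can be checked directly on a candidate assignment matrix: a must-link pair requires two identical rows of $Y_U$, and a cannot-link pair requires a zero entry of $Y_U Y_U^T$. The snippet below is a small illustrative check under that reading; the function names and the example matrix are assumptions made for this sketch.

```python
import numpy as np

def satisfies_must_link(Y, pairs, tol=1e-9):
    """Must-link (i, j): rows i and j of Y coincide, i.e. Y[i, l] - Y[j, l] = 0 for all l."""
    return all(np.allclose(Y[i], Y[j], atol=tol) for i, j in pairs)

def satisfies_cannot_link(Y, pairs, tol=1e-9):
    """Cannot-link (i, j): entry (i, j) of Y Y^T is zero."""
    G = Y @ Y.T
    return all(abs(G[i, j]) <= tol for i, j in pairs)

# Rows 0, 1 in cluster 0 and rows 2, 3 in cluster 1 (normalized indicators).
Y_rows = np.array([[1., 0.],
                   [1., 0.],
                   [0., 1.],
                   [0., 1.]]) / np.sqrt(2.0)

ml_ok = satisfies_must_link(Y_rows, [(0, 1), (2, 3)])
cl_ok = satisfies_cannot_link(Y_rows, [(0, 2), (1, 3)])
ml_violated = satisfies_must_link(Y_rows, [(0, 2)])
cl_violated = satisfies_cannot_link(Y_rows, [(0, 1)])
```

The same two functions apply unchanged to $Y_V$ and column-side constraint sets.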

The model encapsulates the central task: seeking biclusterings that are both maximal with respect to the chosen density and strictly consistent with imposed pairwise constraints.

2. Pairwise Constraints: Must-link and Cannot-link Semantics

Pairwise constraints guide biclustering by enforcing semantic relationships:

Must-link constraints force two objects (rows or columns) to be assigned to the same bicluster, aggregating them into single entities under transitive closure. This reduces effective problem size and tightens interpretability.
Cannot-link constraints prohibit joint assignment, enforcing $(Y Y^T)_{ij} = 0$ in the variable matrix, which precludes shared cluster membership.

In optimization-driven formulations, the transitive closure of must-link relationships partitions nodes, and cannot-links are enforced via zeroing conditions on block matrices or explicit combinatorial ILP constraints (Sudoso, 7 Aug 2025, Bibi et al., 2019). Additional settings integrate soft penalty terms or confidence-based constraints to accommodate uncertain or noisy supervision (Baumann et al., 2022).
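The aggregation step can be sketched with a union-find structure: must-link pairs are merged into components, and each cannot-link pair is then lifted to a pair of components. This is an illustrative sketch rather than the cited implementation; `must_link_components` and `lift_cannot_links` are hypothetical helper names.

```python
def must_link_components(n, must_link):
    """Return a component id for each of n objects under must-link transitive closure."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in must_link:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    roots = sorted({find(i) for i in range(n)})
    index = {r: c for c, r in enumerate(roots)}
    return [index[find(i)] for i in range(n)]

def lift_cannot_links(component, cannot_link):
    """Map cannot-link pairs onto components; a pair inside one component is infeasible."""
    lifted = set()
    for i, j in cannot_link:
        a, b = component[i], component[j]
        if a == b:
            raise ValueError(f"cannot-link ({i}, {j}) contradicts the must-link closure")
        lifted.add((min(a, b), max(a, b)))
    return sorted(lifted)

component = must_link_components(6, [(0, 1), (1, 2), (4, 5)])
super_cl = lift_cannot_links(component, [(2, 3), (0, 3), (3, 5)])
```

Six objects collapse to three aggregated nodes here, and the three cannot-link pairs reduce to two distinct pairs between aggregated nodes, which is the sense in which must-links shrink the effective problem size.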

Pairwise constraints are widely used in semi-supervised clustering for single-object clustering (COP-KMeans, MPCKMeans (Craenendonck et al., 2016)), non-negative matrix factorization via triplet-based penalties (RPR-NMF (Jiang et al., 2018)), generative models (Yu et al., 2018), and convex relaxations (Behera et al., 3 Apr 2024).

3. Exact and Heuristic Algorithms

Exact Branch-and-Cut via Semidefinite Programming (SDP)

The exact approach is a specialized branch-and-cut algorithm leveraging SDP relaxations for constrained biclustering (Sudoso, 7 Aug 2025). After formulating QCQP with algebraic encodings of constraints, the problem is lifted into SDP form:

$Z = \begin{bmatrix} Z_{UU} & Z_{UV} \\ Z_{UV}^T & Z_{VV} \end{bmatrix} = \begin{bmatrix} Y_U \\ Y_V \end{bmatrix} \begin{bmatrix} Y_U \\ Y_V \end{bmatrix}^T$

with $Z \succeq 0$ and pair and triangle valid inequalities added to tighten the relaxation. A cutting-plane loop adds violated inequalities to the SDP relaxation, which is solved with SDPNAL+, and feasible biclusterings are obtained by (i) k-means on the SDP solution followed by (ii) integer programming for refinement under the hard constraints.
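To make the lifting concrete, the sketch below forms $Z$ from an integral solution, confirms that it is positive semidefinite, verifies that the QCQP objective equals $\langle A, Z_{UV} \rangle$, and recovers the row partition by running a plain Lloyd-style k-means on the rows of $Z_{UU}$. It only illustrates the rounding idea on an exact solution: the farthest-point initialization is an assumption of this sketch, and the integer-programming refinement step is omitted.

```python
import numpy as np

def indicator(labels, k):
    """Normalized cluster indicator matrix (assumes non-empty clusters)."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, k))
    Y[np.arange(labels.size), labels] = 1.0
    return Y / np.sqrt(Y.sum(axis=0))

def lift(Y_U, Y_V):
    """Z = [Y_U; Y_V] [Y_U; Y_V]^T."""
    Y = np.vstack([Y_U, Y_V])
    return Y @ Y.T

def round_rows(Z_block, k, iters=20):
    """Lloyd k-means on the rows of Z_block with farthest-point initialization."""
    X = np.asarray(Z_block, dtype=float)
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    C = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for l in range(k):
            if np.any(labels == l):
                C[l] = X[labels == l].mean(axis=0)
    return labels

n, m, k = 5, 4, 2
A = np.random.default_rng(0).random((n, m))   # illustrative data matrix
Y_U = indicator([0, 0, 1, 1, 1], k)
Y_V = indicator([0, 1, 1, 0], k)

Z = lift(Y_U, Y_V)
Z_UU, Z_UV = Z[:n, :n], Z[:n, n:]
min_eig = np.linalg.eigvalsh(Z).min()          # >= 0 up to round-off
lifted_obj = float(np.sum(A * Z_UV))           # <A, Z_UV>
qcqp_obj = float(np.trace(Y_U.T @ A @ Y_V))    # tr(Y_U^T A Y_V)
row_labels = round_rows(Z_UU, k)
```

On a fractional SDP solution the rows of $Z_{UU}$ are no longer exactly identical within a cluster, which is why a refinement step under the hard constraints follows the k-means stage.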

Branching is performed on ambiguous pairs, with nodes corresponding to additional must-link or cannot-link assignments. Empirically, global optimality is often certified at the root due to tight relaxations and effective rounding.

Low-Rank Factorization Heuristic

For large-scale instances where SDP solvers are computationally prohibitive, a low-rank Burer–Monteiro factorization is applied (Sudoso, 7 Aug 2025):

$Z = [Z_U; Z_V][Z_U; Z_V]^T$

with $Z_U \in \mathbb{R}^{\bar{n} \times r}$ , $Z_V \in \mathbb{R}^{\bar{m} \times r}$ , and $r > k$ for flexibility. The constrained optimization over $Z_U, Z_V$ is solved within an augmented Lagrangian method (ALM), using block-coordinate projected gradient updates with adaptive step lengths (Barzilai–Borwein steps alternated with Armijo line search). The heuristic finds high-quality solutions efficiently and scales to problem sizes typical of large gene expression and document clustering tasks.
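A heavily simplified sketch of one block update in the factorized variables is given below. It writes the density term as $\langle A, Z_U Z_V^T \rangle$, computes its gradients, and takes a single projected ascent step on $Z_U$. The projection onto the nonnegative orthant and the fixed step size are assumptions of this sketch; the augmented Lagrangian penalty terms, the remaining constraints, and the Barzilai–Borwein and Armijo step rules are deliberately left out.

```python
import numpy as np

def objective(Z_U, Z_V, A):
    """Density term <A, Z_U Z_V^T> in the low-rank factors."""
    return float(np.sum(A * (Z_U @ Z_V.T)))

def gradients(Z_U, Z_V, A):
    """Gradients of the density term with respect to Z_U and Z_V."""
    return A @ Z_V, A.T @ Z_U

def projected_ascent_step(Z_U, Z_V, A, step):
    """One gradient ascent step on the Z_U block, projected onto Z_U >= 0 (assumed projection)."""
    G_U, _ = gradients(Z_U, Z_V, A)
    return np.maximum(Z_U + step * G_U, 0.0)

rng = np.random.default_rng(1)
n_bar, m_bar, r = 6, 5, 3                      # illustrative sizes, with r > k for k = 2
A = rng.standard_normal((n_bar, m_bar))
Z_U = rng.random((n_bar, r))
Z_V = rng.random((m_bar, r))

before = objective(Z_U, Z_V, A)
Z_U_new = projected_ascent_step(Z_U, Z_V, A, step=0.1)
after = objective(Z_U_new, Z_V, A)

# Finite-difference check of one gradient entry.
eps = 1e-6
E = np.zeros_like(Z_U)
E[0, 0] = eps
fd = (objective(Z_U + E, Z_V, A) - before) / eps
G_U, G_V = gradients(Z_U, Z_V, A)
```

Because the density term is linear in $Z_U$ for fixed $Z_V$, a projected step from a feasible point cannot decrease it; in a full ALM iteration the penalty terms make each block subproblem nonlinear, which is where adaptive step lengths become relevant.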

Table: Algorithmic comparison

Approach	Principle	Scale
SDP Branch-and-Cut	SDP relaxation + rounding	Small–Medium
Low-Rank Heuristic (BM/ALM)	Nonconvex optimization	Large
ILP/CPLEX/Generic Solvers	Discrete optimization	Small

4. Connections to Related Methods

Biclustering using pairwise constraints connects closely with methods in semi-supervised co-clustering, non-negative matrix factorization (NMF, RPR-NMF), generative models, and convex relaxations:

RPR-NMF integrates relative pairwise relationships as triplet constraints into the matrix factorization objective using exponential and hinge penalties (for Euclidean and symmetric divergence measures, respectively) (Jiang et al., 2018). Multiplicative updates are derived with auxiliary bounding functions and approximations for tractability.
Generative approaches directly encode pairwise constraints into the data likelihood, with modified EM steps for must-link and cannot-link pairs (for example, as shared latent variables) (Yu et al., 2018).
Constrained spectral clustering formulations embed pairwise constraints into semidefinite relaxations, producing generalized eigenproblems with constraint matrices $Q$ (Behera et al., 3 Apr 2024).
Integer-programming formulations (e.g., binary assignments with quadratic and cardinality constraints) have been handled using alternating direction methods (ADMM) and auxiliary continuous relaxations (Bibi et al., 2019).

The compatibility between these frameworks allows for the transfer and integration of pairwise constraints into varied biclustering models, including those that operate over binary, real, or mixed data types.

5. Evaluation, Performance, and Empirical Observations

Comprehensive experiments on synthetic and real-world data matrices (gene expression, text corpora) validate the efficiency of constraint-aware biclustering algorithms (Sudoso, 7 Aug 2025). Key metrics include:

Optimality gap (SDP-based branch-and-cut typically closes at the root node).
Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) for clustering quality.
Node count reduction and runtime efficiency in comparison to general-purpose solvers (Gurobi, BARON), with orders-of-magnitude improvements observed.
For large datasets, the low-rank heuristic consistently produces clusterings matching or closely tracking ground truth, with high ARI/NMI even under increased constraint density.
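For readers who want to reproduce the quality metrics, the following standard-library sketch computes the Adjusted Rand Index from its contingency-table definition and a relative optimality gap for a maximization problem. The gap convention used here, (upper bound minus lower bound) divided by the absolute upper bound, is a common choice and an assumption of this sketch, not necessarily the one used in the cited experiments.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """ARI between two labelings of the same objects."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def relative_gap(upper_bound, lower_bound):
    """Relative gap for maximization: upper bound from the relaxation, lower bound from a feasible solution."""
    return (upper_bound - lower_bound) / abs(upper_bound)

ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # identical up to relabeling
ari_cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # maximally crossed partitions
gap = relative_gap(20.0, 19.0)
```

ARI is invariant to relabeling of clusters, so it can be applied separately to the row partition and the column partition of a biclustering.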

Pairwise constraint incorporation is shown to improve solution interpretability and adherence to domain knowledge but may introduce complexity, especially when cannot-link constraints are dense.

Table: Observed empirical properties

Metric	Branch-and-Cut	Low-Rank Heuristic	General Purpose Solvers
ARI/NMI	High	High	Variable
Optimality Gap	Low/Closed	Small	Potentially Large
Runtime	Low–Medium	Low	High
Scalability	Medium	Large	Poor

6. Applications and Extensions

This paradigm finds application in bioinformatics (gene co-expression biclustering), text mining (co-clustering documents and terms with semantic links), market basket analysis, and other domains requiring simultaneous clustering and prior knowledge integration:

In genomics, background knowledge about gene interactions translates into must-link/cannot-link constraints, leading to biologically coherent biclusters.
In text mining, thesauri or entity relationships inform pairwise constraints, refining topic discovery and document clustering.
In recommender systems and user-behavioral analytics, pairwise relational data shapes biclustered groupings consistent with observed interactions.
In convex spectral and generative models, extensions to biclustering are facilitated by adapting pairwise constraint matrices for both rows and columns, retaining tractable relaxations (Behera et al., 3 Apr 2024, Yu et al., 2018).

Scalable algorithms with provable guarantees expand applicability to very large matrices, supporting practical deployment in high-throughput and streaming contexts.

7. Theoretical Properties and Prospective Directions

Theoretical analysis reveals that aggregation via must-link transitive closure can reduce the complexity of the problem and that strict enforcement of pairwise constraints may facilitate tight relaxations (e.g., SDP). However, nonconvexity and coupling effects require sophisticated rounding and decomposition strategies. Contemporary research explores:

Strengthening relaxations with advanced valid inequalities (pair, triangle).
Designing heuristics that balance nonconvex optimization with computational tractability.
Managing uncertain, soft, and confidence-weighted constraints in practical contexts (Baumann et al., 2022).
Extending frameworks to multiway clustering, multi-view data, and settings with side-information or stochastic constraints (Brubach et al., 2021).
Analysis of scaling regimes, robust objective functions, and the impact of constraint structure on cluster stability.

A plausible implication is that further integration of constrained biclustering with deep learning-based representation learning (e.g., pcGRBM, NMF with pairwise penalties (Chu et al., 2017, Jiang et al., 2018)) may yield new advances in semi-supervised clustering across complex, multimodal data.

Overall, the development of rigorous optimization algorithms for constrained biclustering with pairwise constraints establishes a benchmark for the field and frames future research in interpretable, scalable, and domain-informed co-clustering.