
colBiSBM: Collaborative Bipartite Models

Updated 3 December 2025
  • colBiSBM is a family of models that generalizes latent block models to analyze heterogeneous bipartite networks, mixed-data matrices, and collaborative filtering settings.
  • It unifies collections of bipartite graphs and mixed-data co-clustering by sharing mesoscale block structures and connectivity patterns across different data sources.
  • Inference is carried out with variational EM algorithms, complemented by ICL/BIC-style model selection and identifiability results, and yields improved predictive accuracy in diverse applications.

colBiSBM refers to a family of collaborative or collection-based bipartite stochastic block models that generalize classical latent block models (LBMs) to various settings involving bipartite structure, shared mesoscale connectivity, mixed variable types, or collaborative filtering. This framework encompasses models for single bipartite graphs, collections of networks, and heterogeneous data matrices. The unifying principle is that objects in two distinct sets (e.g., rows and columns, users and items, plants and pollinators) are assigned to latent clusters, and relationships or observed values are parameterized by cluster interactions.

1. Model Definitions and Variants

colBiSBM frameworks arise in several contexts:

  • Collections of Bipartite Networks: Here, colBiSBM encodes the assumption that a set of bipartite graphs share a common block structure, typically represented by a shared inter-block connectivity matrix and potentially varying block proportions across networks (Lacoste et al., 1 Dec 2025).
  • Mixed-Data Matrices: In co-clustering matrices with numerical and binary columns, the mixed-data latent block model (a type of colBiSBM) extends classical LBMs to allow joint block-structure discovery across both types simultaneously (Bouchareb et al., 2022).
  • Collaborative Filtering: The bipartite mixed-membership stochastic block model ($\mathrm{BM}^2$), also termed collaborative bipartite SBM or colBiSBM, posits latent groupings on both users and items with mixed-membership vectors and infers their interactions for rating prediction (Liu et al., 2023).

The following table summarizes key features of these variants:

| colBiSBM Variant | Latent Memberships | Edge/Value Model | Target Data |
|---|---|---|---|
| Collection-based colBiSBM (Lacoste et al., 1 Dec 2025) | One-hot (row, column, per network) | Shared Bernoulli block matrix | Collections of bipartite networks |
| Mixed-data colBiSBM (Bouchareb et al., 2022) | One-hot (row, continuous column, binary column) | Gaussian/Bernoulli blocks | Matrices with mixed columns |
| Collaborative BM² (Liu et al., 2023) | Dirichlet (users), Dirichlet (items) | Multinomial over ratings | Bipartite user-item rating matrix |

2. Probabilistic Specification

Each bipartite network $G^{(m)} = (U^{(m)}, V^{(m)}, E^{(m)})$ is assumed to admit a block structure with $Q_1$ row blocks and $Q_2$ column blocks, together with a shared connectivity matrix $\Theta = (\theta_{k\ell})$. For network $m$, latent assignments $Z_i^{(m)}$ and $W_j^{(m)}$ determine to which blocks each row and column node belongs. Given these assignments, the edges are conditionally independent:

$$X_{ij}^{(m)} \mid Z_i^{(m)} = k,\ W_j^{(m)} = \ell \;\sim\; \mathrm{Bernoulli}(\theta_{k\ell})$$

Block mixing proportions $\pi^{(m)}, \rho^{(m)}$ may be network-specific (as in the $\pi\rho$-colBiSBM).
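To make the generative model concrete, the following is a minimal simulation sketch for a collection of networks sharing a connectivity matrix $\Theta$ with network-specific proportions; the function name `simulate_collection` and the example sizes are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_collection(n_rows, n_cols, pi_list, rho_list, theta):
    """Simulate a collection of bipartite networks that share a block
    connectivity matrix `theta` (Q1 x Q2), with network-specific row/column
    block proportions `pi_list[m]` (length Q1) and `rho_list[m]` (length Q2).
    Returns a list of (X, Z, W) triples: adjacency matrix and latent blocks."""
    networks = []
    for n1, n2, pi, rho in zip(n_rows, n_cols, pi_list, rho_list):
        Z = rng.choice(len(pi), size=n1, p=pi)    # row-block labels
        W = rng.choice(len(rho), size=n2, p=rho)  # column-block labels
        P = theta[np.ix_(Z, W)]                   # edge probabilities given blocks
        X = rng.binomial(1, P)                    # Bernoulli edge draws
        networks.append((X, Z, W))
    return networks

# Example: two networks, Q1 = Q2 = 2, shared theta, different proportions
theta = np.array([[0.8, 0.1],
                  [0.1, 0.6]])
nets = simulate_collection(
    n_rows=[50, 80], n_cols=[40, 60],
    pi_list=[[0.5, 0.5], [0.7, 0.3]],
    rho_list=[[0.4, 0.6], [0.5, 0.5]],
    theta=theta,
)
```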

  • Mixed-Data Models (Bouchareb et al., 2022):
    • Continuous columns $j \in J_c$: $x_{ij} \sim \mathcal{N}(\mu_{k l_c}, \sigma_{k l_c}^2)$
    • Binary columns $j \in J_d$: $x_{ij} \sim \mathrm{Bernoulli}(\alpha_{k l_d})$
  • Collaborative Filtering (BM²) (Liu et al., 2023):

Each user $i$ and item $j$ receives a $K$- or $L$-dimensional Dirichlet-distributed membership vector ($\pi_i^U$ and $\pi_j^I$, respectively). For each observed rating $(i,j)$: (1) sample a user cluster indicator $Z_{ij}^U$ and an item cluster indicator $Z_{ij}^I$ from these memberships; (2) given the sampled pair $(k,\ell)$, sample the rating $c_{ij}$ from $\mathrm{Multinomial}(1; \mu_{k\ell,1}, \ldots, \mu_{k\ell,S})$.
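This generative process can be sketched as follows; `sample_ratings` and the example dimensions are hypothetical, and drawing the block-level rating distributions `mu` from a Dirichlet here is only a convenient way to produce valid probability vectors, not a statement about the priors used in the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ratings(pi_U, pi_I, mu, observed_pairs):
    """Generative sketch of the BM^2 rating model.
    pi_U: (N, K) user mixed-membership vectors.
    pi_I: (M, L) item mixed-membership vectors.
    mu:   (K, L, S) per block-pair multinomial over S rating levels.
    observed_pairs: iterable of (user, item) index pairs with an observed rating."""
    K, L, S = mu.shape
    ratings = {}
    for i, j in observed_pairs:
        k = rng.choice(K, p=pi_U[i])                 # user cluster indicator Z_ij^U
        l = rng.choice(L, p=pi_I[j])                 # item cluster indicator Z_ij^I
        ratings[(i, j)] = rng.choice(S, p=mu[k, l])  # rating c_ij in {0, ..., S-1}
    return ratings

# Example with hypothetical sizes: N=100 users, M=50 items, K=3, L=2, S=5 rating levels
N, M, K, L, S = 100, 50, 3, 2, 5
pi_U = rng.dirichlet(np.ones(K), size=N)
pi_I = rng.dirichlet(np.ones(L), size=M)
mu = rng.dirichlet(np.ones(S), size=(K, L))          # shape (K, L, S)
pairs = [(rng.integers(N), rng.integers(M)) for _ in range(500)]
obs = sample_ratings(pi_U, pi_I, mu, pairs)
```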

3. Inference and Computational Methods

All colBiSBM variants employ variational EM (VEM) algorithms to approximate the maximum likelihood or posterior estimates of cluster assignments and block parameters, given the intractable nature of full posterior computation in large bipartite models.

  • Variational Approximation: In all settings, a mean-field variational distribution over latent indicators (node assignments) is assumed, yielding separate factors for each node or feature and allowing closed-form or easily optimized E- and M-step updates.
  • E-step: Variational posterior responsibilities (e.g., $\tau_{ik}^{(m)}$ and $\varrho_{j\ell}^{(m)}$ in collection colBiSBM, $s_{ik}$ in mixed-data colBiSBM, $\phi$ in BM²) are updated by exponentiating expected log-likelihood terms and normalizing (a sketch of these updates follows this list).
  • M-step: Block proportions and connectivity/parameter matrices are updated in closed form from the variational responsibilities.
  • Complexity: For example, BM² inference costs $O(R_0(K+L)S + NK + ML)$ per iteration, with $R_0$ the number of observed ratings (Liu et al., 2023).
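As an illustration of these E- and M-step updates, the following is a minimal mean-field VEM sketch for a single Bernoulli bipartite SBM, not the full collection, mixed-data, or BM² algorithms; the function name `vem_bipartite_sbm` and all variable names are hypothetical, and practical implementations add convergence checks, careful initialization, and model selection.

```python
import numpy as np
from scipy.special import logsumexp

def vem_bipartite_sbm(X, Q1, Q2, n_iter=50, seed=0):
    """Minimal mean-field VEM sketch for a single Bernoulli bipartite SBM.
    X: (n1, n2) binary adjacency matrix; Q1/Q2: numbers of row/column blocks.
    Returns row responsibilities tau (n1, Q1), column responsibilities rho (n2, Q2),
    block connectivity theta (Q1, Q2), and mixing proportions pi, nu."""
    rng = np.random.default_rng(seed)
    n1, n2 = X.shape
    tau = rng.dirichlet(np.ones(Q1), size=n1)
    rho = rng.dirichlet(np.ones(Q2), size=n2)
    eps = 1e-10
    for _ in range(n_iter):
        # M-step: closed-form updates given current responsibilities
        pi = tau.mean(axis=0)
        nu = rho.mean(axis=0)
        theta = (tau.T @ X @ rho) / (tau.sum(0)[:, None] * rho.sum(0)[None, :] + eps)
        theta = np.clip(theta, eps, 1 - eps)
        # E-step: row responsibilities (expected log-likelihood, then normalize)
        log_tau = np.log(pi + eps) + X @ rho @ np.log(theta).T \
                  + (1 - X) @ rho @ np.log(1 - theta).T
        tau = np.exp(log_tau - logsumexp(log_tau, axis=1, keepdims=True))
        # E-step: column responsibilities, symmetrically
        log_rho = np.log(nu + eps) + X.T @ tau @ np.log(theta) \
                  + (1 - X.T) @ tau @ np.log(1 - theta)
        rho = np.exp(log_rho - logsumexp(log_rho, axis=1, keepdims=True))
    return tau, rho, theta, pi, nu
```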

4. Model Selection and Identifiability

All colBiSBM approaches address the critical problem of block-number selection:

  • Integrated Classification Likelihood (ICL) and similar BIC-like approximations are used to penalize model complexity and avoid overfitting. For example, for collection colBiSBM, the BIC-L criterion includes penalties for the number of blocks and the shared parameter space (Lacoste et al., 1 Dec 2025); a generic ICL-type criterion for a single LBM is recalled after this list.
  • Identifiability is treated rigorously (e.g., Propositions 1 and 2 in (Lacoste et al., 1 Dec 2025)), with conditions guaranteeing that the block structure and shared parameters are statistically recoverable under suitable block size, distinctness, and support conditions.
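For orientation, a standard ICL-BIC approximation for a single binary LBM with $Q_1$ row blocks, $Q_2$ column blocks, $n_1$ rows, and $n_2$ columns takes the form below; the BIC-L criterion of (Lacoste et al., 1 Dec 2025) adapts this style of penalty to the parameters shared across a collection, and the exact penalty terms should be taken from that paper.

$$\mathrm{ICL}(Q_1, Q_2) \;\approx\; \log p\big(X, \hat{Z}, \hat{W} \mid \hat{\Theta}\big) \;-\; \frac{Q_1 - 1}{2}\log n_1 \;-\; \frac{Q_2 - 1}{2}\log n_2 \;-\; \frac{Q_1 Q_2}{2}\log(n_1 n_2)$$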

5. Empirical Performance and Applications

  • Recovery of Structure: In simulated and empirical studies, collection-based colBiSBM consistently recovers the correct number of blocks and accurate cluster assignments, as measured by the adjusted Rand index (ARI $> 0.9$ in favorable regimes; a snippet for computing ARI follows this list) (Lacoste et al., 1 Dec 2025).
  • Link Prediction: When networks are jointly modeled, statistical strength is shared across the collection, yielding improved prediction and clustering performance compared to fitting each network separately.
  • Plant–Pollinator Networks: Application to ecological networks reveals that colBiSBM uncovers shared connectivity motifs and functional species roles that are not captured by analyzing networks in isolation (Lacoste et al., 1 Dec 2025).
  • Mixed-Data Co-Clustering: On simulated data with varying overlap and block counts, the mixed-data colBiSBM achieves excellent recovery of both row and column clusters, outperforming uni-type clustering and maintaining parameter estimation bias below 1% in favorable conditions (Bouchareb et al., 2022).
  • Collaborative Filtering: BM² outperforms non-Bayesian MMSBM and PMF on MovieLens-100K in mean squared error and mean absolute error, with BM² MSE $\approx 1.16$ vs. MMSBM $\approx 1.19$ and PMF $\approx 1.26$ (Liu et al., 2023).
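For reference, the adjusted Rand index quoted above can be computed with a standard library call; the labels below are hypothetical placeholders, not data from the cited studies.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical true and estimated row-block labels for one network
true_blocks = [0, 0, 1, 1, 2, 2]
estimated_blocks = [1, 1, 0, 0, 2, 2]   # label permutation does not matter

print(adjusted_rand_score(true_blocks, estimated_blocks))  # -> 1.0
```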

6. Extensions and Theoretical Implications

colBiSBM and its variants permit several theoretically motivated generalizations:

  • Nonparametric Priors: Dirichlet process mixtures (stick-breaking priors) allow the number of blocks to be data-driven (a truncated stick-breaking sketch follows this list).
  • Dynamic Block Models: Latent memberships $\pi(t)$ or connectivity matrices $\mu(t)$ may evolve via, e.g., linear Gaussian state-space models, generalizing to dynamic bipartite structure.
  • Side Information: Covariates enter via exponential-family priors on membership vectors, maintaining conjugacy under suitable links (Liu et al., 2023).
  • Continuous Edges: The block-wise conditional may be Gaussian rather than multinomial or Bernoulli, enabling modeling of continuous or weighted bipartite edges.
  • Supports and Flexibility: In the $\pi\rho$-colBiSBM, some blocks may be empty in certain networks, yielding broad expressivity in representing shared yet flexibly deployed mesoscale structure.
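As a concrete example of the nonparametric extension mentioned first in this list, a truncated stick-breaking construction of block weights can be sketched as follows; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng=None):
    """Truncated stick-breaking construction of Dirichlet-process block weights.
    alpha: concentration parameter; truncation: maximum number of blocks kept."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining
    weights[-1] = 1.0 - weights[:-1].sum()   # absorb leftover mass so weights sum to 1
    return weights

print(stick_breaking_weights(alpha=2.0, truncation=10))
```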

A plausible implication is that the colBiSBM paradigm supplies a unified analytical framework for heterogeneous bipartite data sources, collections of networks, and collaborative prediction, with clear advantages in structure discovery and predictive accuracy over models that do not pool statistical information or leverage multi-modal structure.

7. Computational and Practical Considerations

  • Initialization and Convergence: Random initialization may produce trivial solutions in symmetric or weak-signal regimes; requiring a minimum number of update sweeps before applying convergence checks is recommended (Bouchareb et al., 2022).
  • Algorithmic Safeguards: Practical implementations enforce normalization steps on assignment weights after each update to ensure numerical stability (a stable normalization sketch follows this list).
  • Runtime: Mixed-data co-clustering with colBiSBM is empirically about ten times slower than single-type (homogeneous) blockmodel implementations due to additional accuracy checks and assignment steps (Bouchareb et al., 2022).
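One common way to implement the normalization safeguard mentioned above is to work in log space; the helper below is a minimal sketch under that assumption, not code from the cited implementations.

```python
import numpy as np
from scipy.special import logsumexp

def normalize_responsibilities(log_weights):
    """Numerically stable normalization of variational assignment weights.
    log_weights: (n_nodes, n_blocks) unnormalized log responsibilities.
    Returns rows that sum to 1 without underflow from direct exponentiation."""
    log_norm = log_weights - logsumexp(log_weights, axis=1, keepdims=True)
    return np.exp(log_norm)

# Example: extreme log values that would underflow with a naive exp-then-divide
log_w = np.array([[-1000.0, -1001.0], [-2.0, -1.0]])
print(normalize_responsibilities(log_w))
```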

colBiSBM's flexibility and rigorous statistical grounding make it well suited for joint inference in large, multi-modal, or multi-network bipartite domains, with demonstrated gains in predictive performance and interpretability across diverse applications (Lacoste et al., 1 Dec 2025; Liu et al., 2023; Bouchareb et al., 2022).
