
Least-Squares Graphon-BPS Estimation

Updated 23 December 2025
  • The paper presents a methodology that minimizes integrated squared error over convex combinations of candidate graphons to achieve rigorous oracle inequalities and minimax-optimal rates.
  • It adapts to diverse network regimes—including dense, sparse, and heavy-tailed—by leveraging blockwise representations and Bayesian predictive synthesis.
  • The approach generalizes to bipartite and dynamic networks, utilizing spectral initialization, block-coordinate descent, and penalized techniques for efficient computation.

Least-Squares Graphon-BPS is a principled, model-agnostic methodology for graphon estimation based on integrated squared error minimization over convex combinations or blockwise representations. Arising at the intersection of nonparametric network inference and Bayesian predictive synthesis, least-squares Graphon-BPS achieves minimax-optimal rates for dense, sparse, and heavy-tailed random graphs, with rigorous nonasymptotic oracle inequalities, adaptivity, and robust structural guarantees on estimated network properties. The method admits natural generalizations to bipartite and dynamic graphon contexts.

1. Formal Definition and Methodology

Let $w_0 : [0,1]^2 \to [0,1]$ be the true (possibly unobserved) graphon underlying a random graph model, and let $w_1, \ldots, w_J$ be a collection of agent (candidate) graphons. Least-squares Graphon-BPS (Bayesian Predictive Synthesis at the graphon level) constructs the estimator as the $L^2$-projection of $w_0$ onto the linear span of $\{1, w_1, \ldots, w_J\}$, i.e.,

$$w_{\mathrm{BPS}}(u,v) = \beta_0 + \sum_{j=1}^J \beta_j\, w_j(u,v)$$

where the optimal coefficient vector $\beta^\star$ solves

$$\beta^\star = \arg\min_{\beta \in \mathbb{R}^{J+1}} \int_{[0,1]^2} \bigl(w_0(u,v) - w_\beta(u,v)\bigr)^2\, du\, dv.$$

In practice, one constructs an empirical Gram matrix and moment vector via

$$\widehat{G}_m = \frac{1}{m} \sum_{s=1}^m F(X_s) F(X_s)^\top, \qquad \widehat{h}_m = \frac{1}{m} \sum_{s=1}^m F(X_s)\, Y_s$$

where $F(u,v) = (1, w_1(u,v), \ldots, w_J(u,v))^\top$ and the $Y_s$ are i.i.d. edge indicators with $\Pr(Y_s = 1 \mid X_s) = w_0(X_s)$. The least-squares estimator is $\widehat{\beta}_m = \widehat{G}_m^{-1} \widehat{h}_m$, yielding $\widehat{w}_m = w_{\widehat{\beta}_m}$ (Papamichalis et al., 21 Dec 2025).
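A minimal simulation of this construction, with NumPy; the true graphon, the two agent graphons, and the sample size are illustrative choices of ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the "unknown" w_0 lies in the span of the agents.
w_true = lambda u, v: 0.3 + 0.4 * u * v           # w_0(u, v)
agents = [lambda u, v: (u + v) / 2,               # w_1: additive agent
          lambda u, v: u * v]                     # w_2: product agent

m = 200_000
X = rng.uniform(size=(m, 2))                      # latent dyad positions X_s
Y = rng.binomial(1, w_true(X[:, 0], X[:, 1]))     # edge indicators Y_s

# Feature map F(u, v) = (1, w_1(u, v), ..., w_J(u, v))
F = np.column_stack([np.ones(m)] + [w(X[:, 0], X[:, 1]) for w in agents])

G_hat = F.T @ F / m                               # empirical Gram matrix
h_hat = F.T @ Y / m                               # empirical moment vector
beta_hat = np.linalg.solve(G_hat, h_hat)          # least-squares coefficients

print(np.round(beta_hat, 2))                      # approximately (0.3, 0, 0.4)
```

Because $w_0 = 0.3 + 0.4\,w_2$ lies exactly in the agent span here, $\widehat{\beta}_m$ concentrates around $(0.3, 0, 0.4)$ at the parametric $O(d/m)$ rate.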

This framework generalizes to bipartite and dynamic settings. In bipartite graphs, the mean matrix is approximated by minimizing the empirical squared error over block-constant matrices in two dimensions (Donier-Meroz et al., 2023). In the dynamic case, penalized least-squares is applied to a tensorized model with orthogonal time-basis expansion and block clustering in the spatial domain (Pensky, 2016).

2. Oracle Inequalities and Minimax Rates

Least-squares Graphon-BPS enjoys nonasymptotic oracle inequalities. For agent families of size $d = J + 1$ and $m$ i.i.d. edge samples, the prediction risk satisfies

$$\mathbb{E}\left[\|\widehat{w}_m - w_0\|_2^2\right] \leq 2 \inf_{\|\beta\| \leq R} \|w_0 - w_\beta\|_2^2 + \frac{C d}{m}$$

for some constant $C$ depending on the feature bounds and the spectral properties of the Gram matrix. If $w_0$ itself lies in the agent span, the estimator achieves the minimax parametric rate $O(d/m)$ (Papamichalis et al., 21 Dec 2025).

For block-constant least-squares estimators that approximate by $k$-block models, the $L_2$ estimation error obeys

$$\|\widehat{W}_{n,k} - W\|_2^2 = O_p\!\Big( E_k(W)^2 + \frac{1 + \ln(1/k)}{K^2 n p_n} + \frac{\ln n}{K n} + \mathrm{tail}_2(W)^2 \Big)$$

where $E_k(W)$ is the best $k$-block approximation error and $\mathrm{tail}_2(W)$ measures the heavy-tail truncation (Borgs et al., 2015).

For bipartite least-squares block estimators with $K \times L$ blocks, the integrated loss satisfies

$$\delta(\widehat{W}^{LS}, W^*) \lesssim r_{n_1,n_2}(K,L) + \rho\,(K/n_1 + L/n_2)^{1/4}$$

with $r_{n_1,n_2}(K,L)$ a complexity-remainder term and $\rho$ an upper bound on $|W^*|$ (Donier-Meroz et al., 2023).

A matching lower bound demonstrates minimax-optimality up to constants in each setting (Papamichalis et al., 21 Dec 2025, Klopp et al., 2015, Pensky, 2016).

3. Adaptivity, Heavy-tailed Graphons, and Sparsity

Least-squares Graphon-BPS naturally accommodates heavy-tailed graphons, sparse regimes, and heterogeneous degree distributions. For unbounded or integrable graphons, the estimation error accounts for truncation regions (where $p_n W > 1$), with $\mathrm{tail}_2(W)$ quantifying the excess $L_2$ mass.

Adaptivity arises because the method allows the number of blocks, the penalty strength (in penalized LS), or the agent-span dimension to be selected in a data-driven way (cross-validation, penalized criteria, exponential weighting). When the true graphon is Hölder-continuous with exponent $\alpha$, the approximation error $E_k(W)$ scales as $O(k^{-\alpha})$, guiding the choice of $k$ relative to network size and sparsity for minimax-optimal rates (Borgs et al., 2015, Donier-Meroz et al., 2023).
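The $O(k^{-\alpha})$ scaling is easy to see numerically. A small sketch (our own toy example, using a Lipschitz graphon, i.e. $\alpha = 1$, and a regular $k \times k$ partition, which is not the optimal partition but already attains the Hölder rate):

```python
import numpy as np

# Discretize W(u, v) = u*v on a fine N x N grid and measure the L2 error
# of its block average on a regular k x k partition.
N = 512
u = (np.arange(N) + 0.5) / N
W = np.outer(u, u)                                # W(u_i, u_j) = u_i * u_j

def block_error(W, k):
    N = W.shape[0]
    # k x k block means, then expand back to the full grid
    B = W.reshape(k, N // k, k, N // k).mean(axis=(1, 3))
    W_blocked = np.repeat(np.repeat(B, N // k, axis=0), N // k, axis=1)
    return np.sqrt(((W - W_blocked) ** 2).mean())  # L2([0,1]^2) norm

errs = {k: block_error(W, k) for k in (2, 4, 8, 16)}
# With alpha = 1, doubling k should roughly halve the error:
ratios = [errs[2] / errs[4], errs[4] / errs[8], errs[8] / errs[16]]
print(errs, ratios)
```

The successive error ratios come out close to 2, matching $E_k(W) = O(k^{-1})$ for a Lipschitz graphon.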

Mixtures and entropic tilting in the agent library do not destroy power-law degree behavior: the mixture's heavy tail is dominated by the minimal tail exponent of the constituent agents; slow tilting preserves the exponent, while polynomial tilting shifts the power-law exponent as predicted (Papamichalis et al., 21 Dec 2025).

4. Computational and Algorithmic Considerations

Exact global least-squares minimization is combinatorial and NP-hard, with the solution-space cardinality growing super-exponentially in $n$ or the block counts. Practical implementations rely on:

  • Spectral initializations: e.g., $k$-means on the top eigenvectors of the adjacency or label matrices.
  • Block-coordinate descent / Lloyd's algorithm: alternating minimization over block assignments and block averages, each step available in closed form.
  • Semidefinite relaxations: For tighter convex surrogates.
  • Penalized selection: Penalized least-squares with model size or smoothness penalties, often guided by BIC-like rules or cross-validation.
  • Aggregation: Exponential weights allow ensemble or adaptively tuned combinations over block parameters or agent families (Donier-Meroz et al., 2023, Klopp et al., 2015, Papamichalis et al., 21 Dec 2025).
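The first two ingredients above can be sketched together. This is an illustrative two-block SBM instance of our own (not the papers' experiments): spectral initialization from the top eigenvectors, then Lloyd-style alternation between closed-form block averages and greedy reassignment:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a 2-block SBM adjacency matrix.
n, k = 200, 2
z_true = rng.integers(0, k, size=n)
P = np.array([[0.7, 0.1], [0.1, 0.6]])
A = (rng.uniform(size=(n, n)) < P[z_true][:, z_true]).astype(float)
A = np.triu(A, 1); A = A + A.T                    # symmetric, no self-loops

# Spectral initialization: crude 1-D split of the second top eigenvector.
vals, vecs = np.linalg.eigh(A)
V = vecs[:, np.argsort(-np.abs(vals))[:k]]
z = (V[:, 1] > np.median(V[:, 1])).astype(int)

# Lloyd / block-coordinate descent.
for _ in range(20):
    # Closed-form step: block means Q[a, b] = average of A over blocks (a, b)
    ind = np.eye(k)[z]                            # n x k membership matrix
    cnt = ind.sum(0)
    Q = ind.T @ A @ ind / np.outer(cnt, cnt).clip(1)
    # Assignment step: squared-error cost of placing node i in block a
    cost = ((A[:, None, :] - Q[None, :, z]) ** 2).sum(axis=2)
    z = cost.argmin(axis=1)

# Up to label switching, z should recover z_true almost exactly.
agree = max(np.mean(z == z_true), np.mean(z != z_true))
print(agree)
```

With this separation between within- and between-block probabilities, the spectral start lands close enough that the descent typically converges to (near-)exact recovery in a few iterations.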

The per-iteration cost is typically dominated by block-sum computation and matrix updates; for bipartite LS it is $O(n_1 n_2 (K + L))$ per cycle (Donier-Meroz et al., 2023).

5. Extensions: Bipartite and Dynamic Graphon Models

The least-squares paradigm generalizes to bipartite and dynamic network data:

  • Bipartite Graphons: Block-constant LS estimators are constructed from two-sided clusterings $(Z, Z')$ and a block matrix $Q$. Finite-sample bounds depend on the best block-partition error plus a complexity term $r_{n_1,n_2}(K,L)$ (Donier-Meroz et al., 2023).
  • Dynamic Graphons: Penalized least-squares is applied to adjacency tensors that are vectorized and transformed in the time dimension. Model selection is performed over the block number $m$ and the temporal truncation index $\rho$ via explicit penalty terms, yielding adaptive minimax rates under spatial and temporal smoothness (Pensky, 2016).

The resulting error bounds explicitly decouple spatial block approximation, temporal truncation bias, and estimation error, and hold uniformly over piecewise-constant, Hölder, and Sobolev graphon classes.
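The temporal-truncation step can be sketched in isolation. The setup below is an assumed illustration (smoothly time-varying edge probabilities, an orthonormal cosine basis, truncation index $R$ of our choosing); the full procedure of Pensky (2016) would follow this with spatial block averaging:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative dynamic model: P_t[i, j] = 0.3 + 0.2*cos(pi*t)*B[i, j].
T, n = 64, 40
t = (np.arange(T) + 0.5) / T
B = rng.uniform(size=(n, n)); B = (B + B.T) / 2
P = 0.3 + 0.2 * np.cos(np.pi * t)[:, None, None] * B
A = (rng.uniform(size=(T, n, n)) < P).astype(float)  # adjacency tensor

# Orthonormal cosine basis on [0, 1]: phi_0 = 1, phi_r = sqrt(2) cos(pi r t).
R = 8                                             # temporal truncation index
Phi = np.ones((T, R))
for r in range(1, R):
    Phi[:, r] = np.sqrt(2) * np.cos(np.pi * r * t)

# Project every dyad's time series onto the first R basis functions
# (least-squares coefficients, then truncated reconstruction).
coef = np.einsum('tr,tij->rij', Phi, A) / T
P_hat = np.einsum('tr,rij->tij', Phi, coef)

err_raw = ((A - P) ** 2).mean()                   # no smoothing
err_hat = ((P_hat - P) ** 2).mean()               # after truncation
print(err_raw, err_hat)
```

Since the true temporal signal lives in the first two basis functions, truncation incurs no bias here and cuts the variance by roughly the factor $R/T$.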

6. Structural Transfer and Network Properties

Lipschitz transfer inequalities relate the graphon-level $L^2$ estimation error to errors in key network functionals:

  • Edge density error: $|e(w) - e(w')| \leq \|w - w'\|_2$
  • Degree distribution: $\|d_w - d_{w'}\|_2 \leq \|w - w'\|_2$
  • Triangle and wedge densities: $|t(w) - t(w')| \leq 3\|w - w'\|_2$, $|s(w) - s(w')| \leq 2\|w - w'\|_2$
  • Clustering: $|C(w) - C(w')| \leq \frac{3}{s_0}\|w - w'\|_2 + \frac{2}{s_0^2}\|w - w'\|_2$ when the wedge density $s_0 > 0$
  • Giant-component thresholds: spectral-radius inequalities transfer through to the combined estimator

This ensures that least-squares Graphon-BPS inherits and preserves key network structural characteristics (Papamichalis et al., 21 Dec 2025).
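The first three transfer bounds are straightforward to check numerically on a grid. A small sketch with two graphons of our own choosing (not from the cited papers):

```python
import numpy as np

# Discretize two [0,1]-valued graphons and verify that functional
# differences are controlled by the L2 distance.
N = 200
u = (np.arange(N) + 0.5) / N
W1 = 0.4 * np.add.outer(u, u)                     # w(u, v)  = 0.4(u + v)
W2 = 0.9 * np.outer(u, u)                         # w'(u, v) = 0.9uv

def l2(M):       return np.sqrt((M ** 2).mean())
def edge(W):     return W.mean()                                  # e(w)
def degree(W):   return W.mean(axis=1)                            # d_w(u)
def triangle(W): return np.einsum('ij,jk,ki->', W, W, W) / N**3   # t(w)

dist = l2(W1 - W2)
print(abs(edge(W1) - edge(W2)), '<=', dist)
print(l2(degree(W1) - degree(W2)), '<=', dist)
print(abs(triangle(W1) - triangle(W2)), '<=', 3 * dist)
```

Each printed left-hand side falls below its right-hand bound, so any $L^2$-accurate graphon estimate automatically yields accurate edge, degree, and triangle functionals.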

7. Practical Guidance and Significance

Practical implementation of least-squares Graphon-BPS involves:

  • Selecting a diverse agent library (ER, SBM, RDPG, ERGM, etc.)
  • Sampling a moderate number of edge dyads to form design matrices for the LS regression
  • Performing spectral or random initialization followed by block-coordinate or Lloyd minimization
  • Aggregating over block counts or agent combinations via exponential weights to avoid manual tuning
  • Regularizing in sparse regimes or when degree heterogeneity is extreme
  • Using transfer bounds to quantify the impact of estimation error on downstream quantities of interest
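The aggregation step in this pipeline can be sketched generically with exponential weights; the candidates, holdout rule, and temperature `eta` below are our own illustrative choices, not values from the papers:

```python
import numpy as np

rng = np.random.default_rng(3)

def exp_weights(candidates, A_holdout, eta):
    """Weight candidate fitted probability matrices by exponentiated
    negative holdout squared error, then return the convex combination."""
    losses = np.array([((W - A_holdout) ** 2).mean() for W in candidates])
    w = np.exp(-eta * (losses - losses.min()))
    w /= w.sum()
    return sum(wi * Wi for wi, Wi in zip(w, candidates)), w

# Toy instance: a smooth probability matrix and three candidate fits.
n = 100
P = 0.2 + 0.5 * np.outer(np.linspace(0, 1, n), np.linspace(0, 1, n))
A = (rng.uniform(size=(n, n)) < P).astype(float)  # holdout adjacency
cands = [np.full((n, n), A.mean()),               # constant (ER-style) fit
         P + 0.01 * rng.normal(size=(n, n)),      # near-oracle candidate
         np.full((n, n), 0.9)]                    # poor candidate
W_agg, w = exp_weights(cands, A, eta=200.0)
print(np.round(w, 3))
```

The near-oracle candidate receives most of the weight and the poor one is effectively zeroed out, so the aggregate tracks the best member of the library without any manual tuning beyond `eta`.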

Least-squares Graphon-BPS formalizes an optimally adaptive, combination-beats-components phenomenon in network inference: linear combinations via squared-error minimization achieve provably lower risk than any individual agent or cluster configuration, particularly on convex hull subsets of candidate families. For both static and dynamic, dense or sparse, and even heavy-tailed networks, the method offers consistent, efficient recovery of latent graph structure and robust quantification of network functionals (Papamichalis et al., 21 Dec 2025, Borgs et al., 2015, Pensky, 2016, Donier-Meroz et al., 2023, Klopp et al., 2015).
