
Structured Basis Function Networks (s-BFN)

Updated 8 May 2026
  • Structured Basis Function Networks (s-BFN) are a unified framework that integrates multi-hypothesis modeling, ensemble learning, and basis function approximations to enforce smoothness and diversity.
  • The paradigm leverages RBF feature maps and Bregman divergence for structured aggregation, enabling efficient closed-form and iterative learning methods.
  • s-BFN supports robust compression and uncertainty quantification, demonstrating superior performance in tasks ranging from regression to image classification.

Structured Basis Function Networks (s-BFN) combine ideas from multi-hypothesis modeling, ensemble learning, and basis function approximation in a single framework. The s-BFN paradigm covers approaches ranging from continuous-in-depth parameterizations in neural ODEs to loss-centric multi-hypothesis ensembles with explicit diversity control. The common idea across s-BFN models is to construct a structured space spanned by basis functions, in which functional, statistical, or ensemble smoothness is induced to support generalization, compression, or uncertainty quantification. This entry surveys the mathematical definitions, learning procedures, diversity mechanisms, theoretical characterizations, and empirical findings associated with s-BFN, as developed in both continuous-in-depth ODE architectures (Queiruga et al., 2021) and centroidal ensemble frameworks (Dominguez et al., 2 Sep 2025, Dominguez et al., 2023).

1. Mathematical Foundations and Formal Framework

The structured basis function network framework begins by defining a collection of $M$ base predictors, or hypotheses, $h_j(x)=f_{\theta_j}(x)$, each parameterized by $\theta_j$. For a given datum $x_i$, these outputs are aggregated into a structured vector $D_i=[h_1(x_i)^\top,\ldots,h_M(x_i)^\top]^\top\in\mathbb{R}^{d_D}$. This vector forms the input domain for the basis function expansion and ensemble aggregation (Dominguez et al., 2 Sep 2025).

A central unifying component is the use of radial basis function (RBF) feature maps $\Phi:\mathbb{R}^{d_D}\to\mathbb{R}^K$ (often with Gaussian kernels), parameterized by centers $C_k$ and widths $\gamma_k$. The s-BFN output is then

$$\hat{y}_i = G(D_i; \alpha, \vartheta) = \Phi(D_i; \vartheta)\alpha,$$

for learnable weights $\alpha\in\mathbb{R}^{K\times C}$, where $C$ is the output dimension (the number of regression targets in regression and the number of classes in classification).
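The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the Gaussian form `exp(-gamma_k * ||D - C_k||^2)`, the function names `rbf_features` and `sbfn_predict`, and the random stand-in base outputs are all assumptions made for the example.

```python
import numpy as np

def rbf_features(D, centers, gammas):
    """Gaussian RBF map: D (N, d_D), centers (K, d_D), gammas (K,) -> (N, K)."""
    # Squared Euclidean distance between every row of D and every center.
    sq_dist = ((D[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gammas[None, :] * sq_dist)

def sbfn_predict(D, centers, gammas, alpha):
    """y_hat = Phi(D) @ alpha, with alpha of shape (K, C)."""
    return rbf_features(D, centers, gammas) @ alpha

rng = np.random.default_rng(0)
N, M, d_out, K, C = 5, 3, 2, 4, 2
# Stand-in base predictions h_j(x_i); each row of D stacks the M outputs.
base_outputs = rng.normal(size=(N, M, d_out))
D = base_outputs.reshape(N, M * d_out)       # d_D = M * d_out
centers = rng.normal(size=(K, M * d_out))    # hypothetical centers C_k
gammas = np.full(K, 0.5)                     # hypothetical widths gamma_k
alpha = rng.normal(size=(K, C))
y_hat = sbfn_predict(D, centers, gammas, alpha)   # shape (N, C)
```

Each feature lies in (0, 1] and equals 1 exactly when the structured vector coincides with a center, so the combiner weights act on a localized representation of the joint base predictions.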

The ensemble combiner is made consistent with the geometry of the loss through the choice of a Bregman divergence $B_\phi$ (for strictly convex $\phi$), yielding, in the canonical case, a weighted Bregman centroid:

$$\hat{y}(x) = \arg\min_{z} \sum_{j=1}^{M} w_j\, B_\phi\big(z, h_j(x)\big), \qquad w_j \ge 0,\quad \sum_{j=1}^{M} w_j = 1,$$

which specializes to either Euclidean means for squared loss or probability centroids for cross-entropy (Dominguez et al., 2 Sep 2025).
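The two special cases can be made concrete with a small NumPy sketch. It assumes the centroid is the minimizer over `z` of the weighted sum of divergences from `z` to the base outputs; under that assumption the squared-loss generator gives the weighted arithmetic mean, and the negative-entropy generator (KL divergence on the simplex) gives a normalized weighted geometric mean. The function names and example values are illustrative only.

```python
import numpy as np

def centroid_squared(H, w):
    """Centroid for phi(z) = 0.5 * ||z||^2: the weighted arithmetic mean."""
    return np.tensordot(w, H, axes=1)

def centroid_kl(P, w):
    """Centroid for negative entropy: normalized weighted geometric mean."""
    log_mean = np.tensordot(w, np.log(P), axes=1)
    z = np.exp(log_mean - log_mean.max())   # shift for numerical stability
    return z / z.sum()

def kl_objective(z, P, w):
    """Weighted sum of KL(z || p_j) over the base distributions p_j."""
    return sum(wj * np.sum(z * np.log(z / p)) for wj, p in zip(w, P))

# Three hypothetical base predictions over three classes, with weights w_j.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
w = np.array([0.5, 0.3, 0.2])
z_kl = centroid_kl(P, w)        # a probability vector
z_sq = centroid_squared(P, w)   # plain weighted average of the rows
```

The two centroids generally differ for the same inputs, which is the sense in which the choice of loss geometry determines the combiner.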

An alternative s-BFN instantiation, realized in continuous-depth neural ODEs, expresses each weight tensor $W(t)$ (for depth $t$) as

$$W(t) = \sum_{k=1}^{K_\psi} \psi_k(t)\,\hat{W}_k,$$

with basis functions $\psi_k(t)$ (e.g., piecewise-constant, piecewise-linear, or higher-order) and coefficient tensors $\hat{W}_k$. All $K_\psi$ coefficients for all parameters are collected into a global parameter tensor, and the model is trained directly on this lower-dimensional, smooth parameter manifold (Queiruga et al., 2021).
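As an illustration of a depth-dependent weight built from a basis expansion, the sketch below uses piecewise-linear "hat" functions on a uniform grid. The depth interval [0, 1], the uniform grid, and the names `hat_basis` and `weight_at_depth` are assumptions for the example, not details taken from the cited work.

```python
import numpy as np

def hat_basis(t, num_basis):
    """Piecewise-linear hat functions on a uniform grid over [0, 1]."""
    nodes = np.linspace(0.0, 1.0, num_basis)
    width = nodes[1] - nodes[0]
    # Each hat equals 1 at its own node and falls linearly to 0 at neighbors.
    return np.clip(1.0 - np.abs(t - nodes) / width, 0.0, None)

def weight_at_depth(t, coeffs):
    """W(t) = sum_k psi_k(t) * W_k, with coeffs of shape (num_basis, ...)."""
    return np.tensordot(hat_basis(t, coeffs.shape[0]), coeffs, axes=1)

# Three coefficient tensors of shape (2, 2), one per basis function.
coeffs = np.stack([np.zeros((2, 2)), np.ones((2, 2)), 3.0 * np.ones((2, 2))])
W_mid = weight_at_depth(0.5, coeffs)     # equals the middle coefficient
W_late = weight_at_depth(0.75, coeffs)   # halfway between the last two
```

With hat functions the coefficients are simply the weight values at the grid nodes, and the weights vary continuously in between.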

2. Learning Algorithms: Closed-Form and Iterative Procedures

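In the regression setting with squared loss, closed-form ridge regression can be applied at the ensemble level. Collect the RBF features into $\Phi\in\mathbb{R}^{N\times K}$ for $N$ samples, with targets $Y\in\mathbb{R}^{N\times C}$. With regularization parameter $\lambda>0$, the aggregation weights are optimized by minimizing

$$\min_{\alpha}\ \|\Phi\alpha - Y\|_F^2 + \lambda\|\alpha\|_F^2,$$

with analytic solution

$$\alpha^{*} = (\Phi^\top\Phi + \lambda I_K)^{-1}\Phi^\top Y.$$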

This closed-form estimator offers computational efficiency and convexity (Dominguez et al., 2 Sep 2025, Dominguez et al., 2023).
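The closed-form fit is ordinary ridge regression on the RBF features. The sketch below solves the regularized normal equations with `numpy.linalg.solve` rather than forming an explicit inverse; the function name `fit_combiner_ridge` and the random stand-in features are assumptions for illustration.

```python
import numpy as np

def fit_combiner_ridge(Phi, Y, lam):
    """Solve (Phi^T Phi + lam * I) alpha = Phi^T Y for alpha of shape (K, C)."""
    K = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(K), Phi.T @ Y)

def ridge_objective(Phi, Y, alpha, lam):
    """Squared Frobenius residual plus the ridge penalty."""
    return np.sum((Phi @ alpha - Y) ** 2) + lam * np.sum(alpha ** 2)

rng = np.random.default_rng(0)
N, K, C, lam = 50, 6, 2, 0.1
Phi = rng.uniform(size=(N, K))   # stand-in RBF features in (0, 1)
Y = rng.normal(size=(N, C))      # stand-in regression targets
alpha_star = fit_combiner_ridge(Phi, Y, lam)
```

Because the objective is strictly convex for a positive regularization parameter, the gradient `Phi.T @ (Phi @ alpha - Y) + lam * alpha` vanishes at this solution, which is a quick way to check a fit.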

For general losses (e.g., cross-entropy) and in regimes where stochastic gradient optimization is required, s-BFN admits an end-to-end iterative algorithm. Each mini-batch proceeds by

  1. Forward-passing to obtain all base outputs and losses,
  2. Computing diversity-modulated update weights,
  3. Updating each base $\theta_j$ with a diversity-weighted loss,
  4. Constructing the structured ensemble input,
  5. Calculating ensemble predictions and aggregate loss,
  6. Updating combiner weights $\alpha$ and kernel parameters $\vartheta$ (Dominguez et al., 2 Sep 2025).
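The six steps above can be sketched as a single NumPy training step. Several simplifications are assumed here and are not taken from the source: the base models are linear, the loss is squared error, the diversity weights give the best hypothesis `1 - eps` and split `eps` among the others, the RBF centers and width are held fixed (so step 6 updates only the combiner weights), and plain gradient descent replaces whatever optimizer is used in practice.

```python
import numpy as np

def rbf(D, centers, gamma):
    sq = ((D[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def train_step(X, y, W, alpha, centers, gamma=0.5, eps=0.2, lr=0.05):
    n, M = X.shape[0], W.shape[0]
    # 1. Forward pass: base outputs (n, M, 1) and per-hypothesis losses (n, M).
    H = np.einsum("nd,mdo->nmo", X, W)
    err = H - y[:, None, :]
    losses = (err ** 2).sum(axis=-1)
    # 2. Diversity-modulated update weights, treated as constants.
    delta = np.full((n, M), eps / (M - 1))
    delta[np.arange(n), losses.argmin(axis=1)] = 1.0 - eps
    # 3. Gradient step on each base model with its diversity-weighted loss.
    grad_W = 2.0 * np.einsum("nd,nm,nmo->mdo", X, delta, err) / n
    W = W - lr * grad_W
    # 4. Structured ensemble input built from the updated base outputs.
    D = np.einsum("nd,mdo->nmo", X, W).reshape(n, -1)
    # 5. Ensemble prediction and aggregate (mean squared) loss.
    Phi = rbf(D, centers, gamma)
    resid = Phi @ alpha - y
    loss = float((resid ** 2).mean())
    # 6. Gradient step on the combiner weights (kernel parameters fixed here).
    alpha = alpha - lr * 2.0 * Phi.T @ resid / n
    return W, alpha, loss

rng = np.random.default_rng(0)
N, d_in, M, K = 64, 3, 4, 6
X = rng.normal(size=(N, d_in))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(N, 1))   # toy regression target
W = 0.1 * rng.normal(size=(M, d_in, 1))                # linear base hypotheses
alpha = np.zeros((K, 1))
centers = rng.normal(size=(K, M))
history = []
for _ in range(50):
    W, alpha, loss = train_step(X, y, W, alpha, centers)
    history.append(loss)
```

In a deep-learning framework the same structure would typically use automatic differentiation, with the diversity weights detached from the graph so that they modulate the gradients without being differentiated themselves.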

In the continuous-in-depth setting, basis function coefficients are learned for the parameterized $t$-dependence of weights and normalization statistics, with the integrator traversing the depth variable $t$ during both forward and backward passes. Compression is realized by parameter-space projection: after training with a high-rank basis, the parameters are projected or interpolated onto a reduced-rank basis with fewer functions, minimizing the $L^2$ error between the high-rank and low-rank expansions with no need to revisit data (Queiruga et al., 2021).
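A parameter-space projection of this kind can be sketched with piecewise-linear bases, Gauss-Legendre quadrature, and the least-squares normal equations. The hat-function bases, the depth interval [0, 1], the nested uniform grids, and the names below are assumptions chosen to keep the example short; the cited work may use different bases and grids.

```python
import numpy as np

def hat_basis_matrix(t, num_basis):
    """Evaluate uniform hat functions on [0, 1]; returns (len(t), num_basis)."""
    nodes = np.linspace(0.0, 1.0, num_basis)
    width = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(t[:, None] - nodes[None, :]) / width, 0.0, None)

def composite_gauss(num_cells, order=3):
    """Gauss-Legendre points and weights on num_cells equal cells of [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(order)
    edges = np.linspace(0.0, 1.0, num_cells + 1)
    half = 0.5 * (edges[1:] - edges[:-1])
    mid = 0.5 * (edges[1:] + edges[:-1])
    t = (mid[:, None] + half[:, None] * x[None, :]).ravel()
    wt = (half[:, None] * w[None, :]).ravel()
    return t, wt

def project_coeffs(coeffs_hi, num_lo):
    """L2-project a high-rank expansion onto num_lo hat functions, data-free."""
    num_hi = coeffs_hi.shape[0]
    t, wt = composite_gauss(num_cells=num_hi - 1)
    B_hi = hat_basis_matrix(t, num_hi)
    B_lo = hat_basis_matrix(t, num_lo)
    gram = B_lo.T @ (wt[:, None] * B_lo)    # inner products psi_lo with psi_lo
    cross = B_lo.T @ (wt[:, None] * B_hi)   # inner products psi_lo with psi_hi
    flat = coeffs_hi.reshape(num_hi, -1)
    coeffs_lo = np.linalg.solve(gram, cross @ flat)   # normal equations
    return coeffs_lo.reshape((num_lo,) + coeffs_hi.shape[1:])

rng = np.random.default_rng(0)
coeffs_hi = rng.normal(size=(9, 4, 4))      # 9 basis functions, 4x4 weights
coeffs_lo = project_coeffs(coeffs_hi, 5)    # compressed to 5 basis functions
```

Only the coefficient tensors enter the computation, so no training data is touched. If the trained weights already vary linearly in depth, the projection reproduces them exactly on the coarser grid.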

3. Diversity Regulation and Centroidal Aggregation

s-BFN introduces a parametric diversity mechanism via a relaxation of winner-takes-all (WTA) assignment. On each sample $(x_i, y_i)$, let $j_i^{*}=\arg\min_{j}\,\ell(h_j(x_i), y_i)$ denote the best-performing hypothesis. Update weights are specified by:

$$\delta_{ij} = \begin{cases} 1-\varepsilon, & j = j_i^{*}, \\ \dfrac{\varepsilon}{M-1}, & j \neq j_i^{*}, \end{cases}$$

where $\varepsilon$ controls the allocation of the loss signal. Setting $\varepsilon=0$ enforces pure WTA with maximal specialization (diversity) and potentially higher variance. As $\varepsilon$ increases, the model interpolates towards uniform updates, reducing diversity and increasing bias. Empirically, optimal test error is achieved at intermediate $\varepsilon$ (Dominguez et al., 2 Sep 2025, Dominguez et al., 2023).
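The effect of the diversity parameter on the update weights can be seen in a short NumPy sketch. It assumes the common relaxed-WTA form in which the lowest-loss hypothesis for each sample receives weight `1 - eps` and the remaining `M - 1` hypotheses share `eps` equally; the function name and the example losses are illustrative.

```python
import numpy as np

def relaxed_wta_weights(losses, eps):
    """losses: (N, M) per-sample, per-hypothesis losses -> weights (N, M)."""
    n, m = losses.shape
    weights = np.full((n, m), eps / (m - 1))
    # The winning (lowest-loss) hypothesis for each sample gets 1 - eps.
    weights[np.arange(n), losses.argmin(axis=1)] = 1.0 - eps
    return weights

losses = np.array([[0.9, 0.1, 0.5],
                   [0.2, 0.8, 0.3]])
pure_wta = relaxed_wta_weights(losses, eps=0.0)     # one-hot rows
relaxed = relaxed_wta_weights(losses, eps=0.3)      # winner 0.7, others 0.15
uniform = relaxed_wta_weights(losses, eps=2.0 / 3)  # every weight 1/3
```

Under this form the weights in each row sum to one, and the uniform limit is reached at `eps = (M - 1) / M`.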

This formalizes and mitigates mode collapse in multi-hypothesis learning, ensuring that predictors meaningfully partition the label space (centroidal Voronoi tessellation) while avoiding convergence to degenerate, low-diversity solutions.

For basis function ODE-Nets, diversity takes the form of enforcing smooth weight evolution along the depth axis, leading to stable high-order integration and robust compression (Queiruga et al., 2021).

4. Theoretical Properties and Loss Geometry Alignment

The s-BFN ensemble combiner is theoretically supported via Bregman geometry. For any strictly convex $\phi$, the prediction minimizes a sum of divergences to each base output, coinciding with the centroid in the geometry dictated by the loss. Theoretical results establish that, for nonnegative weights $w_j$ summing to one,

$$\hat{y}(x) = (\nabla\phi)^{-1}\Big(\sum_{j=1}^{M} w_j\, \nabla\phi\big(h_j(x)\big)\Big)$$

gives the unique combiner under the induced loss (Dominguez et al., 2 Sep 2025).

A bias-variance-diversity decomposition under general losses is available:

$$\mathbb{E}\big[\text{ensemble loss}\big] = \text{noise} + \overline{\text{bias}} + \overline{\text{variance}} - \text{diversity},$$

where the negative sign for diversity quantifies error cancellation across hypotheses. This yields a principled way to trade off accuracy against ensemble diversity.
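For squared loss and an arithmetic-mean combiner, the error-cancellation effect reduces to the classical ambiguity decomposition, which holds exactly on any fixed sample: the ensemble loss equals the average member loss minus the average spread of the members around the ensemble prediction. The sketch below checks this identity numerically on synthetic predictions. It illustrates only this special case, not the general decomposition, and all data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 200
y = rng.normal(size=N)
# Stand-in member predictions: the target plus independent noise per member.
H = y[None, :] + rng.normal(scale=0.5, size=(M, N))

h_bar = H.mean(axis=0)                              # arithmetic-mean combiner
ensemble_loss = np.mean((h_bar - y) ** 2)
avg_member_loss = np.mean((H - y[None, :]) ** 2)    # mean over members, samples
diversity = np.mean((H - h_bar[None, :]) ** 2)      # spread around the ensemble

gap = avg_member_loss - diversity - ensemble_loss   # zero up to rounding
```

Since the diversity term is non-negative, the ensemble is never worse than the average member under squared loss, and the improvement grows with member disagreement.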

Further, the PAC-Bayes C-bound for the majority vote links ensemble disagreement (diversity) to generalization error, underscoring s-BFN's relevance for stability and robust uncertainty quantification.

5. Empirical Evaluation and Practical Performance

Empirical validation spans both tabular regression and image classification. On regression benchmarks (Air Quality, Appliances Energy Prediction), s-BFN achieves the lowest root mean squared error (RMSE) among the compared methods, which include SVM-RBF, random forest, gradient boosting, and arithmetic combiners. On both the Air and Energy tasks, its RMSE is lower than that of the standard ensemble and single-predictor baselines (Dominguez et al., 2 Sep 2025, Dominguez et al., 2023).

For classification (MNIST, CIFAR-10), s-BFN improves accuracy over mean/logit averaging and mixture-of-experts, with the benefit amplified in heterogeneous ensembles and at moderate values of the diversity parameter. On CIFAR-10, heterogeneous s-BFN ensembles outperform both their base models and other ensemble techniques.

In continuous-depth neural ODEs, s-BFN matches or exceeds deep ResNet baselines in image classification on CIFAR-10 and CIFAR-100. A posteriori compression via basis projection reduces parameters and inference time with minimal loss of accuracy (Queiruga et al., 2021).

6. Structured Compression and Memory Efficiency

A salient practical property of s-BFN is its support for post-training compression. In the continuous-depth context, projection from a high-rank to a lower-rank basis function expansion is performed entirely in parameter space using Gaussian quadrature and normal equations. No additional data or retraining is required; memory and compute are reduced while test accuracy drops only slightly. This property is particularly advantageous for memory-constrained deployment (Queiruga et al., 2021).

For s-BFN in the ensemble context, the aggregation and prediction stage costs are dominated by the evaluation of the $M$ base models. The final RBF aggregation is significantly less expensive than full joint training and exhibits low variance across runs.

7. Extensions and Applications

The s-BFN paradigm is instantiated in diverse architectures:

  • Continuous-in-depth image classification and transformers for sequence tagging, employing neural ODE-blocks and basis expansions (Queiruga et al., 2021).
  • Structured multi-hypothesis regression ensembles with Voronoi partitioning and RBF aggregation for tabular prediction (Dominguez et al., 2023).
  • General-purpose, loss-centric, multi-hypothesis deep ensembles with controllable diversity for both regression and classification, incorporating both convex and stochastic training (Dominguez et al., 2 Sep 2025).

A key thread across these applications is the unification of generalized centroid-based aggregation, smooth/structured parameterization, closed-form and iterative learning, and explicit diversity–bias–variance trade-off mechanisms. This positions s-BFN as a robust framework for both predictive modeling and uncertainty quantification across a spectrum of data and model complexities.
