Cascade Polynomial Regression (CPR)

Updated 7 May 2026
  • CPR is a machine learning regression method that cascades polyharmonic spline regressors to uncover low-dimensional embeddings with global smoothness.
  • It constructs each stage using polyharmonic spline kernels derived from the Laplacian, preserving invariance properties like translation and rotation.
  • The architecture uses closed-form, GPU-friendly algorithms that reduce computational complexity, enabling scalability beyond traditional O(N^3) methods.

Cascade Polynomial Regression (CPR) is a machine learning regression methodology in which a sequence of simple regressors is composed so that each stage operates on the outputs of the previous one, effectively uncovering low-dimensional embeddings before nonlinearities are applied. In polyharmonic spline-based regression, the CPR paradigm underlies scalable architectures that maintain global smoothness and computational tractability and that rest on kernel constructions derived from symmetry. Each regressor in the sequence, or cascade, is implemented by a package of polyharmonic splines, derived analytically as Green’s functions of the polyharmonic Laplacian. Closed-form, GPU-friendly algorithms for both forward computation and end-to-end differentiation give scalability and efficiency while preserving theoretical guarantees (Bakhvalov, 18 Dec 2025).

1. Polyharmonic Spline Kernels in Regression

The foundational element of CPR in this context is the polyharmonic spline kernel. For an input space $\mathbb{R}^d$ and order $m \in \mathbb{N}$, the polyharmonic spline is the Green’s function of the $m$-fold Laplacian. Its radial form is given by

$$k_m(x, x') = \phi(\|x - x'\|),$$

where

$$\phi(r) = \begin{cases} r^{2m-d} \ln r, & 2m-d \text{ even}, \\ r^{2m-d}, & 2m-d \text{ odd}. \end{cases}$$

Here, $d$ is the dimension of the input space and $r = \|x - x'\|$. To ensure unique interpolation, the kernel includes an additional polynomial of degree less than $m$. The formulation

$$k_f(t) = |t|^{2m-d} \ln|t| + \sum_{|\alpha|<m} a_\alpha t^\alpha,$$

with coefficients $a_\alpha$ enforcing reproduction of polynomials up to degree $m-1$, yields a positive-definite system. This kernel, uniquely specified by translation, rotation, and scale invariance (the principle of indifference), realizes the optimal regression solution under those symmetries (Bakhvalov, 18 Dec 2025).
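The radial form above can be sketched in a few lines of Python. This is a minimal illustration, assuming $2m - d > 0$ and omitting the polynomial term; the names `polyharmonic_phi` and `kernel` are hypothetical and do not come from the source.

```python
import math

def polyharmonic_phi(r: float, m: int, d: int) -> float:
    """Radial polyharmonic spline phi(r) of order m in dimension d."""
    p = 2 * m - d  # exponent of r in the radial form
    if p <= 0:
        raise ValueError("this sketch assumes 2m - d > 0")
    if r < 0:
        raise ValueError("r must be non-negative")
    if r == 0.0:
        return 0.0  # limit of both r^p ln r and r^p as r -> 0 when p > 0
    if p % 2 == 0:
        return r ** p * math.log(r)  # logarithmic case: 2m - d even
    return float(r ** p)             # power case: 2m - d odd

def kernel(x, y, m: int) -> float:
    """k_m(x, y) = phi(||x - y||); the dimension d is read from len(x)."""
    return polyharmonic_phi(math.dist(x, y), m, len(x))

# Example: m = 2 in two dimensions gives the r^2 ln r form.
value = kernel((0.0, 0.0), (3.0, 4.0), 2)  # r = 5, so value = 25 * ln 5
```

Because the kernel depends on its arguments only through the distance, it is symmetric in `x` and `y`.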

2. Cascade Architecture: Composition of Spline Packages

To address the computational intractability of naive kernel regression (an $O(N^3)$ cost for $N$ data points) and the breakdown of symmetry assumptions in high dimensions, the CPR framework employs a cascade of spline packages. Each stage operates on a subset of the input features or representations:

  • At each stage, select a small “constellation” of centers.
  • Construct the stage's output functions from kernels centered at those constellation points.
  • The matrix mapping at each stage combines the kernel matrix, evaluated at the current batch against the constellation points, with the stage's coefficient matrix.

The full cascade is the composition of these stage mappings, applied in sequence to the raw inputs. Each package inverts only one small system, and does so once, keeping per-stage costs low when the constellation is much smaller than the dataset. This staged design breaks the $O(N^3)$ bottleneck and is theoretically justified in settings with unknown intrinsic low dimensionality (Bakhvalov, 18 Dec 2025).
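A minimal sketch of the staged forward pass, in plain Python, may help fix ideas. It rests on several assumptions that are not from the source: each stage is represented as a hypothetical pair `(centers, coeffs)`, the stage output is taken to be the kernel matrix times the coefficient matrix, the polynomial term is omitted, and the coefficients are given rather than fitted. All function names are illustrative.

```python
import math

def phi(r, p):
    """Radial function: r^p ln r for even p, r^p for odd p, with phi(0) = 0."""
    if r == 0.0:
        return 0.0
    return r ** p * math.log(r) if p % 2 == 0 else r ** p

def kernel_matrix(batch, centers, m):
    """Kernel values between every batch point (rows) and every center (columns)."""
    p = 2 * m - len(centers[0])  # exponent depends on this stage's input dimension
    return [[phi(math.dist(x, c), p) for c in centers] for x in batch]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def cascade_forward(batch, stages, m):
    """Apply each stage to the previous stage's outputs (assumed form: K @ A)."""
    h = batch
    for centers, coeffs in stages:
        h = matmul(kernel_matrix(h, centers, m), coeffs)
    return h

# Two hypothetical stages: R^3 -> R^2 -> R^1, with order m = 2.
stages = [
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [[1.0, 0.0], [0.0, 1.0]]),
    ([(0.0, 0.0), (0.0, 1.0)], [[1.0], [2.0]]),
]
out = cascade_forward([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], stages, m=2)
```

The first stage has input dimension 3, so its exponent is 1; the second has input dimension 2, so it uses the logarithmic form with exponent 2. The output has one row per batch point and one column per final-stage output.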

3. Computational Complexity and Scalability

CPR achieves substantial computational savings compared to classical kernel regression. Single-stage kernel regression on $N$ data points requires $O(N^3)$ time and $O(N^2)$ memory for training, and its inference cost grows with both $N$ and the number of new points. In a multi-stage cascade with a small per-stage constellation, the costs split into four components:

  • Precomputing all inverses.
  • Forward evaluation on a batch of points.
  • Backpropagation (for end-to-end gradients).
  • Memory per stage.

With a constellation much smaller than the dataset, each stage is compute- and memory-efficient, making the overall complexity effectively linear in the number of data points and enabling applications at scales previously inaccessible to kernel methods (Bakhvalov, 18 Dec 2025).
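To make the scaling argument concrete, here is a rough operation-count comparison. The cost model is an assumption for illustration only and is not taken from the source: it charges a cubic cost for one dense inverse per stage, and a batch-by-constellation-by-width cost for each stage's forward product. The parameter names `n`, `k`, `stages`, and `width` are hypothetical, and the cost of building the kernel matrices is ignored.

```python
def single_stage_training_ops(n: int) -> int:
    """Classical kernel regression: dense solve on an n x n kernel matrix."""
    return n ** 3

def cascade_ops(n: int, k: int, stages: int, width: int) -> int:
    """Assumed cascade cost: one k x k inverse plus one forward product per stage."""
    inverse = stages * k ** 3           # one dense k x k inverse per stage
    forward = stages * n * k * width    # (n x k) kernel matrix times (k x width) coefficients
    return inverse + forward

n, k, stages, width = 100_000, 100, 4, 8
classical = single_stage_training_ops(n)   # 10**15
cascade = cascade_ops(n, k, stages, width) # 4 * 10**6 + 3.2 * 10**8
```

Under this assumed model the inverse term does not depend on `n`, so doubling `n` only doubles the forward term, which is what linear scaling in the dataset size means in practice.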

4. End-to-End Differentiation and Algorithmic Formulation

The architecture supports end-to-end training via exact, closed-form expressions for forward propagation and for backpropagation of gradients, which can be integrated into automatic differentiation frameworks. The critical primitives are:

  • Forward propagation at each stage.
  • Gradient propagation at each stage.

Both are written as batch and constellation matrix operations. The Jacobian of each kernel matrix is computed by differentiating the kernel function as required. This matrix-friendly formulation enables highly parallel implementation and rapid execution on GPU accelerators (Bakhvalov, 18 Dec 2025).
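As an illustration of the kernel derivative that gradient propagation relies on, the sketch below differentiates the radial function in closed form and checks it against a central finite difference. It is a hedged example, not the source's algorithm: the exponent `p` stands for $2m - d$, the function names are hypothetical, and the gradient is only defined for `x != c`.

```python
import math

def phi(r, p):
    """Radial function: r^p ln r for even p, r^p for odd p (r > 0)."""
    return r ** p * math.log(r) if p % 2 == 0 else r ** p

def dphi_dr(r, p):
    """Closed-form derivative of phi with respect to r (r > 0)."""
    if p % 2 == 0:
        return r ** (p - 1) * (p * math.log(r) + 1.0)  # product rule on r^p ln r
    return p * r ** (p - 1)

def kernel_grad(x, c, p):
    """Gradient of phi(||x - c||) with respect to x, by the chain rule."""
    r = math.dist(x, c)
    scale = dphi_dr(r, p) / r
    return [scale * (xi - ci) for xi, ci in zip(x, c)]

def numeric_grad(x, c, p, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += h
        lo[i] -= h
        grad.append((phi(math.dist(hi, c), p) - phi(math.dist(lo, c), p)) / (2 * h))
    return grad

x, c, p = (1.0, 2.0), (0.0, 0.0), 2
exact = kernel_grad(x, c, p)
approx = numeric_grad(x, c, p)
```

In a batched implementation the same per-pair formula would be applied across the whole kernel matrix at once.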

5. Theoretical Guarantees and Symmetry Properties

The sequence of polyharmonic spline kernels employed in CPR is uniquely determined by the principle of indifference (translation, rotation, and scale invariance), assuming a Gaussian measure on function space. The resulting reproducing kernel is $|t|^{2m-d} \ln|t|$ (plus polynomial), which provides optimal regression under these invariance symmetries. By composing these kernels in a cascade, the architecture maintains global invariance properties. However, the composition operation itself breaks Gaussianity, analogous to “deep Gaussian processes,” allowing the model to capture multimodal features and adapt to functions supported on lower-dimensional manifolds of unknown intrinsic dimension. Hence, CPR is theoretically justified as a method for scalable, smooth regression in high-dimensional or manifold-structured settings (Bakhvalov, 18 Dec 2025).
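The invariances named here can be checked numerically for the radial kernel. The sketch below is a small illustration in two dimensions with hypothetical helper names; it omits the polynomial term. It confirms that translating and rotating both arguments leaves the kernel value unchanged, and that scaling by a factor $s$ multiplies the logarithmic kernel by $s^p$ and adds a term proportional to $r^p$, where $p = 2m - d$.

```python
import math

def phi(r, p):
    """Radial function: r^p ln r for even p, r^p for odd p, with phi(0) = 0."""
    if r == 0.0:
        return 0.0
    return r ** p * math.log(r) if p % 2 == 0 else r ** p

def kernel(x, y, m):
    return phi(math.dist(x, y), 2 * m - len(x))

def move(pt, shift=(5.0, -3.0), theta=0.7):
    """Translate a 2D point by `shift`, then rotate it by `theta` radians."""
    tx, ty = pt[0] + shift[0], pt[1] + shift[1]
    c, s = math.cos(theta), math.sin(theta)
    return (c * tx - s * ty, s * tx + c * ty)

x, y, m = (0.3, -1.2), (2.0, 0.5), 2  # d = 2, so p = 2 (logarithmic case)
base = kernel(x, y, m)
moved = kernel(move(x), move(y), m)   # same rigid motion applied to both points

s = 3.0
p = 2 * m - len(x)
r = math.dist(x, y)
scaled = kernel(tuple(s * v for v in x), tuple(s * v for v in y), m)
# (s r)^p ln(s r) = s^p * r^p ln r + s^p ln(s) * r^p
expected_scaled = s ** p * base + s ** p * math.log(s) * r ** p
```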

6. Comparison to Classical CPR and Distinguishing Features

While CPR, in the sense of sequentially composed regressors, shares conceptual ancestry with earlier cascade regression techniques, the polyharmonic spline architecture introduces several sharp distinctions:

  • The basis functions (polyharmonic splines) are derived analytically from symmetry and indifference principles, rather than being chosen heuristically.
  • Each layer’s mapping is globally smooth by exact kernel evaluation, distinguishing it from methods that rely on local fits or truncated approximations.
  • The framework yields full closed-form, GPU-friendly formulas for both forward computation and backpropagation, without requiring truncation, sampling, or stochastic estimation.

As a plausible implication, this theoretically grounded construction enables discovery of meaningful low-dimensional structure in data, prior to subsequent nonlinearity, in a manner not possible with heuristically chosen basis sets (Bakhvalov, 18 Dec 2025).

7. Application Domains

CPR architectures based on polyharmonic spline packages have broad applicability wherever scalable, smooth function approximation is required. Suitable domains include:

  • Large-scale spatial interpolation for geostatistics and remote sensing,
  • Surrogate modeling of physical simulators,
  • High-dimensional regression with unknown manifold structure,
  • Time-series forecasting with learned embeddings.

Any context that requires both global smoothness and an escape from the $O(N^3)$ bottleneck of classical kernel regression is a candidate for this methodology (Bakhvalov, 18 Dec 2025).