Cascade Polynomial Regression (CPR)
- CPR is a machine learning regression method that cascades polyharmonic spline regressors to uncover low-dimensional embeddings with global smoothness.
- It constructs each stage using polyharmonic spline kernels derived from the Laplacian, preserving invariance properties like translation and rotation.
- The architecture uses closed-form, GPU-friendly algorithms that reduce computational complexity, enabling scalability beyond traditional O(N^3) methods.
Cascade Polynomial Regression (CPR) is a machine learning regression methodology in which a sequence of simple regressors is composed so that each stage operates on the outputs of the previous one, uncovering low-dimensional embeddings before nonlinearities are applied. In polyharmonic spline-based regression, the CPR paradigm underlies scalable architectures that maintain global smoothness, remain computationally tractable, and rest on kernel constructions derived from symmetry. Each regressor in the sequence, or cascade, is implemented by a package of polyharmonic splines, analytically derived as Green’s functions of the iterated (polyharmonic) Laplacian. Closed-form, GPU-friendly algorithms for both forward computation and end-to-end differentiation provide scalability and efficiency while preserving theoretical guarantees (Bakhvalov, 18 Dec 2025).
1. Polyharmonic Spline Kernels in Regression
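The foundational element of CPR in this context is the polyharmonic spline kernel. For an input space $\mathbb{R}^d$ and order $m$, the polyharmonic spline is the Green’s function of the $m$-fold Laplacian $\Delta^m$. Up to a normalizing constant, its radial form is given by
$$\phi(r) = \begin{cases} r^{2m-d}, & d \text{ odd}, \\ r^{2m-d}\,\log r, & d \text{ even}, \end{cases}$$
where
$$r = \lVert x - x' \rVert .$$
Here, $d$ is the dimension and $2m > d$. To ensure unique interpolation, the kernel includes an additional polynomial $p$ of degree less than $m$. The formulation
$$f(x) = \sum_{i=1}^{N} w_i\,\phi\big(\lVert x - x_i \rVert\big) + p(x), \qquad \sum_{i=1}^{N} w_i\,q(x_i) = 0 \ \text{ for every polynomial } q \text{ with } \deg q \le m-1,$$
with coefficients enforcing reproduction of polynomials up to degree $m-1$, yields a system that is positive definite on the subspace of coefficients satisfying those side conditions. This kernel, uniquely specified by translation, rotation, and scale invariance (the principle of indifference), realizes the optimal regression solution under those symmetries (Bakhvalov, 18 Dec 2025).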
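The kernel and the polynomial side conditions can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the function names (`polyharmonic_kernel`, `fit_thin_plate`, `evaluate`) are hypothetical, the normalizing constant is dropped, and the fit is restricted to the familiar thin-plate case ($m = 2$, $d = 2$) on a small synthetic data set.

```python
import numpy as np

def polyharmonic_kernel(r, m, d):
    """Radial polyharmonic spline of order m in dimension d (requires 2m > d)."""
    k = 2 * m - d
    if k <= 0:
        raise ValueError("requires 2*m > d")
    r = np.asarray(r, dtype=float)
    if d % 2 == 1:
        return r ** k                      # odd dimension: r^(2m-d)
    safe_r = np.where(r > 0, r, 1.0)       # avoid log(0); the limit at r = 0 is 0
    return r ** k * np.log(safe_r)         # even dimension: r^(2m-d) log r

def pairwise_dist(X, C):
    """Euclidean distances between rows of X (B x d) and rows of C (n x d)."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)

def fit_thin_plate(centers, values):
    """Assumed setup: m = 2, d = 2, kernel r^2 log r plus a linear polynomial."""
    n, d = centers.shape
    K = polyharmonic_kernel(pairwise_dist(centers, centers), m=2, d=d)
    P = np.hstack([np.ones((n, 1)), centers])            # polynomial basis 1, x, y
    A = np.block([[K, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([values, np.zeros(d + 1)])      # side conditions P^T w = 0
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]                              # kernel weights, polynomial coefficients

def evaluate(X, centers, w, a):
    K = polyharmonic_kernel(pairwise_dist(X, centers), m=2, d=centers.shape[1])
    return K @ w + a[0] + X @ a[1:]

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
centers = rng.uniform(size=(12, 2))
values = np.sin(3.0 * centers[:, 0]) + centers[:, 1] ** 2
w, a = fit_thin_plate(centers, values)
```

Evaluating `evaluate(centers, centers, w, a)` should return `values` up to round-off. Fitting data that is exactly linear should give kernel weights near zero, which is the polynomial-reproduction property in practice.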
2. Cascade Architecture: Composition of Spline Packages
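To address the computational intractability of naive kernel regression (due to $O(N^3)$ cost for $N$ data points) and the breakdown of symmetry assumptions in high dimensions, the CPR framework employs a cascade of spline packages. Each stage $\ell$ operates on a subset of the input features or representations:
- At stage $\ell$, select a small “constellation” of $n_\ell$ centers $c^{(\ell)}_1, \dots, c^{(\ell)}_{n_\ell}$.
- Construct $m_\ell$ output functions (low-degree polynomial term omitted for brevity):
$$f^{(\ell)}_j(x) = \sum_{i=1}^{n_\ell} W^{(\ell)}_{ij}\,\phi\big(\lVert x - c^{(\ell)}_i \rVert\big), \qquad j = 1, \dots, m_\ell .$$
- The matrix mapping at stage $\ell$ is $X^{(\ell)} = K^{(\ell)} W^{(\ell)}$, where $K^{(\ell)}$ is the $B \times n_\ell$ kernel matrix evaluated at the current batch of $B$ points against the constellation points, and $W^{(\ell)}$ is the $n_\ell \times m_\ell$ coefficient matrix.
The full cascade is expressed as compositions:
$$x^{(L)} = \big(f^{(L)} \circ f^{(L-1)} \circ \cdots \circ f^{(1)}\big)\big(x^{(0)}\big),$$
where $x^{(0)}$ are the raw inputs. Each package only inverts an $n_\ell \times n_\ell$ system once, keeping per-stage costs low for $n_\ell \ll N$. This staged design breaks the $O(N^3)$ bottleneck and is theoretically justified in settings with unknown intrinsic low dimensionality (Bakhvalov, 18 Dec 2025).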
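The staged composition can be sketched as follows. This is a hedged illustration, not the architecture from the cited work: the `SplinePackage` class, the `cascade` helper, the choice of the cubic kernel $r^3$ in every stage, the linear polynomial augmentation, and the random centers and target values are all assumptions made so that the example is small and self-contained.

```python
import numpy as np

def cubic_kernel(X, C):
    """phi(r) = r^3 between rows of X (B x d) and centers C (n x d); assumed kernel."""
    r = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return r ** 3

def with_linear_basis(K, X):
    """Append the polynomial columns [1, x] to a kernel matrix."""
    return np.hstack([K, np.ones((X.shape[0], 1)), X])

class SplinePackage:
    """One stage: a small constellation of centers with prescribed output values."""

    def __init__(self, centers, values):
        n, d = centers.shape
        K = cubic_kernel(centers, centers)
        P = np.hstack([np.ones((n, 1)), centers])
        A = np.block([[K, P], [P.T, np.zeros((d + 1, d + 1))]])
        rhs = np.vstack([values, np.zeros((d + 1, values.shape[1]))])
        self.centers = centers
        # One small solve per stage; its size depends on n, not on the data set size.
        self.coef = np.linalg.solve(A, rhs)

    def __call__(self, X):
        # Batch-against-constellation kernel matrix times the coefficient matrix.
        return with_linear_basis(cubic_kernel(X, self.centers), X) @ self.coef

def cascade(packages, X):
    """Feed the batch through every stage in order."""
    for package in packages:
        X = package(X)
    return X

# Hypothetical two-stage cascade: 5 features -> 2 features -> 1 output.
rng = np.random.default_rng(1)
c1, v1 = rng.normal(size=(16, 5)), rng.normal(size=(16, 2))
c2, v2 = rng.normal(size=(16, 2)), rng.normal(size=(16, 1))
stage1, stage2 = SplinePackage(c1, v1), SplinePackage(c2, v2)

X0 = rng.normal(size=(100, 5))
out = cascade([stage1, stage2], X0)   # shape (100, 1)
```

Each stage touches the batch only through a `B x n` kernel matrix, so the cost of a forward pass grows with the batch size and the constellation size rather than with the square or cube of the data set size.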
3. Computational Complexity and Scalability
CPR substantially reduces cost relative to classical kernel regression. In a single-stage kernel regression, training requires $O(N^3)$ time and $O(N^2)$ memory, with inference scaling as $O(NM)$ for $M$ new points. In an $L$-stage cascade with per-stage constellation size $n$, the costs become (treating the per-stage feature width as a constant):
- Precomputing all inverses: $O(L\,n^3)$.
- Forward evaluation on $M$ points: $O(L\,M\,n)$.
- Backpropagation (for end-to-end gradients): $O(L\,M\,n)$.
- Memory per stage: $O(n^2 + M\,n)$.
With $n \ll N$, each stage is compute- and memory-efficient, making the overall complexity effectively linear in $N$ and enabling applications at scales previously inaccessible to kernel methods (Bakhvalov, 18 Dec 2025).
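A back-of-the-envelope comparison makes the scaling argument concrete. The cost model below is an assumption for illustration (one cubic-cost solve per stage plus one pass of the data against the constellation per stage, with constants and feature widths ignored), and the sizes `N`, `L`, and `n` are hypothetical.

```python
def dense_kernel_cost(num_points):
    """Leading-order operation count for a dense N x N solve."""
    return num_points ** 3

def cascade_cost(num_points, num_stages, constellation_size):
    """Assumed model: one n x n solve plus one N x n kernel pass per stage."""
    inverses = num_stages * constellation_size ** 3
    forward = num_stages * num_points * constellation_size
    return inverses + forward

N, L, n = 10**6, 4, 256          # hypothetical sizes, for illustration only
dense = dense_kernel_cost(N)     # 10**18 operations
staged = cascade_cost(N, L, n)   # 1_091_108_864 operations
```

Under this model, doubling `N` adds only `L * N * n` operations to the staged count, while the dense count grows eightfold.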
4. End-to-End Differentiation and Algorithmic Formulation
The architecture supports end-to-end training via exact, closed-form expressions for forward propagation and backpropagation of gradients, integrable into automatic differentiation frameworks. The critical primitives are:
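- For forward propagation at each stage:
$$K^{(\ell)}_{bi} = \phi\big(R^{(\ell)}_{bi}\big), \qquad X^{(\ell)} = K^{(\ell)} W^{(\ell)}$$
- For gradient propagation:
$$\frac{\partial \mathcal{L}}{\partial W^{(\ell)}} = K^{(\ell)\top} G^{(\ell)}, \qquad \frac{\partial \mathcal{L}}{\partial x^{(\ell-1)}_b} = \sum_{i} \big(G^{(\ell)} W^{(\ell)\top}\big)_{bi}\;\phi'\big(R^{(\ell)}_{bi}\big)\,\frac{x^{(\ell-1)}_b - c^{(\ell)}_i}{R^{(\ell)}_{bi}}$$
where the notations denote batch and constellation matrix operations: $R^{(\ell)}_{bi} = \lVert x^{(\ell-1)}_b - c^{(\ell)}_i \rVert$ is the distance from batch point $b$ to constellation center $c^{(\ell)}_i$, $W^{(\ell)}$ is the stage coefficient matrix, and $G^{(\ell)} = \partial \mathcal{L} / \partial X^{(\ell)}$ is the gradient of the loss $\mathcal{L}$ arriving from the next stage. The Jacobian for each kernel matrix $K^{(\ell)}$ is computed by differentiating $\phi$ as required. This matrix-friendly formulation enables highly parallel implementation and rapid execution on GPU accelerators (Bakhvalov, 18 Dec 2025).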
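The closed-form gradient of a single stage can be checked numerically. The sketch below is an illustration under assumptions, not the published algorithm: it uses the cubic kernel $\phi(r) = r^3$ (so $\phi'(r)/r = 3r$ and no division by zero occurs), omits the polynomial term, and the names `forward`, `backward`, and `numeric_grad_X` are hypothetical.

```python
import numpy as np

def forward(X, C, W):
    """One stage: distances R (B x n), kernel phi(R) = R^3, output K @ W."""
    R = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return (R ** 3) @ W, R

def backward(X, C, W, R, G):
    """Closed-form gradients given G = dLoss/dOutput (B x m)."""
    grad_W = (R ** 3).T @ G                   # K^T G
    S = G @ W.T                               # dLoss/dK, shape B x n
    A = S * 3.0 * R                           # S * phi'(R) / R, with phi'(r) / r = 3r
    # sum_i A[b, i] * (x_b - c_i), written as two matrix operations
    grad_X = A.sum(axis=1, keepdims=True) * X - A @ C
    return grad_W, grad_X

def numeric_grad_X(X, C, W, G, eps=1e-6):
    """Central finite differences of the linear loss sum(output * G)."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[idx] += eps
        X_minus[idx] -= eps
        loss_plus = np.sum(forward(X_plus, C, W)[0] * G)
        loss_minus = np.sum(forward(X_minus, C, W)[0] * G)
        grad[idx] = (loss_plus - loss_minus) / (2.0 * eps)
    return grad

# Small random problem, for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))      # batch of 6 points in 3 dimensions
C = rng.normal(size=(5, 3))      # constellation of 5 centers
W = rng.normal(size=(5, 2))      # coefficients for 2 outputs
G = rng.normal(size=(6, 2))      # upstream gradient

Y, R = forward(X, C, W)
grad_W, grad_X = backward(X, C, W, R, G)
grad_X_check = numeric_grad_X(X, C, W, G)
```

The analytic `grad_X` and the finite-difference `grad_X_check` should agree to several digits. Because the backward pass consists only of matrix products and elementwise operations, it maps directly onto batched GPU kernels.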
5. Theoretical Guarantees and Symmetry Properties
The sequence of polyharmonic spline kernels employed in CPR is uniquely determined by the principle of indifference (translation, rotation, and scale invariance), assuming a Gaussian measure on function space. The resulting reproducing kernel is the radial polyharmonic spline kernel (plus polynomial), which provides optimal regression under these invariance symmetries. By composing these kernels in a cascade, the architecture maintains global invariance properties. However, the composition operation itself breaks Gaussianity, analogous to “deep Gaussian processes,” allowing the model to capture multimodal features and adapt to functions supported on lower-dimensional manifolds of unknown intrinsic dimension. Hence, CPR is theoretically justified as a method for scalable, smooth regression in high-dimensional or manifold-structured settings (Bakhvalov, 18 Dec 2025).
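The invariance properties are easy to verify numerically for a single kernel matrix. The check below is a small sketch under assumptions: it uses the odd-exponent kernel $r^3$, a random orthogonal transform (a rotation or a reflection), and hypothetical names such as `kernel_matrix`. Scale invariance appears as homogeneity: rescaling the inputs by $s$ multiplies the kernel matrix by $s^3$.

```python
import numpy as np

def kernel_matrix(X, k=3):
    """Pairwise kernel matrix phi(r) = r^k for the rows of X (assumed odd exponent)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return r ** k

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
shift = rng.normal(size=3)
s = 2.5

K = kernel_matrix(X)
K_moved = kernel_matrix(X @ Q.T + shift)       # rotate/reflect, then translate
K_scaled = kernel_matrix(s * X)                # uniform rescaling

same_under_motion = np.allclose(K, K_moved)
homogeneous_under_scaling = np.allclose(K_scaled, s ** 3 * K)
```

Both flags should evaluate to `True`, since the kernel depends on the inputs only through pairwise distances.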
6. Comparison to Classical CPR and Distinguishing Features
While CPR, in the sense of sequentially composed regressors, shares conceptual ancestry with earlier cascade regression techniques, the polyharmonic spline architecture introduces several sharp distinctions:
- The basis functions (polyharmonic splines) are derived analytically from symmetry and indifference principles, rather than being chosen heuristically.
- Each layer’s mapping is globally smooth by exact kernel evaluation, distinguishing it from methods that rely on local fits or truncated approximations.
- The framework yields full closed-form, GPU-friendly formulas for both forward computation and backpropagation, without requiring truncation, sampling, or stochastic estimation.
As a plausible implication, this theoretically grounded construction enables discovery of meaningful low-dimensional structure in data, prior to subsequent nonlinearity, in a manner not possible with heuristically chosen basis sets (Bakhvalov, 18 Dec 2025).
7. Application Domains
CPR architectures based on polyharmonic spline packages have broad applicability wherever scalable, smooth function approximation is required. Suitable domains include:
- Large-scale spatial interpolation for geostatistics and remote sensing,
- Surrogate modeling of physical simulators,
- High-dimensional regression with unknown manifold structure,
- Time-series forecasting with learned embeddings.
Any context necessitating both global smoothness and escape from the $O(N^3)$ bottleneck found in classical kernel regression is a candidate for this methodology (Bakhvalov, 18 Dec 2025).