Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression

Published 27 Mar 2026 in stat.ML, cs.LG, and math.ST | (2603.26048v1)

Abstract: We study rank selection for low-rank tensor regression under a random-covariate design. Under a Gaussian random-design model and some mild conditions, we derive population expressions for the expected training-testing discrepancy (optimism) for both CP and Tucker decompositions. We further demonstrate that the optimism is minimized at the true tensor rank for both CP and Tucker regression. This yields a prediction-oriented rank-selection rule that aligns with cross-validation and extends naturally to tensor-model averaging. We also discuss conditions under which under- or over-ranked models may appear preferable, thereby clarifying the scope of the method. Finally, we showcase its practical utility on a real-world image regression task and extend its application to tensor-based compression of neural networks, highlighting its potential for model selection in deep learning.

Authors (3)

Summary

The paper’s main contribution is the derivation of closed-form optimism measures that quantify the training-testing error gap in tensor regression models.
Using CP and Tucker decompositions, it shows that optimism is minimized at the true rank, and that rank selection based on optimism outperforms the traditional AIC and BIC criteria.
Extensive numerical experiments and real-world applications, such as image regression and neural network compression, validate the proposed rank selection strategy.

Asymptotic Optimism in Tensor Regression: Theory, Model Selection, and Neural Network Compression

Introduction and Motivation

The paper "Asymptotic Optimism for Tensor Regression Models with Applications to Neural Network Compression" (2603.26048) addresses model selection in high-dimensional tensor regression, focusing on the scalar-on-tensor setting with random-design covariates. Conventional criteria such as AIC, BIC, and Mallows's $C_p$ depend on in-sample error and a parameter count, and these do not reliably capture predictive performance or complexity for tensor-based models, particularly under random design and low-rank decompositions. The authors derive closed-form expressions for the expected optimism (the discrepancy between training and testing error) of tensor regression models built on CP and Tucker decompositions. They show that the expected optimism is minimized at the true tensor rank, which yields a prediction-oriented rank-selection rule that aligns with cross-validation and extends to model averaging and neural network compression. The paper also delineates when conventional criteria fail, validates the theory with numerical experiments, and applies the findings to image regression and tensorized neural network compression.

Theoretical Framework: Optimism under Random Design

Revisiting Optimism

Optimism is the expected gap between test error and training error, so adding it to the training error gives an estimate of generalization error. The paper works in the Random-X regime, where test covariates are random and drawn independently of the training covariates. Standard "Fixed-X" approaches, in which the test covariates exactly match those of training, are misaligned with real predictive tasks, where new inputs are rarely repeats of old ones. The authors build on recent work in optimism theory, specifically that of Luan et al. and Luo & Zhu, which provides population-level formulas for excess risk in random-design linear and kernel ridge regression.
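The Random-X gap described above can be estimated by simulation. The following is a minimal sketch, assuming an ordinary vector-covariate ridge regression rather than the paper's tensor models; the function names, dimensions, and the small ridge penalty are illustrative choices, not taken from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def random_x_optimism(n=50, p=10, sigma=1.0, lam=1e-3,
                      reps=200, n_test=2000, seed=0):
    """Monte Carlo estimate of E[test MSE] - E[train MSE] under Random-X."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p)  # hypothetical true coefficients
    gaps = []
    for _ in range(reps):
        # Fresh Gaussian training design on every replication.
        X = rng.normal(size=(n, p))
        y = X @ beta + sigma * rng.normal(size=n)
        b = ridge_fit(X, y, lam)
        train_mse = np.mean((y - X @ b) ** 2)
        # Test covariates are drawn independently of the training ones.
        X_new = rng.normal(size=(n_test, p))
        y_new = X_new @ beta + sigma * rng.normal(size=n_test)
        test_mse = np.mean((y_new - X_new @ b) ** 2)
        gaps.append(test_mse - train_mse)
    return float(np.mean(gaps))

if __name__ == "__main__":
    print(random_x_optimism())
```

A Fixed-X variant would reuse `X` for the test responses instead of drawing `X_new`; the difference between the two estimates is the part of the gap that Fixed-X criteria do not see.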

Tensor Regression via Kernel Ridge Equivalence

Tensor regressors with CP or Tucker structure are both equivalent to kernel ridge regression (KRR) with specific multilinear kernels [yu2018tensor]. This mapping lets the optimism expressions be derived from the spectrum of the kernel rather than from naive parameter counting. The key idea is that the low-rank structure is encoded in the kernel spectrum, which gives a more precise measure of complexity and of generalization error.
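The multilinear structure behind this equivalence can be seen in the simplest case of a matrix (order-2) covariate. The sketch below checks numerically that a rank-R coefficient matrix gives a prediction that is linear in one factor once the other is fixed, which is what allows a ridge or kernel formulation. It does not reproduce the specific kernels of [yu2018tensor]; the names and shapes are assumptions chosen for illustration.

```python
import numpy as np

def bilinear_forms(p1=4, p2=5, rank=2, seed=0):
    """Return three equivalent evaluations of <B, X> for B = A @ C.T."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p1, rank))   # mode-1 factors a_1, ..., a_R
    C = rng.normal(size=(p2, rank))   # mode-2 factors c_1, ..., c_R
    X = rng.normal(size=(p1, p2))     # one matrix-valued covariate

    B = A @ C.T                       # rank-R coefficient: sum_r a_r c_r^T
    direct = np.sum(B * X)            # inner product <B, X>

    # Component-wise form: sum_r a_r^T X c_r
    via_factors = sum(A[:, r] @ X @ C[:, r] for r in range(rank))

    # With C held fixed, Z = X C acts as a feature matrix and the
    # prediction is linear in A: <B, X> = <A, X C>.
    Z = X @ C
    linear_in_A = np.sum(A * Z)
    return direct, via_factors, linear_in_A

if __name__ == "__main__":
    print(bilinear_forms())
```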

Main Results: Population Optimism for CP and Tucker Regression

Rank Optimality

For both CP and Tucker decompositions, the expected optimism is expressed as a sum over the kernel spectrum (eigenvalues), parameterized by the target rank. Under mild regularization and moderate noise, the expected optimism is minimized when the target rank matches the true rank: neither under- nor over-specifying the rank reduces it.

Figure 1: Average optimism of tensor KRR model for varying CP ranks and noise levels, confirming the minimum at the true rank.

The theoretical results are:

True rank case: The optimism is a function of the kernel eigenvalues and the regularization strength, and it attains its minimum here.
Over-ranked models: Optimism is strictly higher; the additional components add complexity but do not improve fit.
Under-ranked models: Optimism rises because of approximation error, except when noise dominates. The paper supports both the general rule and this exception theoretically and empirically.

Figure 2: Average optimism curves for CP and Tucker regression with varying ranks and sample sizes, confirming the minimum at the true rank in practical estimation settings.
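To see how the three cases above might be probed in practice, here is a minimal sketch that fits a low-rank regression at several candidate ranks and reports a held-out estimate of the train-test gap for each. It assumes matrix-valued covariates, an alternating ridge fit, and a single train/test split; the paper's own estimator and settings are not reproduced, and every name and default below is hypothetical.

```python
import numpy as np

def fit_bilinear(X, y, rank, lam=1e-2, iters=30, seed=0):
    """Alternating ridge for y_i ~ <A C^T, X_i>, with X of shape (n, p1, p2)."""
    n, p1, p2 = X.shape
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p1, rank))
    C = rng.normal(size=(p2, rank))
    for _ in range(iters):
        # Fix C: y_i ~ <A, X_i C> is a ridge problem in vec(A).
        Z = np.einsum("ijk,kr->ijr", X, C).reshape(n, -1)
        A = np.linalg.solve(Z.T @ Z + lam * np.eye(p1 * rank),
                            Z.T @ y).reshape(p1, rank)
        # Fix A: y_i ~ <C, X_i^T A> is a ridge problem in vec(C).
        Z = np.einsum("ijk,jr->ikr", X, A).reshape(n, -1)
        C = np.linalg.solve(Z.T @ Z + lam * np.eye(p2 * rank),
                            Z.T @ y).reshape(p2, rank)
    return A @ C.T

def mse(B, X, y):
    """Mean squared error of the predictions <B, X_i>."""
    pred = np.einsum("ijk,jk->i", X, B)
    return float(np.mean((y - pred) ** 2))

def gap_by_rank(ranks, n=200, p1=6, p2=6, true_rank=2, sigma=0.5, seed=0):
    """Held-out estimate of (test MSE - train MSE) for each candidate rank."""
    rng = np.random.default_rng(seed)
    B_true = rng.normal(size=(p1, true_rank)) @ rng.normal(size=(true_rank, p2))

    def draw(m):
        X = rng.normal(size=(m, p1, p2))
        y = np.einsum("ijk,jk->i", X, B_true) + sigma * rng.normal(size=m)
        return X, y

    X_tr, y_tr = draw(n)
    X_te, y_te = draw(5 * n)  # independent test covariates (Random-X)
    gaps = {}
    for r in ranks:
        B_hat = fit_bilinear(X_tr, y_tr, r)
        gaps[r] = mse(B_hat, X_te, y_te) - mse(B_hat, X_tr, y_tr)
    return gaps

if __name__ == "__main__":
    print(gap_by_rank([1, 2, 3, 4]))
```

A single split gives a noisy estimate, since optimism is an expectation; averaging `gap_by_rank` over several seeds before taking `min(gaps, key=gaps.get)` would be closer in spirit to the averaged curves the figures describe.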

Explicit Formulas

The closed-form expressions for expected optimism (see equations (3.15), (3.17), (3.21) in the paper) directly link optimism to spectral properties of CP/Tucker kernel matrices and regularization magnitude. The approximation terms are shown to be negligible except in extreme noise or shrinkage regimes.

Model-Averaged and Ensemble Settings

Extending to ensembles and model averaging, the paper proves that the optimism of an ensemble-averaged estimator is strictly upper-bounded by the weighted average of the component optimisms. Ensemble approaches (including TRMA [bu2025improving]) therefore further stabilize generalization error and yield robust rank selection across heterogeneous data settings.

Figure 3: Ensemble CP regression optimism versus arithmetic mean of individual components, confirming strict upper-boundedness.

Numerical and Real-World Evidence

Simulation Studies

Experiments in both oracle and realistic estimation settings support the theory: optimism is minimized at the true rank and scales as $\mathcal{O}(n^{-1})$ in the sample size $n$ and as $\mathcal{O}(\sigma^2)$ in the noise variance $\sigma^2$. The empirical curves match the analytic formulas up to high-dimensional, finite-sample effects.
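The same two scaling rates already appear in the simplest random-design setting. As an illustrative analog only (this is the classical result for well-specified ordinary least squares with zero-mean Gaussian covariates and no intercept, not one of the paper's tensor formulas), the expected Random-X optimism is $\sigma^2 p \,(1/n + 1/(n-p-1))$ for $n > p + 1$, where $p$ is the number of coefficients. The sketch below evaluates that expression; the function name is hypothetical.

```python
def ols_random_x_optimism(n, p, sigma2):
    """Expected test-minus-train MSE for Gaussian random-design OLS.

    E[train MSE] = sigma2 * (1 - p / n)
    E[test MSE]  = sigma2 * (1 + p / (n - p - 1))
    """
    if n <= p + 1:
        raise ValueError("requires n > p + 1")
    return sigma2 * p * (1.0 / n + 1.0 / (n - p - 1))

if __name__ == "__main__":
    for n in (100, 200, 400, 800):
        print(n, ols_random_x_optimism(n, p=10, sigma2=1.0))
```

Doubling `sigma2` doubles the value exactly, and doubling `n` roughly halves it once `n` is large relative to `p`.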

Image Regression and Model Selection

The FGNET facial age dataset is analyzed via CP regression. AIC and BIC fail to identify the best-predicting ranks, whereas optimism tracks the test error and selects the models with the lowest test MSE.

Figure 4: Optimism, AIC, and BIC tendencies versus CP rank for image regression; only optimism tracks minimum test MSE accurately.
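For reference, the classical criteria in this comparison rest on a parameter count. The sketch below uses the standard Gaussian-likelihood forms of AIC and BIC (up to additive constants) together with a naive count of CP parameters; the paper's exact degrees-of-freedom convention is not stated here, so `cp_param_count` is an assumption that ignores scaling and permutation redundancies.

```python
import math

def cp_param_count(dims, rank):
    """Naive CP parameter count: one length-p_d factor per mode per component."""
    return rank * sum(dims)

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian model with residual sum of squares rss and k parameters."""
    return n * math.log(rss / n) + 2 * k

def gaussian_bic(rss, n, k):
    """BIC for the same model; the penalty grows with log(n)."""
    return n * math.log(rss / n) + k * math.log(n)

if __name__ == "__main__":
    k = cp_param_count((32, 32), rank=3)  # hypothetical 32 x 32 images
    print(k, gaussian_aic(50.0, 500, k), gaussian_bic(50.0, 500, k))
```

Both criteria depend on the data only through the in-sample residual sum of squares, which is the limitation the optimism-based rule is meant to address.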

Neural Network Compression

The optimism framework is extended to tensor-structured neural network layers (CNN and MLP), guiding rank selection for CP-compressed convolutional layers. Optimism, not AIC/BIC, selects architectures with minimal test error, even in over-parameterized regimes. Low-rank compression occasionally yields better generalization than uncompressed networks, consistent with findings in recent literature on benign overfitting in deep models.

Figure 5: Optimism versus CP rank for compressed CNN layers under various criteria; optimism selects configurations yielding minimal test error.

Figure 6: Optimism-driven rank selection in MLP compression yields simple models with optimal generalization; test error remains low across all ranks.
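To make the role of the CP rank in layer compression concrete, the sketch below counts parameters for a dense convolutional kernel and for a rank-R CP factorization of it, and rebuilds the full kernel from the factors. It assumes a four-way kernel of shape (c_out, c_in, k_h, k_w) with one factor matrix per mode; the paper's exact parameterization is not reproduced, and all names are hypothetical.

```python
import numpy as np

def dense_conv_params(c_out, c_in, k_h, k_w):
    """Number of weights in an uncompressed convolutional kernel."""
    return c_out * c_in * k_h * k_w

def cp_conv_params(c_out, c_in, k_h, k_w, rank):
    """Number of weights in a rank-R CP factorization of the same kernel."""
    return rank * (c_out + c_in + k_h + k_w)

def cp_conv_kernel(U_out, U_in, U_h, U_w):
    """Rebuild the (c_out, c_in, k_h, k_w) kernel from four factor matrices.

    Each factor has R columns; entry (o, i, h, w) is the sum over r of
    U_out[o, r] * U_in[i, r] * U_h[h, r] * U_w[w, r].
    """
    return np.einsum("or,ir,hr,wr->oihw", U_out, U_in, U_h, U_w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = 8
    K = cp_conv_kernel(rng.normal(size=(64, R)), rng.normal(size=(32, R)),
                       rng.normal(size=(3, R)), rng.normal(size=(3, R)))
    print(K.shape, dense_conv_params(64, 32, 3, 3), cp_conv_params(64, 32, 3, 3, R))
```

The rank is the single knob that trades parameter count against expressiveness, which is why a rank-selection criterion carries over to this setting.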

Implications and Future Directions

Theoretical Contributions

The paper establishes a rigorous complexity measure for tensor models via kernel ridge equivalence, overcoming the deficiencies of parameter counting. The prediction-oriented optimism provides a unified criterion for model selection that is theoretically and empirically robust across random-design settings, tensor decompositions, and neural network architectures. Rank selection by minimizing optimism is shown to be consistent and resistant to misspecification errors and regularization pathologies, provided the noise is bounded and the sample size is moderate.

Practical and Algorithmic Impact

Optimism-minimizing rank selection is preferable in tensor regression and tensorized deep learning. It balances complexity against generalization more reliably than the classical criteria (AIC/BIC), especially when standard notions of "effective parameters" are ill-defined or misleading, as in neural networks and high-dimensional tensors. The direct connection to cross-validation and ensemble averaging extends its applicability to adaptive and heterogeneous modeling.

Future Directions

Extensions to higher-order tensor decompositions (e.g., Tensor-Train), tree-based ensemble models, and nonlinear architectures are a natural progression. Investigating optimism-driven regularization in deep over-parameterized models and exploring connections to overfitting theory (benign overfitting, neural tangent kernels) remain open research avenues. The findings further suggest optimism-based selection as a robust mechanism for automated rank adaptation in contemporary machine learning pipelines.

Conclusion

This work advances the theory and practice of model selection in tensor regression and tensorized neural networks, providing analytic optimism formulas that rigorously link rank to generalization error. The results demonstrate that minimum optimism coincides with true rank, supporting prediction-oriented selection rules fundamentally superior to classical parameter-count approaches. Extensive empirical and real-data evidence confirm theoretical claims and highlight implications for practical neural network compression, ensemble learning, and robust model selection in high-dimensional structured prediction.
