Adaptive Model Selection via Complexity
- Complexity-driven adaptive model selection is a framework that balances approximation power and generalization error through penalty-based risk minimization.
- It uses explicit complexity measures, such as parameter counts and metric entropy, to choose among models like tree tensor networks in high-dimensional settings.
- Empirical techniques, including slope heuristic calibration, enable near-oracle performance and achieve minimax rates across diverse function classes.
Complexity-driven adaptive model selection refers to the family of statistical and computational techniques that select, from among a hierarchy or sequence of models of varying structural complexity, the model that best balances approximation power and generalization error according to data-driven criteria. This framework, central in high-dimensional statistics, machine learning, and scientific computing, leverages explicit estimates of model complexity—such as parameter counts, metric entropy, or ranks of multilinear expansions—and employs penalized empirical risk formulations to adaptively select models. The penalty, often rooted in theoretical risk bounds, calibrates the trade-off between model fit and complexity, leading to adaptivity across a broad spectrum of function classes (e.g., Sobolev, Besov, or analytic classes). A canonical example is the adaptive selection of tree tensor networks by penalized empirical risk, minimizing excess prediction error over a collection of model classes parameterized by structural and representation complexity (Michel et al., 2020).
1. Structural Definition of Model Classes in Complexity-Driven Selection
A prototypical application of complexity-driven selection involves high-dimensional tensor-based model classes. Consider observations $(x_i, y_i)_{i=1}^{n}$ with inputs $x_i = (x_{i,1}, \dots, x_{i,d})$ taking values in a product domain $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_d$. For each variable $x_\nu$, a finite-dimensional feature space $H_\nu$ of functions on $\mathcal{X}_\nu$ is specified,
with $\dim H_\nu = m_\nu$. The full tensor product feature space is
$$H = H_1 \otimes \cdots \otimes H_d, \qquad \dim H = \prod_{\nu=1}^{d} m_\nu,$$
with basis functions $\varphi_{j_1}^{1}(x_1)\cdots\varphi_{j_d}^{d}(x_d)$, $1 \le j_\nu \le m_\nu$, where $\{\varphi_{j}^{\nu}\}_{j=1}^{m_\nu}$ is a basis of $H_\nu$.
Tree tensor models introduce a dimension-partition tree $T$ over $\{1, \dots, d\}$, whose nodes $\alpha \subseteq \{1, \dots, d\}$ represent index splits and whose leaves correspond to single dimensions. For each node $\alpha \in T$, the $\alpha$-rank of $f$,
$$r_\alpha = \operatorname{rank}_\alpha(f),$$
the rank of the matricization of $f$ separating the variables in $\alpha$ from those in its complement, controls the representational complexity of $f$. The model class
$$\mathcal{F}_r^T = \{\, f \in H : \operatorname{rank}_\alpha(f) \le r_\alpha \ \text{for all } \alpha \in T \,\}$$
comprises functions admitting a tree tensor network parametrization with internal and leaf tensor cores. The total number of scalar model parameters ("representation complexity") is
$$C(T, r, m) = \sum_{\alpha \in \mathcal{I}(T)} r_\alpha \prod_{\beta \in S(\alpha)} r_\beta \;+\; \sum_{\nu = 1}^{d} r_{\{\nu\}}\, m_\nu,$$
where $\mathcal{I}(T)$ denotes the interior nodes of $T$ and $S(\alpha)$ the children of node $\alpha$. Sparsity constraints are encoded by masking sets on the indices of the tensor cores, so that the effective complexity is the number of unmasked (active) parameters.
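To make the parameter count concrete, the following minimal Python sketch computes the representation complexity of a tree tensor network from a dimension tree, node ranks, and feature-space dimensions. The `Node` structure and the counting convention (one core per node: internal cores of size $r_\alpha \times \prod_{\beta \in S(\alpha)} r_\beta$, leaf cores of size $r_\nu \times m_\nu$) are illustrative assumptions consistent with the count above, not an implementation from the cited work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Node of a dimension-partition tree: a leaf holds one variable index,
    an internal node holds its children."""
    dims: frozenset            # subset of {1, ..., d} covered by this node
    rank: int                  # alpha-rank r_alpha attached to the node
    children: List["Node"] = field(default_factory=list)


def representation_complexity(node: Node, leaf_dims: dict) -> int:
    """Count scalar parameters of a tree tensor network.

    Assumed convention: each internal node alpha carries a core with
    r_alpha * prod(r_beta for beta in children(alpha)) entries; each leaf
    nu carries a core with r_nu * m_nu entries, m_nu being the dimension
    of the feature space for variable nu.
    """
    if not node.children:                       # leaf node
        (nu,) = node.dims
        return node.rank * leaf_dims[nu]
    core = node.rank
    for child in node.children:
        core *= child.rank
    return core + sum(representation_complexity(c, leaf_dims) for c in node.children)


# Example: balanced binary tree over d = 4 variables, all ranks equal to 3,
# each univariate feature space of dimension m_nu = 10.
leaves = [Node(frozenset({nu}), rank=3) for nu in range(1, 5)]
left = Node(frozenset({1, 2}), rank=3, children=leaves[:2])
right = Node(frozenset({3, 4}), rank=3, children=leaves[2:])
root = Node(frozenset({1, 2, 3, 4}), rank=1, children=[left, right])
print(representation_complexity(root, {nu: 10 for nu in range(1, 5)}))  # -> 183
```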
2. Quantification of Model Class Complexity via Metric Entropy
Model class complexity is quantified by covering numbers or metric entropy. For tree tensor network classes, the entropy bound takes the form
$$\log \mathcal{N}\big(\mathcal{F}_r^T(B), \epsilon\big) \;\lesssim\; C(T, r, m)\, \log\!\Big(\frac{c\, B}{\epsilon}\Big),$$
where $C(T, r, m)$ is the representation complexity, $c$ is an absolute constant, and $\mathcal{F}_r^T(B)$ is the radius-$B$ ball of functions in the model class. For sparse networks, $C(T, r, m)$ is replaced by the number $s$ of active (unmasked) parameters. This entropy characterization ensures that complexity penalties scale explicitly with the number of free parameters, aligning estimation risk with representation cost.
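The link between parameter count and entropy follows the standard argument for smoothly parametrized classes; the schematic derivation below (a generic bound, not the paper's exact statement, with Lipschitz constant $L$ and parameter-ball radius $R$ as assumed quantities) shows why the log-covering number scales linearly in the number of free parameters $C$.

```latex
% A class F parametrized by C scalars in a Euclidean ball of radius R,
% through an L-Lipschitz parametrization, inherits the covering numbers
% of the parameter ball:
%   N(F, eps) <= N(B_R^C, eps / L) <= (3 L R / eps)^C,
% hence the metric entropy is linear in the parameter count C:
\[
  \log \mathcal{N}(\mathcal{F}, \epsilon)
  \;\le\;
  C \,\log\!\Big(\frac{3\,L\,R}{\epsilon}\Big).
\]
```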
3. Penalized Empirical Risk Formulation
Complexity-adaptive model selection is achieved by minimizing a penalized empirical risk,
$$\widehat{m} \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \ \widehat{R}_n(\widehat{f}_m) + \operatorname{pen}(m),$$
where $\widehat{R}_n$ is the empirical contrast (e.g., least squares, log-likelihood), $\widehat{f}_m$ is the empirical risk minimizer over model $m$, and the penalty $\operatorname{pen}(m)$ is a function of the complexity $C_m$.
Theoretical analysis prescribes penalty shapes:
- General subgaussian contrasts: $\operatorname{pen}(m) \asymp \sqrt{C_m / n}$ (up to constants and logarithmic factors).
- Bounded least squares: $\operatorname{pen}(m) \asymp C_m / n$ (up to constants and logarithmic factors).
A more precise formula incorporates problem-dependent constants, such as the noise level and contrast bounds, into a single multiplicative constant $\kappa$ in front of the complexity term. Calibration of $\kappa$ is typically achieved via the slope heuristic.
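As a minimal sketch of the selection rule above, the following Python function minimizes a penalized least-squares criterion over a list of candidate models. The `fit`/`predict`/`num_params` interface and the single constant `kappa` are illustrative assumptions, with the penalty following the bounded least-squares shape $\kappa\, C_m / n$.

```python
import numpy as np


def penalized_model_selection(models, X, y, kappa):
    """Select the model minimizing empirical risk + complexity penalty.

    `models` is a list of objects exposing .fit(X, y), .predict(X) and
    .num_params (the complexity C_m); this interface is hypothetical.
    The penalty kappa * C_m / n follows the bounded least-squares shape.
    """
    n = len(y)
    best, best_crit = None, np.inf
    for model in models:
        model.fit(X, y)
        residuals = y - model.predict(X)
        empirical_risk = np.mean(residuals ** 2)    # least-squares contrast
        penalty = kappa * model.num_params / n      # complexity penalty
        criterion = empirical_risk + penalty
        if criterion < best_crit:
            best, best_crit = model, criterion
    return best, best_crit
```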
4. Risk Bounds and Oracle Inequalities
Complexity-driven penalization leads to oracle inequalities:
- For general bounded contrasts, the expected excess risk of the selected estimator is bounded, up to a multiplicative constant, by the smallest penalized risk over the collection plus a residual term of order $1/n$.
- For selection over large model collections, an additional weight accounting for the number of models of each complexity enters the penalty, at the price of logarithmic factors.
- For bounded least squares with adapted penalties of order $C_m/n$, the same form of bound holds with improved constants.
These bounds guarantee adaptivity: the procedure performs nearly as well as an oracle that would select the best model with knowledge of the true risk of every candidate.
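For reference, the canonical form of such an oracle inequality (a generic penalized model selection bound in the style of Massart, stated schematically rather than as the paper's exact theorem) reads as follows, with $\ell$ the excess risk, $f^\star$ the target, and $C_1, C_2$ constants depending on the contrast.

```latex
\[
  \mathbb{E}\,\ell\big(f^\star, \widehat{f}_{\widehat{m}}\big)
  \;\le\;
  C_1 \inf_{m \in \mathcal{M}}
  \Big\{ \inf_{f \in \mathcal{F}_m} \ell(f^\star, f) + \operatorname{pen}(m) \Big\}
  \;+\; \frac{C_2}{n}.
\]
```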
5. Adaptivity and Minimax Rates Over Smoothness Classes
The complexity-driven approach achieves (near) minimax adaptivity over a broad collection of function classes:
- Isotropic Sobolev/Besov classes of smoothness $s$ in dimension $d$: minimax rate $n^{-2s/(2s+d)}$ for the excess risk (with logarithmic slack).
- Inhomogeneous Besov classes: only nonlinear estimators reach the minimax rate; sparse tensor networks adaptively attain this rate.
- Mixed-dominated/anisotropic classes: minimax rate of order $n^{-2s/(2s+1)}$ up to dimension-dependent logarithmic factors; sparse parametrizations remain optimal.
- Analytic classes: approximation error decays exponentially in complexity, yielding a near-parametric rate up to logarithmic factors.
Thus, properly constructed model collections and penalties enable data-driven procedures to recover minimax estimation rates without prior knowledge of the underlying smoothness or sparsity structure.
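The mechanism behind these rates is a bias-variance balance between approximation and estimation error; the schematic computation below, for the isotropic case with smoothness $s$ in dimension $d$ (a standard calculation, not specific to the cited paper), recovers the stated rate.

```latex
% With C parameters, an approximation error of order C^{-2s/d} and a
% penalized estimation term of order C/n are balanced:
\[
  C^{-2s/d} \;\asymp\; \frac{C}{n}
  \;\;\Longrightarrow\;\;
  C \;\asymp\; n^{\frac{d}{2s+d}},
  \qquad
  \text{excess risk} \;\asymp\; \frac{C}{n} \;\asymp\; n^{-\frac{2s}{2s+d}},
\]
% which matches the minimax rate up to the logarithmic factors absorbed
% by the penalty.
```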
6. Slope Heuristic Calibration in Practice
The theoretical penalty, known only up to a multiplicative constant, is generally not computable in practice due to unknown problem constants. The slope heuristic provides a robust empirical approach: for a grid of penalty constants $\kappa$, one computes the sequence of selected complexities $C_{\widehat m(\kappa)}$. The function $\kappa \mapsto C_{\widehat m(\kappa)}$ exhibits a distinctive drop at some minimal constant $\kappa_{\min}$; setting the penalty constant to twice this minimal value (i.e., $\kappa = 2\kappa_{\min}$) yields empirically stable and theoretically motivated complexity selection (Michel et al., 2020). This approach avoids explicit data splitting and is robust across a range of sample sizes and signal-to-noise ratios.
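A minimal Python sketch of this calibration, assuming a hypothetical helper `select_complexity(kappa)` that returns the complexity of the model selected under penalty constant `kappa`; the jump is located as the largest drop of selected complexity along the grid, and the returned constant is twice the value at which the drop occurs.

```python
import numpy as np


def calibrate_slope_heuristic(select_complexity, kappa_grid):
    """Dimension-jump calibration of the penalty constant.

    `select_complexity(kappa)` (hypothetical helper) returns the complexity
    C_{m(kappa)} of the model selected with penalty constant kappa.  The
    constant at which the selected complexity drops most sharply is located,
    and the final penalty constant is set to twice that value.
    """
    kappas = np.sort(np.asarray(kappa_grid))
    complexities = np.array([select_complexity(k) for k in kappas])
    drops = complexities[:-1] - complexities[1:]   # decrease between grid points
    jump = int(np.argmax(drops))                   # largest drop = "dimension jump"
    kappa_min = kappas[jump + 1]                   # constant just after the drop
    return 2.0 * kappa_min                         # slope heuristic: double it
```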
7. Algorithmic Strategies and Empirical Validation
Algorithmic implementation entails:
- Fixed-tree, rank-adaptive search: Iteratively incrementing tensor ranks in modes with maximal truncation error until the penalized risk criterion stabilizes (see the sketch after this list).
- Variable-tree search: Stochastic proposals for alternative trees (e.g., edge swaps), coupled with rank adaptation, expand the search space for optimal representational structures.
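The fixed-tree, rank-adaptive loop can be sketched in Python as follows; `fit_with_ranks`, `truncation_errors`, and `penalized_risk` are hypothetical callables standing in for the fitting routine, per-node truncation-error estimates, and the penalized risk criterion above, respectively.

```python
def rank_adaptive_search(fit_with_ranks, truncation_errors, initial_ranks,
                         penalized_risk, max_iter=50):
    """Greedy fixed-tree rank adaptation (schematic).

    `fit_with_ranks(ranks)` fits a tree tensor network with the given
    node ranks and returns the fitted model; `truncation_errors(model)`
    returns, per tree node, an estimate of the error incurred by truncating
    that node's rank; `penalized_risk(model)` evaluates empirical risk plus
    penalty.  All three callables are hypothetical placeholders.
    """
    ranks = dict(initial_ranks)
    best_model = fit_with_ranks(ranks)
    best_crit = penalized_risk(best_model)
    for _ in range(max_iter):
        errors = truncation_errors(best_model)     # per-node truncation error
        node = max(errors, key=errors.get)         # node with largest error
        ranks[node] += 1                           # increment its rank
        candidate = fit_with_ranks(ranks)
        crit = penalized_risk(candidate)
        if crit >= best_crit:                      # criterion has stabilized
            break
        best_model, best_crit = candidate, crit
    return best_model, best_crit
```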
Numerical experiments demonstrate that the complexity-penalized estimator with slope-heuristic calibration selects nearly oracle-optimal model complexity and predictive risk. Example applications include tensorized univariate function regression and high-dimensional synthetic benchmarks (e.g., the 10D corner-peak and 8D borehole-flow functions), in which the estimator's performance nearly matches that of the best model selected with oracle knowledge.
Summary Table: Complexity-Driven Model Selection for Tree Tensor Networks
| Component | Formalization/Result Example | Reference |
|---|---|---|
| Model class | $\mathcal{F}_r^T$, tree tensor network with tree $T$ and ranks $r$ | (Michel et al., 2020) |
| Model complexity | $C(T,r,m)$, parameter count; or sparsity $s$ | (Michel et al., 2020) |
| Metric entropy | $\log \mathcal{N}(\mathcal{F}_r^T(B), \epsilon) \lesssim C(T,r,m)\,\log(cB/\epsilon)$ | (Michel et al., 2020) |
| Penalty shape | $\sqrt{C_m/n}$ (general); $C_m/n$ (least squares) | (Michel et al., 2020) |
| Oracle inequality | Risk of selected model bounded by best penalized risk over the collection, plus $O(1/n)$ | (Michel et al., 2020) |
| Adaptivity | Rates near minimax across Sobolev/Besov/analytic classes | (Michel et al., 2020) |
| Calibration heuristic | Slope heuristic for penalty constant selection | (Michel et al., 2020) |
In conclusion, complexity-driven adaptive model selection, when grounded in explicit complexity measures, penalized empirical risk, and rigorously calibrated penalty constants, enables robust, theoretically justified adaptivity over broad classes of high-dimensional models, including but not limited to tree tensor networks (Michel et al., 2020). The methodology aligns estimator risk with minimax rates, provided the structure of candidate models and penalties is consistent with underlying function class regularity and representation efficiency.