Adaptive Model Selection via Complexity

Updated 27 January 2026
  • Complexity-driven adaptive model selection is a framework that balances approximation power and generalization error through penalty-based risk minimization.
  • It uses explicit complexity measures, such as parameter counts and metric entropy, to choose among models like tree tensor networks in high-dimensional settings.
  • Empirical techniques, including slope heuristic calibration, enable near oracle performance and achieve minimax rates across diverse function classes.

Complexity-driven adaptive model selection refers to the family of statistical and computational techniques that select, from among a hierarchy or sequence of models of varying structural complexity, the model that best balances approximation power and generalization error according to data-driven criteria. This framework, central in high-dimensional statistics, machine learning, and scientific computing, leverages explicit estimates of model complexity—such as parameter counts, metric entropy, or ranks of multilinear expansions—and employs penalized empirical risk formulations to adaptively select models. The penalty, often rooted in theoretical risk bounds, calibrates the trade-off between model fit and complexity, leading to adaptivity across a broad spectrum of function classes (e.g., Sobolev, Besov, or analytic classes). A canonical example is the adaptive selection of tree tensor networks by penalized empirical risk, minimizing excess prediction error over a collection of model classes parameterized by structural and representation complexity (Michel et al., 2020).

1. Structural Definition of Model Classes in Complexity-Driven Selection

A prototypical application of complexity-driven selection involves high-dimensional tensor-based model classes. Consider observations $x = (x_1, \dots, x_d) \in X = X_1 \times \cdots \times X_d$. For each variable $\nu$, a finite-dimensional feature space is specified,

$$V_\nu = \mathrm{span}\{\phi_{i_\nu}^\nu : i_\nu = 1, \dots, N_\nu\} \subset L^2(X_\nu),$$

with feature map $\phi^\nu(x_\nu) \in \mathbb{R}^{N_\nu}$. The full tensor product feature space is

$$V = V_1 \otimes \cdots \otimes V_d,$$

with basis functions $\phi_i(x) = \phi_{i_1}^1(x_1) \cdots \phi_{i_d}^d(x_d)$.

Tree tensor models introduce a dimension-partition tree $T$ over $\{1, \ldots, d\}$, whose nodes represent index splits and whose leaves correspond to single dimensions. For each node $\alpha \in T$, the $\alpha$-rank of $f : X \to \mathbb{R}$,

$$\operatorname{rank}_\alpha(f) = \min\left\{ r : f(x) = \sum_{k=1}^r g_k^\alpha(x_\alpha)\, h_k^{\alpha^c}(x_{\alpha^c}) \right\},$$

controls the representational complexity of $f$. The model class

$$M_r^T(V) = \{ f \in V : \operatorname{rank}_\alpha(f) \leq r_\alpha \ \ \forall \alpha \in T \}$$

comprises functions admitting a tree tensor network parametrization with internal and leaf tensor cores. The total number of scalar model parameters ("representation complexity") is

$$C(T, r, V) = \sum_{\alpha \in I(T)} r_\alpha \prod_{\beta \in S(\alpha)} r_\beta + \sum_{\alpha \in L(T)} r_\alpha N_\alpha,$$

where $I(T)$ denotes the interior nodes of $T$, $S(\alpha)$ the children of node $\alpha$, and $L(T)$ the leaves. Sparsity constraints are encoded by masking sets $\Lambda^\alpha$ on the indices of the tensors $v^\alpha$.
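
As an illustration, the following Python sketch (assuming a binary dimension-partition tree encoded as nested tuples with integer leaves; all names are illustrative, not from the referenced paper) evaluates the representation complexity $C(T, r, V)$ from a tree, its ranks, and the leaf feature-space dimensions.

```python
# Minimal sketch: computes
#   C(T, r, V) = sum_{alpha in I(T)} r_alpha * prod_{beta in S(alpha)} r_beta
#              + sum_{alpha in L(T)} r_alpha * N_alpha
# for a dimension-partition tree given as nested tuples (leaves are ints).

def representation_complexity(tree, ranks, feature_dims):
    """tree: nested tuples of dimension indices; ranks: node -> r_alpha;
    feature_dims: dimension nu -> N_nu (leaf feature-space dimension)."""
    if isinstance(tree, int):                     # leaf node alpha = {nu}
        return ranks[tree] * feature_dims[tree]   # r_alpha * N_alpha
    core = ranks[tree]                            # interior core of node alpha:
    for child in tree:                            # r_alpha * prod_{beta in S(alpha)} r_beta
        core *= ranks[child]
    return core + sum(representation_complexity(c, ranks, feature_dims) for c in tree)

# Example: d = 4, balanced binary tree, root rank 1, all other ranks 3, all N_nu = 10.
tree = ((0, 1), (2, 3))
ranks = {tree: 1, (0, 1): 3, (2, 3): 3, 0: 3, 1: 3, 2: 3, 3: 3}
feature_dims = {0: 10, 1: 10, 2: 10, 3: 10}
print(representation_complexity(tree, ranks, feature_dims))  # 9 + 2*(27 + 60) = 183
```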

2. Quantification of Model Class Complexity via Metric Entropy

Model class complexity is quantified by covering numbers or metric entropy. For tree tensor network classes, the entropy bound is

$$H\left(\epsilon, M_r^T(V)_R, \|\cdot\|_{L^p}\right) \leq C(T, r, V)\, \log\left(3\, \epsilon^{-1} R\, |T|\, L_p\right),$$

where $L_p = \sup_{\|\mathbf{v}\| \leq 1} \|R_{T,r,V}(\mathbf{v})\|_{L^p}$ and $M_r^T(V)_R$ is the radius-$R$ ball of functions in the model class. For sparse networks, $C(T, r, V)$ is replaced by $C(T, r, V, \Lambda) = \sum_{\alpha \in T} |\Lambda^\alpha|$. This entropy characterization ensures that complexity penalties scale explicitly with the number of free parameters, aligning estimation risk with representation cost.
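
As a small numerical illustration (a sketch only; the function name and sample values are assumptions, not from the paper), the entropy bound can be evaluated directly:

```python
import math

def entropy_bound(eps, C, R, tree_size, L_p):
    """Upper bound C(T, r, V) * log(3 * R * |T| * L_p / eps) on the metric
    entropy H(eps) of the radius-R tree tensor network class."""
    return C * math.log(3.0 * R * tree_size * L_p / eps)

# e.g. C = 183 parameters, R = 1, |T| = 7 nodes, L_p = 1, eps = 0.01
print(entropy_bound(0.01, 183, 1.0, 7, 1.0))  # ~ 183 * log(2100) ≈ 1.4e3
```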

3. Penalized Empirical Risk Formulation

Complexity-adaptive model selection is achieved by minimizing a penalized empirical risk,

$$\widehat{f}_m \in \arg\min_{f \in M_m} \widehat{R}_n(f), \qquad \hat{m} \in \arg\min_{m \in \mathcal{M}} \Big\{ \widehat{R}_n(\widehat{f}_m) + \operatorname{pen}(m) \Big\},$$

where $\widehat{R}_n(f) = n^{-1} \sum_{i=1}^n \gamma(f, Z_i)$ is the empirical contrast (e.g., least squares, log-likelihood) and the penalty $\operatorname{pen}(m)$ is a function of the complexity $C_m$.

Theoretical analysis prescribes penalty shapes:

  • General subgaussian contrasts: $\operatorname{pen}(m) \sim \lambda \sqrt{C_m / n}$
  • Bounded least squares: $\operatorname{pen}(m) \sim \lambda\, C_m / n$

A more precise formula incorporates problem-dependent constants:

$$\operatorname{pen}(m) = K_1 R^2 \left[ \frac{b_m C_m}{n \varepsilon^2} \log^+\!\left(\frac{n \varepsilon^2}{b_m C_m}\right) + \frac{\bar{w}\, C_m + \log N_{C_m}}{n \varepsilon} \right],$$

where $b_m = 1 + \log^+(3|T_m|/(4e))$ and $N_{C_m} = \#\{ m' : C_{m'} = C_m \}$. Calibration of $\lambda$ is typically achieved via the slope heuristic.
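
A minimal sketch of the selection step, assuming a least-squares contrast, a precomputed list of candidate models with a scikit-learn-style `fit`/`predict` interface, known complexities $C_m$, and the bounded least-squares penalty shape $\lambda C_m / n$ (all names are illustrative):

```python
import numpy as np

def select_model(models, complexities, X, y, lam):
    """Fit each candidate model, then return the index and fitted estimator
    minimizing the penalized empirical risk R_n(f_m) + lam * C_m / n."""
    n = len(y)
    best = None
    for m, (model, C_m) in enumerate(zip(models, complexities)):
        fitted = model.fit(X, y)
        emp_risk = np.mean((y - fitted.predict(X)) ** 2)  # least-squares contrast
        score = emp_risk + lam * C_m / n                  # complexity penalty
        if best is None or score < best[0]:
            best = (score, m, fitted)
    return best[1], best[2]
```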

4. Risk Bounds and Oracle Inequalities

Complexity-driven penalization leads to oracle inequalities:

  • For general bounded contrasts,

$$\mathbb{E}\left[ R(\widehat{f}_m) - R(f_m) \right] \lesssim \sqrt{ C_m \log n / n }$$

  • For model selection,

$$\mathbb{E}\left[ R(\widehat{f}_{\hat{m}}) - R(f^*) \right] \leq \inf_{m \in \mathcal{M}} \left\{ R(f_m) - R(f^*) + \operatorname{pen}(m) \right\} + (\text{small residual})$$

  • For bounded least squares and adapted penalties,

$$\mathbb{E}\left[ \| \widehat{f}_{\hat{m}} - f^* \|_2^2 \right] \leq \frac{1+\varepsilon}{1-\varepsilon} \inf_{m} \left\{ \|f_m - f^*\|_2^2 + K_2 \operatorname{pen}(m) \right\} + \frac{K_3}{n}$$

These bounds guarantee adaptivity: the procedure performs nearly as well as an oracle that would select the best model $m$ with knowledge of $f^*$.

5. Adaptivity and Minimax Rates Over Smoothness Classes

The complexity-driven approach achieves (near) minimax adaptivity over a broad collection of function classes:

  • Isotropic Sobolev/Besov: minimax rate $n^{-2s/(2s+d)}$ for $B_q^s(L^p)$, $p \geq 2$ (with logarithmic slack),

$$\mathbb{E}\| \widehat{f}_{\hat{m}} - f^* \|_2^2 \lesssim n^{-2s/(2s+d)} \log(n)^{*}$$

  • Inhomogeneous Besov: only nonlinear estimators reach $n^{-2s/(2s+1)}$; sparse tensor networks adaptively attain this rate.
  • Mixed-dominated/anisotropic classes: minimax rate $n^{-2s(\boldsymbol{s})/(2s(\boldsymbol{s})+d)}$; sparse parametrizations remain optimal.
  • Analytic classes: approximation error decays exponentially in complexity; near-parametric rate $n^{-1}$ up to logarithmic factors.

Thus, properly constructed model collections and penalties enable data-driven procedures to recover minimax estimation rates without prior knowledge of the underlying smoothness or sparsity structure.

6. Slope Heuristic Calibration in Practice

The theoretical penalty is typically known only up to a multiplicative constant that is not computable in practice, since it depends on unknown problem constants. The slope heuristic provides a robust empirical approach: for a grid of penalty constants $\lambda_1 < \dots < \lambda_K$, one computes the sequence of selected complexities $C_{m_\ell}$. The function $\lambda \mapsto C_{m(\lambda)}$ exhibits a distinctive drop at some $\hat{\lambda}_{\min}$; setting the penalty to twice this minimum (i.e., $2\hat{\lambda}_{\min}\, \mathrm{pen}_{\mathrm{shape}}(m)$) yields empirically stable and theoretically motivated complexity selection (Michel et al., 2020). This approach avoids explicit data splitting and is robust across a range of sample sizes and signal-to-noise ratios.
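
A minimal sketch of this complexity-jump calibration, assuming precomputed arrays of empirical risks and penalty shapes for each candidate model (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def slope_heuristic(emp_risk, pen_shape, complexities, lambdas):
    """Return 2 * lambda_min, where lambda_min is the grid value at which the
    selected complexity C_{m(lambda)} exhibits its largest drop."""
    emp_risk, pen_shape = np.asarray(emp_risk), np.asarray(pen_shape)
    selected_C = []
    for lam in lambdas:                       # selected complexity as a function of lambda
        m_hat = np.argmin(emp_risk + lam * pen_shape)
        selected_C.append(complexities[m_hat])
    jumps = -np.diff(selected_C)              # drops in selected complexity
    lam_min = lambdas[int(np.argmax(jumps)) + 1]
    return 2.0 * lam_min                      # final penalty: 2 * lambda_min * pen_shape(m)
```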

7. Algorithmic Strategies and Empirical Validation

Algorithmic implementation entails:

  • Fixed-tree, rank-adaptive search: Iteratively incrementing tensor ranks in modes with maximal truncation error until the penalized risk criterion stabilizes (a sketch follows this list).
  • Variable-tree search: Stochastic proposals for alternative trees (e.g., edge swaps), coupled with rank adaptation, expand the search space of candidate representational structures.
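
A minimal sketch of the fixed-tree, rank-adaptive loop, assuming hypothetical helpers `fit_tree_network` (returning a fitted estimator with attributes `risk` and `truncation_errors`) and `complexity` (returning $C_m$); none of these names come from the referenced paper:

```python
def rank_adaptive_search(tree, nodes, X, y, lam, n, max_iter=50):
    """Greedy rank adaptation: increase the rank of the node with maximal
    truncation error until the penalized risk criterion stops improving."""
    ranks = {alpha: 1 for alpha in nodes}          # start from a rank-one model
    best_score, best_fit = float("inf"), None
    for _ in range(max_iter):
        fit = fit_tree_network(tree, ranks, X, y)  # hypothetical fitting routine
        score = fit.risk + lam * complexity(tree, ranks) / n
        if score >= best_score:                    # penalized criterion has stabilized
            break
        best_score, best_fit = score, fit
        worst = max(fit.truncation_errors, key=fit.truncation_errors.get)
        ranks[worst] += 1                          # increment rank at the worst node
    return best_fit, ranks
```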

Numerical experiments demonstrate that the complexity-penalized estimator with slope-heuristic calibration selects a nearly oracle-optimal model complexity $C_{\hat{m}}$ and predictive risk $R(\widehat{f}_{\hat{m}})$. Example applications include tensorized univariate function regression and high-dimensional synthetic benchmarks (e.g., the 10D corner-peak and 8D borehole-flow functions), in which the estimator's performance nearly matches that of the best model selected with oracle knowledge.

Summary Table: Complexity-Driven Model Selection for Tree Tensor Networks

| Component | Formalization/Result | Example Reference |
| --- | --- | --- |
| Model class | $M_r^T(V) \subset V$, tree tensor network | (Michel et al., 2020) |
| Model complexity | $C(T, r, V)$, parameter count; or sparsity $C(T, r, V, \Lambda)$ | (Michel et al., 2020) |
| Metric entropy | $H(\epsilon) \leq C(T, r, V)\log(3 \epsilon^{-1} R \lvert T\rvert L_p)$ | (Michel et al., 2020) |
| Penalty shape | $\lambda \sqrt{C_m/n}$ (general); $\lambda C_m/n$ (least squares) | (Michel et al., 2020) |
| Oracle inequality | $\mathbb{E}[R(\widehat f_{\hat m}) - R(f^*)] \leq \inf_{m} \{\cdots\}$ | (Michel et al., 2020) |
| Adaptivity | Rates near minimax across Sobolev/Besov/analytic classes | (Michel et al., 2020) |
| Calibration heuristic | Slope heuristic for penalty constant selection | (Michel et al., 2020) |

In conclusion, complexity-driven adaptive model selection—when grounded in explicit complexity measures, penalized empirical risk, and rigorously calibrated penalty constants—enables robust, theoretically justified adaptivity over broad classes of high-dimensional models, including but not limited to tree tensor networks (Michel et al., 2020). The methodology aligns estimator risk with minimax rates, provided the structure of candidate models and penalties is consistent with underlying function class regularity and representation efficiency.
