Adaptive Model Selection via Complexity

Updated 27 January 2026
  • Complexity-driven adaptive model selection is a framework that balances approximation power and generalization error through penalty-based risk minimization.
  • It uses explicit complexity measures, such as parameter counts and metric entropy, to choose among models like tree tensor networks in high-dimensional settings.
  • Empirical techniques, including slope heuristic calibration, enable near oracle performance and achieve minimax rates across diverse function classes.

Complexity-driven adaptive model selection refers to the family of statistical and computational techniques that select, from among a hierarchy or sequence of models of varying structural complexity, the model that best balances approximation power and generalization error according to data-driven criteria. This framework, central in high-dimensional statistics, machine learning, and scientific computing, leverages explicit estimates of model complexity—such as parameter counts, metric entropy, or ranks of multilinear expansions—and employs penalized empirical risk formulations to adaptively select models. The penalty, often rooted in theoretical risk bounds, calibrates the trade-off between model fit and complexity, leading to adaptivity across a broad spectrum of function classes (e.g., Sobolev, Besov, or analytic classes). A canonical example is the adaptive selection of tree tensor networks by penalized empirical risk, minimizing excess prediction error over a collection of model classes parameterized by structural and representation complexity (Michel et al., 2020).

1. Structural Definition of Model Classes in Complexity-Driven Selection

A prototypical application of complexity-driven selection involves high-dimensional tensor-based model classes. Consider observations $x = (x_1, \dots, x_d) \in X = X_1 \times \cdots \times X_d$. For each variable $\nu$, a finite-dimensional feature space is specified,

$$V_\nu = \mathrm{span}\{\phi_{i_\nu}^\nu : i_\nu = 1, \dots, N_\nu\} \subset L^2(X_\nu),$$

with feature map $\phi^\nu(x_\nu) \in \mathbb{R}^{N_\nu}$. The full tensor product feature space is

$$V = V_1 \otimes \cdots \otimes V_d,$$

with basis functions $\phi_i(x) = \phi_{i_1}^1(x_1) \cdots \phi_{i_d}^d(x_d)$.

Tree tensor models introduce a dimension-partition tree $T$ over $\{1, \ldots, d\}$, whose nodes represent index splits and whose leaves correspond to single dimensions. For each node $\alpha \in T$, the $\alpha$-rank of $f : X \to \mathbb{R}$,

$$\operatorname{rank}_\alpha(f) = \min\left\{ r : f(x) = \sum_{k=1}^r g_k^\alpha(x_\alpha)\, h_k^{\alpha^c}(x_{\alpha^c}) \right\},$$

controls the representational complexity of $f$. The model class

$$M_r^T(V) = \{ f \in V : \operatorname{rank}_\alpha(f) \leq r_\alpha \ \ \forall \alpha \in T \}$$

comprises functions admitting a tree tensor network parametrization with internal and leaf tensor cores. The total number of scalar model parameters ("representation complexity") is

$$C(T, r, V) = \sum_{\alpha \in I(T)} r_\alpha \prod_{\beta \in S(\alpha)} r_\beta + \sum_{\alpha \in L(T)} r_\alpha N_\alpha,$$

where $I(T)$ denotes the interior nodes of $T$, $S(\alpha)$ the children of node $\alpha$, and $L(T)$ the leaves. Sparsity constraints are encoded by masking sets $\Lambda^\alpha$ on the indices of the tensors $v^\alpha$.
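
As an illustration, the following Python sketch (assuming a binary dimension-partition tree encoded as nested tuples with integer leaves; all names are illustrative, not from the referenced paper) evaluates the representation complexity $C(T, r, V)$ from a tree, its ranks, and the leaf feature-space dimensions.

```python
# Minimal sketch: computes
#   C(T, r, V) = sum_{alpha in I(T)} r_alpha * prod_{beta in S(alpha)} r_beta
#              + sum_{alpha in L(T)} r_alpha * N_alpha
# for a dimension-partition tree given as nested tuples (leaves are ints).

def representation_complexity(tree, ranks, feature_dims):
    """tree: nested tuples of dimension indices; ranks: node -> r_alpha;
    feature_dims: dimension nu -> N_nu (leaf feature-space dimension)."""
    if isinstance(tree, int):                     # leaf node alpha = {nu}
        return ranks[tree] * feature_dims[tree]   # r_alpha * N_alpha
    core = ranks[tree]                            # interior core of node alpha:
    for child in tree:                            # r_alpha * prod_{beta in S(alpha)} r_beta
        core *= ranks[child]
    return core + sum(representation_complexity(c, ranks, feature_dims) for c in tree)

# Example: d = 4, balanced binary tree, root rank 1, all other ranks 3, all N_nu = 10.
tree = ((0, 1), (2, 3))
ranks = {tree: 1, (0, 1): 3, (2, 3): 3, 0: 3, 1: 3, 2: 3, 3: 3}
feature_dims = {0: 10, 1: 10, 2: 10, 3: 10}
print(representation_complexity(tree, ranks, feature_dims))  # 9 + 2*(27 + 60) = 183
```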

2. Quantification of Model Class Complexity via Metric Entropy

Model class complexity is quantified by covering numbers or metric entropy. For tree tensor network classes, the entropy bound is

$$H\left(\epsilon, M_r^T(V)_R, \|\cdot\|_{L^p}\right) \leq C(T, r, V)\, \log\left(3\, \epsilon^{-1} R\, |T|\, L_p\right),$$

where $L_p = \sup_{\|\mathbf{v}\| \leq 1} \|R_{T,r,V}(\mathbf{v})\|_{L^p}$ and $M_r^T(V)_R$ is the radius-$R$ ball of functions in the model class. For sparse networks, $C(T, r, V)$ is replaced by $C(T, r, V, \Lambda) = \sum_{\alpha \in T} |\Lambda^\alpha|$. This entropy characterization ensures that complexity penalties scale explicitly with the number of free parameters, aligning estimation risk with representation cost.
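
As a small numerical illustration (a sketch only; the function name and sample values are assumptions, not from the paper), the entropy bound can be evaluated directly:

```python
import math

def entropy_bound(eps, C, R, tree_size, L_p):
    """Upper bound C(T, r, V) * log(3 * R * |T| * L_p / eps) on the metric
    entropy H(eps) of the radius-R tree tensor network class."""
    return C * math.log(3.0 * R * tree_size * L_p / eps)

# e.g. C = 183 parameters, R = 1, |T| = 7 nodes, L_p = 1, eps = 0.01
print(entropy_bound(0.01, 183, 1.0, 7, 1.0))  # ~ 183 * log(2100) ≈ 1.4e3
```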

3. Penalized Empirical Risk Formulation

Complexity-adaptive model selection is achieved by minimizing a penalized empirical risk,

$$\widehat{f}_m \in \arg\min_{f \in M_m} \widehat{R}_n(f), \qquad \hat{m} \in \arg\min_{m \in \mathcal{M}} \Big\{ \widehat{R}_n(\widehat{f}_m) + \operatorname{pen}(m) \Big\},$$

where $\widehat{R}_n(f) = n^{-1} \sum_{i=1}^n \gamma(f, Z_i)$ is the empirical contrast (e.g., least squares, log-likelihood) and the penalty $\operatorname{pen}(m)$ is a function of the complexity $C_m$.

Theoretical analysis prescribes penalty shapes:

  • General subgaussian contrasts: $\operatorname{pen}(m) \sim \lambda \sqrt{C_m / n}$
  • Bounded least squares: $\operatorname{pen}(m) \sim \lambda\, C_m / n$

A more precise formula incorporates problem-dependent constants:

$$\operatorname{pen}(m) = K_1 R^2 \left[ \frac{b_m C_m}{n \varepsilon^2} \log^+\!\left(\frac{n \varepsilon^2}{b_m C_m}\right) + \frac{\bar{w}\, C_m + \log N_{C_m}}{n \varepsilon} \right],$$

where $b_m = 1 + \log^+(3|T_m|/(4e))$ and $N_{C_m} = \#\{ m' : C_{m'} = C_m \}$. Calibration of $\lambda$ is typically achieved via the slope heuristic.
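
A minimal sketch of the selection step, assuming a least-squares contrast, a precomputed list of candidate models with a scikit-learn-style `fit`/`predict` interface, known complexities $C_m$, and the bounded least-squares penalty shape $\lambda C_m / n$ (all names are illustrative):

```python
import numpy as np

def select_model(models, complexities, X, y, lam):
    """Fit each candidate model, then return the index and fitted estimator
    minimizing the penalized empirical risk R_n(f_m) + lam * C_m / n."""
    n = len(y)
    best = None
    for m, (model, C_m) in enumerate(zip(models, complexities)):
        fitted = model.fit(X, y)
        emp_risk = np.mean((y - fitted.predict(X)) ** 2)  # least-squares contrast
        score = emp_risk + lam * C_m / n                  # complexity penalty
        if best is None or score < best[0]:
            best = (score, m, fitted)
    return best[1], best[2]
```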

4. Risk Bounds and Oracle Inequalities

Complexity-driven penalization leads to oracle inequalities:

  • For general bounded contrasts,

$$\mathbb{E}\left[ R(\widehat{f}_m) - R(f_m) \right] \lesssim \sqrt{ C_m \log n / n }$$

  • For model selection,

$$\mathbb{E}\left[ R(\widehat{f}_{\hat{m}}) - R(f^*) \right] \leq \inf_{m \in \mathcal{M}} \left\{ R(f_m) - R(f^*) + \operatorname{pen}(m) \right\} + (\text{small residual})$$

  • For bounded least squares and adapted penalties,

$$\mathbb{E}\left[ \| \widehat{f}_{\hat{m}} - f^* \|_2^2 \right] \leq \frac{1+\varepsilon}{1-\varepsilon} \inf_{m} \left\{ \|f_m - f^*\|_2^2 + K_2 \operatorname{pen}(m) \right\} + \frac{K_3}{n}$$

These bounds guarantee adaptivity: the procedure performs nearly as well as an oracle that would select the best model $m$ with knowledge of $f^*$.

5. Adaptivity and Minimax Rates Over Smoothness Classes

The complexity-driven approach achieves (near) minimax adaptivity over a broad collection of function classes:

  • Isotropic Sobolev/Besov: minimax rate $n^{-2s/(2s+d)}$ for $B_q^s(L^p)$, $p \geq 2$ (with logarithmic slack),

$$\mathbb{E}\| \widehat{f}_{\hat{m}} - f^* \|_2^2 \lesssim n^{-2s/(2s+d)} \log(n)^{*}$$

  • Inhomogeneous Besov: only nonlinear estimators reach $n^{-2s/(2s+1)}$; sparse tensor networks adaptively attain this rate.
  • Mixed-dominated/anisotropic classes: minimax rate $n^{-2s(\boldsymbol{s})/(2s(\boldsymbol{s})+d)}$; sparse parametrizations remain optimal.
  • Analytic classes: approximation error decays exponentially in complexity; near-parametric rate $n^{-1}$ up to logarithmic factors.

Thus, properly constructed model collections and penalties enable data-driven procedures to recover minimax estimation rates without prior knowledge of the underlying smoothness or sparsity structure.

6. Slope Heuristic Calibration in Practice

The theoretical penalty is typically known only up to a multiplicative constant that is not computable in practice, since it depends on unknown problem constants. The slope heuristic provides a robust empirical approach: for a grid of penalty constants $\lambda_1 < \dots < \lambda_K$, one computes the sequence of selected complexities $C_{m_\ell}$. The function $\lambda \mapsto C_{m(\lambda)}$ exhibits a distinctive drop at some $\hat{\lambda}_{\min}$; setting the penalty to twice this minimum (i.e., $2\hat{\lambda}_{\min}\, \mathrm{pen}_{\mathrm{shape}}(m)$) yields empirically stable and theoretically motivated complexity selection (Michel et al., 2020). This approach avoids explicit data splitting and is robust across a range of sample sizes and signal-to-noise ratios.
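
A minimal sketch of this complexity-jump calibration, assuming precomputed arrays of empirical risks and penalty shapes for each candidate model (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def slope_heuristic(emp_risk, pen_shape, complexities, lambdas):
    """Return 2 * lambda_min, where lambda_min is the grid value at which the
    selected complexity C_{m(lambda)} exhibits its largest drop."""
    emp_risk, pen_shape = np.asarray(emp_risk), np.asarray(pen_shape)
    selected_C = []
    for lam in lambdas:                       # selected complexity as a function of lambda
        m_hat = np.argmin(emp_risk + lam * pen_shape)
        selected_C.append(complexities[m_hat])
    jumps = -np.diff(selected_C)              # drops in selected complexity
    lam_min = lambdas[int(np.argmax(jumps)) + 1]
    return 2.0 * lam_min                      # final penalty: 2 * lambda_min * pen_shape(m)
```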

7. Algorithmic Strategies and Empirical Validation

Algorithmic implementation entails:

  • Fixed-tree, rank-adaptive search: Iteratively incrementing tensor ranks in modes with maximal truncation error until the penalized risk criterion stabilizes (a sketch follows this list).
  • Variable-tree search: Stochastic proposals for alternative trees (e.g., edge swaps), coupled with rank adaptation, expand the search space of candidate representational structures.
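
A minimal sketch of the fixed-tree, rank-adaptive loop, assuming hypothetical helpers `fit_tree_network` (returning a fitted estimator with attributes `risk` and `truncation_errors`) and `complexity` (returning $C_m$); none of these names come from the referenced paper:

```python
def rank_adaptive_search(tree, nodes, X, y, lam, n, max_iter=50):
    """Greedy rank adaptation: increase the rank of the node with maximal
    truncation error until the penalized risk criterion stops improving."""
    ranks = {alpha: 1 for alpha in nodes}          # start from a rank-one model
    best_score, best_fit = float("inf"), None
    for _ in range(max_iter):
        fit = fit_tree_network(tree, ranks, X, y)  # hypothetical fitting routine
        score = fit.risk + lam * complexity(tree, ranks) / n
        if score >= best_score:                    # penalized criterion has stabilized
            break
        best_score, best_fit = score, fit
        worst = max(fit.truncation_errors, key=fit.truncation_errors.get)
        ranks[worst] += 1                          # increment rank at the worst node
    return best_fit, ranks
```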

Numerical experiments demonstrate that the complexity-penalized estimator with slope-heuristic calibration selects a nearly oracle-optimal model complexity $C_{\hat{m}}$ and predictive risk $R(\widehat{f}_{\hat{m}})$. Example applications include tensorized univariate function regression and high-dimensional synthetic benchmarks (e.g., the 10D corner-peak and 8D borehole-flow functions), in which the estimator's performance nearly matches that of the best model selected with oracle knowledge.

Summary Table: Complexity-Driven Model Selection for Tree Tensor Networks

| Component | Formalization/Result | Example Reference |
| --- | --- | --- |
| Model class | $M_r^T(V) \subset V$, tree tensor network | (Michel et al., 2020) |
| Model complexity | $C(T, r, V)$, parameter count; or sparsity $C(T, r, V, \Lambda)$ | (Michel et al., 2020) |
| Metric entropy | $H(\epsilon) \leq C(T, r, V)\log(3 \epsilon^{-1} R \lvert T\rvert L_p)$ | (Michel et al., 2020) |
| Penalty shape | $\lambda \sqrt{C_m/n}$ (general); $\lambda C_m/n$ (least squares) | (Michel et al., 2020) |
| Oracle inequality | $\mathbb{E}[R(\widehat f_{\hat m}) - R(f^*)] \leq \inf_{m} \{\cdots\}$ | (Michel et al., 2020) |
| Adaptivity | Rates near minimax across Sobolev/Besov/analytic classes | (Michel et al., 2020) |
| Calibration heuristic | Slope heuristic for penalty constant selection | (Michel et al., 2020) |

In conclusion, complexity-driven adaptive model selection—when grounded in explicit complexity measures, penalized empirical risk, and rigorously calibrated penalty constants—enables robust, theoretically justified adaptivity over broad classes of high-dimensional models, including but not limited to tree tensor networks (Michel et al., 2020). The methodology aligns estimator risk with minimax rates, provided the structure of candidate models and penalties is consistent with underlying function class regularity and representation efficiency.
