
Variable Basis Mapping (VBM)

Updated 21 January 2026
  • VBM is a penalized basis learning methodology that extends sparse multiclass LDA by incorporating per-variable ordinal weights to ensure order-concordant variable selection.
  • It employs a two-step Kendall’s Tau procedure to construct ordinal weights, screening noise and enforcing monotonicity in group means for reliable variable selection.
  • An efficient block-coordinate descent algorithm optimizes the VBM objective, yielding interpretable, sparse discriminative bases even when p greatly exceeds N.

Variable Basis Mapping (VBM) refers to a penalized basis learning methodology for high-dimensional ordinal classification problems. The VBM framework, as developed in Kim et al. (2024), extends sparse multiclass linear discriminant analysis (LDA) by introducing per-variable ordinal weights and a weighted group-lasso penalty, thereby enabling the selection of variables that exhibit both discriminative and order-concordant behavior with respect to the ordinal response. VBM is designed for regimes with high-dimensional feature spaces ($p \gg N$), where interpretability and variable selection are critical.

1. Formulation of the Ordinal-Weighted Sparse Basis Learning Problem

Let $X \in \mathbb{R}^p$ denote a feature vector and $y \in \{1,\ldots,K\}$ the ordinal class label. The VBM method assumes a common-covariance Gaussian model:

X \mid (y = g) \sim N(\mu_g, \Sigma), \quad g = 1, \dots, K.

Standard multiclass LDA seeks a $(K-1)$-dimensional basis $Z \in \mathbb{R}^{p \times (K-1)}$ that maximizes separation between groups. The unpenalized estimator is

\Psi = \Sigma^{-1} M = \arg\min_{Z \in \mathbb{R}^{p \times (K-1)}} \operatorname{tr}\left(\tfrac{1}{2} Z^T \Sigma Z - Z^T M\right),

where $\Sigma$ is the (pooled) within-group covariance and $M$ determines the between-group means.
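Since the trace objective is a smooth quadratic with gradient $\Sigma Z - M$, the unpenalized basis is obtained by a single linear solve. A minimal numpy sketch, in which the covariance and $M$ are random placeholders rather than the paper's estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 5, 3

# Placeholder pooled within-group covariance (symmetric positive definite)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)

# Placeholder p x (K - 1) between-group mean matrix M
M = rng.normal(size=(p, K - 1))

# The gradient of tr(0.5 Z' Sigma Z - Z' M) is Sigma Z - M,
# so the unpenalized minimizer is Psi = Sigma^{-1} M
Psi = np.linalg.solve(Sigma, M)
```

At `Psi` the gradient `Sigma @ Psi - M` vanishes, confirming first-order optimality.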

To promote sparsity and preferentially select order-concordant variables, VBM introduces the following penalized objective with per-variable ordinal weights $w_j$:

\widehat Z^{\mathrm{ord}}_{\eta,\lambda} = \arg\min_{Z \in \mathbb{R}^{p \times (K-1)}} \operatorname{tr}\left(\tfrac{1}{2} Z^T \widehat\Sigma Z - Z^T \widehat M\right) + \sum_{j=1}^p \lambda \eta^{1-w_j} \|Z_{j,:}\|_2,

where $\lambda > 0$ sets overall sparsity and $\eta \ge 1$ amplifies penalization for variables with smaller $w_j$ (Kim et al., 2022).
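For concreteness, the penalized objective can be written as a small function. This is an illustrative sketch, not the authors' code:

```python
import numpy as np

def vbm_objective(Z, Sigma, M, w, lam, eta):
    """Quadratic LDA fit term plus the weighted group-lasso penalty
    sum_j lam * eta**(1 - w_j) * ||Z_{j,:}||_2."""
    fit = 0.5 * np.trace(Z.T @ Sigma @ Z) - np.trace(Z.T @ M)
    penalty = np.sum(lam * eta ** (1.0 - w) * np.linalg.norm(Z, axis=1))
    return fit + penalty
```

Rows with $w_j = 1$ pay the baseline penalty $\lambda\|Z_{j,:}\|_2$, while rows with $w_j = 0$ pay the inflated penalty $\lambda\eta\|Z_{j,:}\|_2$.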

2. Construction of Ordinal Weights via Two-Step Kendall’s Tau Procedure

Variable selection is guided by ordinal weights $w_j \in [0,1]$, constructed via a two-step process leveraging Kendall’s tau statistics:

  • Global Kendall’s Tau ($\hat{\tau}_j$): Measures correlation between $X_{ij}$ and $y_i$ across all samples.

\hat{\tau}_j = \frac{2}{N(N-1)} \sum_{1 \leq i < k \leq N} \operatorname{sgn}(X_{kj} - X_{ij}) \operatorname{sgn}(y_k - y_i)

  • Group-Mean Kendall’s Tau ($\tilde{\tau}_j$): Assesses monotonicity of group means.

\tilde{\tau}_j = \frac{2}{K(K-1)} \sum_{1 \leq g < h \leq K} \operatorname{sgn}(\hat{\mu}_j^{(h)} - \hat{\mu}_j^{(g)})

For thresholds $0 < \theta_1, \theta_2 < 1$, variable $j$ is assigned

w_j = \begin{cases} 1, & \text{if } |\hat{\tau}_j| > \theta_1 \text{ and } |\tilde{\tau}_j| > 1 - \theta_2, \\ 0, & \text{otherwise.} \end{cases}

Step 1 eliminates “noise” variables; Step 2 detects variables whose class means are strictly monotone. Theorems guarantee that this rule selects true order-concordant variables with high probability under mild assumptions (Kim et al., 2022).
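The two-step rule can be sketched directly from the formulas above. This is an illustrative O(N²p) implementation of the binary-weight version, not the authors' code, and the default thresholds are arbitrary example values:

```python
import numpy as np

def ordinal_weights(X, y, theta1=0.3, theta2=0.1):
    """Two-step Kendall's-tau ordinal weights (binary version, a sketch).

    Step 1: global tau between column X[:, j] and the ordinal label y
    screens out noise variables. Step 2: tau over the K group means
    keeps only variables whose class means are (near-)monotone.
    """
    N, p = X.shape
    groups = np.unique(y)
    K = len(groups)
    w = np.zeros(p)
    for j in range(p):
        # Global Kendall's tau over all sample pairs
        s = sum(np.sign(X[k, j] - X[i, j]) * np.sign(y[k] - y[i])
                for i in range(N) for k in range(i + 1, N))
        tau_hat = 2.0 * s / (N * (N - 1))
        # Group-mean Kendall's tau over all pairs of class means
        mu = np.array([X[y == g, j].mean() for g in groups])
        t = sum(np.sign(mu[h] - mu[g])
                for g in range(K) for h in range(g + 1, K))
        tau_tilde = 2.0 * t / (K * (K - 1))
        if abs(tau_hat) > theta1 and abs(tau_tilde) > 1.0 - theta2:
            w[j] = 1.0
    return w

# Demo: one order-concordant variable, one pure-noise variable
rng = np.random.default_rng(1)
y = np.repeat([1, 2, 3], 10)
X = np.column_stack([y + 0.1 * rng.normal(size=30),  # monotone in y
                     rng.normal(size=30)])           # noise
w = ordinal_weights(X, y)
```

In the demo, the order-concordant first column receives weight 1, while the noise column is typically screened out at Step 1.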

3. Optimization via Block-Coordinate Descent

The minimization of the VBM objective is efficiently solved by block-coordinate descent, exploiting the group-lasso structure. For each row $j$ of $Z$, the algorithm updates as follows:

  1. Compute the partial-residual vector:

a_j = \widehat M_{j,:} - \sum_{k \neq j} \widehat\Sigma_{jk} Z^{(t)}_{k,:}

  2. Set $\lambda_j = \lambda \eta^{1-w_j}$ and $\sigma_{jj} = \widehat\Sigma_{jj}$.
  3. Update row $j$:

Z^{(t+1)}_{j,:} = \frac{1}{\sigma_{jj}} \left(1 - \frac{\lambda_j}{\|a_j\|_2}\right)_+ a_j

The procedure converges to the global optimum due to the convexity and block-separability of the objective.
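The three steps above can be sketched compactly in numpy. The fixed iteration count and dense matrices are simplifications; the paper's implementation may instead use a convergence tolerance:

```python
import numpy as np

def vbm_bcd(Sigma, M, w, lam, eta, n_iter=200):
    """Cyclic block-coordinate descent on the VBM objective: each row of Z
    receives the closed-form group soft-thresholding update."""
    p, d = M.shape
    Z = np.zeros((p, d))
    lam_j = lam * eta ** (1.0 - w)      # per-variable penalty levels
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual a_j = M_j - sum_{k != j} Sigma_jk Z_k
            a_j = M[j] - Sigma[j] @ Z + Sigma[j, j] * Z[j]
            norm = np.linalg.norm(a_j)
            if norm <= lam_j[j]:
                Z[j] = 0.0              # row fully shrunk to zero
            else:
                Z[j] = (1.0 - lam_j[j] / norm) * a_j / Sigma[j, j]
    return Z
```

With $\lambda = 0$ the update reduces to Gauss–Seidel iteration for $\widehat\Sigma Z = \widehat M$, so the iterates approach the unpenalized solution $\widehat\Sigma^{-1}\widehat M$; with $\lambda > 0$, rows with small weight $w_j$ are the first to be shrunk to zero.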

4. Theoretical Guarantees in High-Dimensional Regimes

VBM exhibits non-asymptotic oracle properties under high-dimensional scaling. Key sets include:

  • Discriminant variables $J_{\text{disc}} = \{j : \Psi_{j,:} \neq 0\}$
  • Ordinal variables $J_{\text{ord}} = \{j : \mu_j^1 \leq \cdots \leq \mu_j^K\}$
  • Ordinal-discriminant variables $J^{\text{ord}}_{\text{disc}} = J_{\text{disc}} \cap J_{\text{ord}}$

Selection consistency is achieved when the tuning parameters $(\lambda, \eta)$ are chosen appropriately:

  • For moderate $\eta$, all discriminative variables are selected.
  • For large $\eta$, only variables that are both discriminative and order-concordant are selected.

Estimation bounds (in the $\ell_{\infty,2}$ norm) are provided:

\|\widehat Z^{\mathrm{ord}}_{\eta,\lambda} - \Psi\|_{\infty,2} = O(\phi \eta \lambda)

in probability, where $\phi$ is a compatibility constant. High-dimensional consistency requires $\log(pd)\, d^2 / N \rightarrow 0$, with $\lambda \to 0$ sufficiently slowly and $\eta\lambda \to \infty$ (Kim et al., 2022).

5. Post-Screening and Data-Adaptive Refinement

In practical applications, data-adaptive thresholding is deployed. The two-step weights can be refined by:

  • Initial screening with ANOVA F-tests to separate noise from variables with nontrivial mean differences.
  • Adaptive selection of $\theta_1$ and $\theta_2$ based on the empirical distributions of $\hat{\tau}_j$.

This adaptive procedure maintains a strict separation between order-concordant and non-monotone variables.
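One plausible data-adaptive rule, sketched under the assumption that $\theta_1$ is set at an upper quantile of the empirical $|\hat{\tau}_j|$ distribution (both the quantile level and this specific rule are illustrative, not the paper's exact procedure):

```python
import numpy as np

def adaptive_theta1(tau_hat, q=0.9):
    """Set theta_1 at the q-th quantile of |tau_hat_j| so that only
    variables with unusually strong global ordinal association survive
    the first screening step. The level q is a hypothetical choice."""
    return np.quantile(np.abs(tau_hat), q)

tau_hat = np.array([0.02, -0.05, 0.61, 0.03, 0.55, -0.01])
theta1 = adaptive_theta1(tau_hat, q=0.6)
selected = np.abs(tau_hat) > theta1   # keeps the two strong variables
```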

6. Interpretability and Sparsity of the Learned Representation

The group-lasso penalty in VBM leads to row-sparsity in the learned basis: only a small subset of variables contributes to the $(K-1)$-dimensional discriminant subspace. Variables with monotonic class means under $y$ incur less regularization ($\lambda_j = \lambda$) and are preferentially retained when $\eta > 1$, while non-monotone or noisy features are heavily penalized and typically excluded.

Each selected variable corresponds to a row of $Z$ and can be directly mapped to interpretable patterns of monotone group-mean shifts in the projected subspace. This facilitates intelligible variable selection, particularly useful in domains such as genomics, where the interpretability of the selected genes is paramount.

Practical results include:

  • In low-dimensional synthetic settings, VBM recovers the true set $J^{\text{ord}}_{\text{disc}}$ under suitable $\eta$.
  • In large-scale gene expression datasets, VBM selects a highly sparse subset (typically 7–20 out of >10,000 genes) while maintaining competitive or superior classification error rates compared to nominal LDA or ordinal logistic regression (Kim et al., 2022).
