
Per-dimension Mean & Variance Correction

Updated 14 December 2025
  • Per-dimension mean and variance correction is a statistical approach that adjusts each feature’s estimates to account for heteroscedasticity in high-dimensional data.
  • Empirical Bayes and NPMLE frameworks implement shrinkage estimation, improving inference accuracy and enhancing classifier performance.
  • These methodologies yield robust estimators that substantially reduce misclassification errors in high-dimensional discriminant analysis applications.

Per-dimension mean and variance correction refers to statistical methodologies that estimate and adjust both the mean and variance for each coordinate (feature, or dimension) in high-dimensional data analysis. These corrections are crucial for robust inference, especially under heteroscedasticity, where the variances are not uniform across coordinates. Such approaches substantially improve empirical Bayes and linear discriminant procedures by adapting to the featurewise variability intrinsic to contemporary high-dimensional datasets.

1. Statistical Models and Motivation

The high-dimensional normal model underpins most per-dimension mean and variance correction methods. Given $p$ independent coordinates, for each $j$ there exist $n$ replicates:

$$X_{ij} = \mu_j + \sigma_j\,\varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0,1),$$

where $i=1,\dots,n$ and $j=1,\dots,p$. For each $j$, the sample mean $\bar X_j$ and sample variance $S_j^2$ satisfy

$$\bar X_j \sim N(\mu_j,\ \sigma_j^2/n), \qquad (n-1)S_j^2/\sigma_j^2 \sim \chi^2_{n-1},$$

with $\bar X_j$ and $S_j^2$ independent. The variance $\sigma_j^2$ is typically unknown and varies across $j$, demanding per-coordinate inference. Heteroscedasticity is intrinsic to many bioinformatics, genomics, and imaging applications, where ignoring featurewise variance differences can severely degrade downstream tasks such as classification in high-dimensional discriminant analysis (Oh et al., 17 Jan 2024; Sinha et al., 2018).
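As a concrete illustration, the following sketch (NumPy, with arbitrarily chosen $p$, $n$, and variance distribution) simulates the heteroscedastic model above and computes the per-coordinate sufficient statistics $\bar X_j$ and $S_j^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5000, 10                                   # coordinates, replicates per coordinate

# Heteroscedastic truth: per-coordinate means and variances drawn from
# arbitrary (purely illustrative) distributions.
mu = rng.normal(0.0, 1.0, size=p)                 # true mu_j
sigma2 = rng.gamma(shape=2.0, scale=0.5, size=p)  # true sigma_j^2, varying across j

# X_ij = mu_j + sigma_j * eps_ij, eps_ij ~ N(0, 1)
X = mu + np.sqrt(sigma2) * rng.standard_normal((n, p))

# Per-coordinate sufficient statistics
xbar = X.mean(axis=0)                 # \bar X_j ~ N(mu_j, sigma_j^2 / n)
s2 = X.var(axis=0, ddof=1)            # (n-1) S_j^2 / sigma_j^2 ~ chi^2_{n-1}
```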

2. Empirical Bayes and Mixture-model Frameworks

Empirical Bayes approaches exploit prior models on $(\mu_j, \sigma_j^2)$, facilitating shrinkage estimation that adapts to per-dimension variability.

A widely used framework, as described by Sinha & Hart (Sinha et al., 2018), posits a finite mixture of Normal–Inverse-Gamma (N-IG) priors:

$$p(\mu_j, \sigma_j^2) = \sum_{k=1}^K \pi_k\, N(\mu_j \mid m_k, \sigma_j^2/\lambda_k)\, \mathrm{InvGamma}(\sigma_j^2 \mid \alpha_k, \beta_k).$$

Posterior inference for each coordinate combines the likelihood of the observed data with the N-IG priors. The marginal (mixture) posterior for $(\mu_j, \sigma_j^2)$ is

$$p(\mu_j, \sigma_j^2 \mid \bar X_j, S_j^2) = \sum_{k=1}^K w_{jk}\, p(\mu_j, \sigma_j^2 \mid \bar X_j, S_j^2, k),$$

where the $w_{jk}$ are component-specific responsibilities, computed via Bayes' rule using the model's parameters.
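As a hedged illustration, the responsibilities can be obtained from the standard conjugate N-IG marginal likelihood of each coordinate's summary statistics. The sketch below follows that textbook conjugate algebra; the E-step actually used in the cited work may differ in parameterization details.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def nig_responsibilities(xbar, s2, n, pis, m, lam, alpha, beta):
    """Responsibilities w_{jk} under a K-component Normal-Inverse-Gamma mixture.

    xbar, s2 : arrays of shape (p,)  per-coordinate sample mean and variance
    pis, m, lam, alpha, beta : arrays of shape (K,)  mixture hyperparameters
    Returns an array of shape (p, K) whose rows sum to one.
    """
    xbar, s2 = xbar[:, None], s2[:, None]          # broadcast over components
    # Updated N-IG parameters for each (j, k) pair (standard conjugate algebra).
    alpha_n = alpha + n / 2.0
    beta_n = beta + 0.5 * ((n - 1) * s2 + n * lam / (n + lam) * (xbar - m) ** 2)
    # Log marginal likelihood of coordinate j's data under component k.
    log_marg = (-0.5 * n * np.log(2 * np.pi)
                + 0.5 * (np.log(lam) - np.log(lam + n))
                + gammaln(alpha_n) - gammaln(alpha)
                + alpha * np.log(beta) - alpha_n * np.log(beta_n))
    log_w = np.log(pis) + log_marg
    return np.exp(log_w - logsumexp(log_w, axis=1, keepdims=True))
```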

The per-dimension posterior means for $\mu_j$ and $\sigma_j^2$ (the "mean and variance corrections") are:

$$\hat\mu_j = \sum_{k=1}^K w_{jk}\,\frac{n\,\bar X_j + \lambda_k m_k}{n+\lambda_k}$$

$$\hat\sigma_j^2 = \sum_{k=1}^K w_{jk}\,\frac{\beta_k + \frac{1}{2}\bigl[(n-1)S_j^2 + \frac{n\lambda_k}{n+\lambda_k}(\bar X_j-m_k)^2\bigr]}{\alpha_k + (n+1)/2}$$

The hyperparameters $\{\pi_k, m_k, \lambda_k, \alpha_k, \beta_k\}_{k=1}^K$ are estimated via expectation–maximization (EM) or variational Bayes. This yields shrinkage estimates of means and variances, adapting to both local and global heterogeneity (Sinha et al., 2018).
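Given responsibilities and fitted hyperparameters, the two correction formulas translate directly into code. The following sketch mirrors them verbatim; fitting the hyperparameters themselves (via EM or variational Bayes) is assumed to have happened elsewhere.

```python
import numpy as np

def nig_mixture_corrections(xbar, s2, n, w, m, lam, alpha, beta):
    """Per-dimension posterior means of mu_j and sigma_j^2 under the N-IG mixture.

    xbar, s2 : shape (p,)    per-coordinate sample mean and variance
    w        : shape (p, K)  responsibilities w_{jk}
    m, lam, alpha, beta : shape (K,)  component hyperparameters
    """
    xbar, s2 = xbar[:, None], s2[:, None]
    # Posterior mean of mu_j under component k: (n*xbar_j + lam_k*m_k) / (n + lam_k)
    mu_hat = np.sum(w * (n * xbar + lam * m) / (n + lam), axis=1)
    # Posterior mean of sigma_j^2 under component k, as in the mixture formula above
    num = beta + 0.5 * ((n - 1) * s2 + n * lam / (n + lam) * (xbar - m) ** 2)
    sigma2_hat = np.sum(w * num / (alpha + (n + 1) / 2.0), axis=1)
    return mu_hat, sigma2_hat
```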

3. Nonparametric Maximum Likelihood Estimation (NPMLE) Approaches

NPMLE offers an alternative to parametric mixtures by estimating the mixing distribution itself from the data, without restrictive parametric forms (Oh et al., 17 Jan 2024). For high-dimensional classification tasks, such as Fisher's Linear Discriminant Analysis (LDA) under variable feature variances, the NPMLE framework treats each coordinate's $(\mu_j, \sigma_j^2)$ as random draws from unknown distributions $G_0$ (for means) and $F_0$ (for variances).

For two-class models, within each coordinate $j$:

$$X_j = \bar X_j^{(1)} - \bar X_j^{(2)} \sim N\Bigl(\mu_j,\ \frac{n_1 + n_2}{n_1 n_2}\,\sigma_j^2\Bigr)$$

$$V_j = \text{pooled variance summary}, \qquad \frac{(n_1 + n_2 - 2)\,V_j}{\sigma_j^2} \sim \chi^2_{n_1 + n_2 - 2}$$
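Concretely, the per-coordinate summaries $X_j$ and $V_j$ can be formed as in this short sketch, assuming two sample matrices with observations in rows; $V_j$ is taken here to be the usual degrees-of-freedom-weighted pooled sample variance, consistent with the chi-squared statement above.

```python
import numpy as np

def two_sample_summaries(X1, X2):
    """Per-coordinate mean difference X_j and pooled variance V_j.

    X1 : array (n1, p) of class-1 observations
    X2 : array (n2, p) of class-2 observations
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    xdiff = X1.mean(axis=0) - X2.mean(axis=0)                        # X_j
    pooled = ((n1 - 1) * X1.var(axis=0, ddof=1)
              + (n2 - 1) * X2.var(axis=0, ddof=1)) / (n1 + n2 - 2)   # V_j
    return xdiff, pooled, n1, n2
```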

NPMLEs $\widehat F_0$ and $\widehat G_0$ are computed by maximizing the empirical log-likelihoods, employing grid approximations for discretization:

$$\widehat F_0(\sigma^2) = \sum_{k=1}^K w_{1k}\, 1\{v_k \leq \sigma^2\}, \qquad \widehat G_0(\mu) = \sum_{l=1}^L w_{2l}\, 1\{u_l \leq \mu\}$$

The coordinatewise mean and variance corrections from empirical Bayes are:

$$\widehat\mu_j = \frac{\sum_{l=1}^L u_l\, w_{2l} \sum_{k=1}^K w_{1k}\, f_{X,V}(X_j, V_j \mid u_l, v_k)}{\sum_{l=1}^L w_{2l} \sum_{k=1}^K w_{1k}\, f_{X,V}(X_j, V_j \mid u_l, v_k)}$$

$$\widehat\sigma_j^2 = \frac{\sum_{k=1}^K v_k\, w_{1k}\, f_V(V_j \mid v_k)}{\sum_{k=1}^K w_{1k}\, f_V(V_j \mid v_k)}$$
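Assuming the grid points and NPMLE weights $\{(v_k, w_{1k})\}_{k=1}^K$ and $\{(u_l, w_{2l})\}_{l=1}^L$ have already been fitted (e.g., by a Kiefer–Wolfowitz-type convex program, not shown here), the corrections can be evaluated as in the sketch below; the densities $f_V$ and $f_{X,V}$ follow the chi-squared and normal sampling distributions stated above.

```python
import numpy as np
from scipy.stats import chi2, norm

def npmle_corrections(xdiff, pooled, n1, n2, v_grid, w1, u_grid, w2):
    """Coordinatewise empirical Bayes corrections from discrete NPMLEs.

    xdiff, pooled : shape (p,)  X_j and V_j per coordinate
    v_grid, w1    : shape (K,)  variance grid points and NPMLE weights
    u_grid, w2    : shape (L,)  mean grid points and NPMLE weights
    """
    m = n1 + n2 - 2
    c = (n1 + n2) / (n1 * n2)              # variance scale of X_j given sigma_j^2

    # f_V(V_j | v_k): since (m V_j)/v_k ~ chi^2_m, transform the chi^2 density. Shape (p, K).
    fV = chi2.pdf(m * pooled[:, None] / v_grid, df=m) * m / v_grid

    # Variance correction: posterior mean of sigma_j^2 over the variance grid.
    sigma2_hat = (fV * w1 * v_grid).sum(axis=1) / (fV * w1).sum(axis=1)

    # f_{X,V}(X_j, V_j | u_l, v_k) = N(X_j | u_l, c v_k) * f_V(V_j | v_k). Shape (p, L, K).
    fX = norm.pdf(xdiff[:, None, None], loc=u_grid[None, :, None],
                  scale=np.sqrt(c * v_grid)[None, None, :])
    fXV = fX * fV[:, None, :]

    # Mean correction: posterior mean of mu_j over the joint grid.
    denom = (w2[None, :, None] * w1[None, None, :] * fXV).sum(axis=(1, 2))
    numer = (u_grid[None, :, None] * w2[None, :, None] * w1[None, None, :] * fXV).sum(axis=(1, 2))
    mu_hat = numer / denom
    return mu_hat, sigma2_hat
```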

These corrections directly inform the construction of a "Mean-and-Variance Adaptive" (MVA) classifier, yielding substantial improvements in high-dimensional discriminant analysis in the presence of variance heterogeneity (Oh et al., 17 Jan 2024).

4. Applications in High-dimensional Discriminant Analysis

Mean and variance corrections are central to the adaptation of Fisher's LDA and related classifiers for heteroscedastic high-dimensional data. Under the diagonal covariance assumption (the "independent rule"), the optimal linear discriminant is

$$\delta_I(x) = \Bigl(x - \frac{\mu^{(1)} + \mu^{(2)}}{2}\Bigr)^{T} \Sigma^{-1}\bigl(\mu^{(1)} - \mu^{(2)}\bigr) = \sum_{j=1}^p a_j x_j + a_0$$

with $a_j = \mu_j/\sigma_j^2$. Substituting empirical Bayes or NPMLE estimates for each $a_j$ leads to the MVA rule:

$$\hat a_j = \frac{\widehat\mu_j}{\widehat\sigma_j^2}$$

$$\delta_{\mathrm{MVA}}(x) = \sum_{j=1}^p \hat a_j x_j + \hat a_0 - \log\frac{\hat\pi_2}{\hat\pi_1}$$

This classifier accounts for both the mean shift and the local variance scale of each feature. Theoretical analysis shows that, under sparse or dense effect regimes and mild regularity conditions, the MVA rule's classification risk converges to the Bayes risk and outperforms classical homoscedastic LDA when the true variances are heterogeneous. Empirical validation on synthetic and real data demonstrates misclassification rates below $1\%$ for MVA, while rivals such as standard LDA, Naive Bayes, NSC, SURE, Park-NPMLE, FAIR, and PLDA incur $5$–$30\%$ errors under strong heteroscedasticity (Oh et al., 17 Jan 2024).
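Putting the pieces together, a minimal sketch of the MVA rule follows, assuming corrected $\widehat\mu_j$ and $\widehat\sigma_j^2$ from either construction above; the plug-in intercept built from the midpoint of the class sample means is one natural convention, not necessarily the one used in the cited work.

```python
import numpy as np

def mva_classifier(xbar1, xbar2, mu_hat, sigma2_hat, pi1, pi2):
    """Build the Mean-and-Variance Adaptive (MVA) discriminant.

    xbar1, xbar2 : shape (p,)  class-wise sample mean vectors
    mu_hat       : shape (p,)  corrected mean differences \hat mu_j
    sigma2_hat   : shape (p,)  corrected variances \hat sigma_j^2
    pi1, pi2     : scalar class prior estimates
    Returns a function mapping x (shape (p,) or (N, p)) to class labels 1 or 2.
    """
    a = mu_hat / sigma2_hat                    # \hat a_j = \hat mu_j / \hat sigma_j^2
    a0 = -np.dot(a, (xbar1 + xbar2) / 2.0)     # plug-in intercept from the class midpoint
    threshold = np.log(pi2 / pi1)

    def predict(x):
        score = np.atleast_2d(x) @ a + a0 - threshold
        return np.where(score > 0, 1, 2)       # assign class 1 when delta_MVA(x) > 0

    return predict
```

A typical usage would chain the earlier sketches: form the two-sample summaries, fit the NPMLE grids, compute the corrections, and pass them to `mva_classifier` together with the class means and empirical priors.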

5. Theoretical Properties and Consistency

Empirical Bayes procedures based on finite N-IG mixtures or NPMLE formulations enjoy strong consistency guarantees under classical mixture-model asymptotics. For instance, Kiefer–Wolfowitz theory ensures that, as $p \to \infty$, the NPMLEs satisfy $\widehat F_0 \to F_0$ and $\widehat G_0 \to G_0$ almost surely, rendering the corrected empirical Bayes estimates $\widehat\mu_j, \widehat\sigma_j^2$ consistent for the true coordinate parameters (Oh et al., 17 Jan 2024).

In the context of multivariate Chebyshev inequalities, no per-dimension version is derived; all results remain in the full Mahalanobis-distance form, which incorporates correlations between coordinates. The bounds on tail probabilities using empirically estimated mean and covariance matrices converge, as $N \to \infty$, to their theoretical population counterparts (Stellato et al., 2015).

6. Comparative Performance and Empirical Validation

Empirical studies confirm the efficacy of per-dimension mean and variance correction:

  • Under various synthetic heteroscedasticity regimes (left-skewed, right-skewed, symmetric), MVA classifiers achieve classification errors below $1\%$, outperforming methods that impose parametric assumptions on the variance distribution.
  • On real-world datasets (Breast Cancer, Huntington's, Leukemia, CNS embryonal tumors, with $p$ ranging from $1{,}000$ to $11{,}000$), MVA consistently matches or improves upon benchmark classifier risks in leave-one-out cross-validation (Oh et al., 17 Jan 2024).

A key empirical finding is that mixture-based, nonparametric variance shrinkage adapts to complex, multimodal, or skewed variance patterns, where fixed-form inverse-gamma shrinkage (e.g., SURE, Park) fails to capture the true featurewise structure.


In summary, per-dimension mean and variance correction synthesizes hierarchical, empirical Bayes, and nonparametric methods to yield adaptive, consistent, and empirically robust estimators in high-dimensional inference, with particular applicability in heteroscedastic classification and related domains (Sinha et al., 2018, Oh et al., 17 Jan 2024).
