
Hierarchical Folded Normal Model

Updated 3 November 2025
  • Hierarchical Folded Normal Model is a complex probabilistic framework that embeds the folded normal distribution—obtained by taking the absolute value of a normal variate—into multilevel models.
  • The model features boundary coercivity and a unique profile likelihood maximizer, ensuring stable parameter estimation even under nonregular likelihood conditions.
  • Rigorous results establish Hausdorff consistency, nonstandard asymptotics (including $n^{1/4}$ convergence for $\mu = 0$), and practical implications for hierarchical and penalized inference.

The hierarchical folded normal model pertains to probabilistic models where the folded normal distribution is embedded as a prior, likelihood, or data-generating component within a broader, typically multilevel, statistical structure. The folded normal distribution arises by taking the absolute value of a normal variate, yielding a nonregular model with challenging likelihood geometry—features that propagate into hierarchical extensions. The most recent theoretical treatment establishes rigorous likelihood properties, full identification results, nonstandard asymptotics, and robust estimation principles for such models, addressing key boundary and uniform convergence issues (Mallik, 25 Aug 2025).

1. Folded Normal Distribution and Likelihood Structure

The folded normal distribution is defined as the distribution of $Y = |X|$, where $X \sim N(\mu, \sigma^2)$. Its density for $y \geq 0$ is

$$f(y; \mu, \sigma) = \frac{1}{\sigma}\left\{ \phi\!\left(\frac{y - \mu}{\sigma}\right) + \phi\!\left(\frac{y + \mu}{\sigma}\right) \right\} = \frac{2}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{y^2 + \mu^2}{2\sigma^2}\right) \cosh\!\left(\frac{y\mu}{\sigma^2}\right)$$

where $\phi$ is the standard normal density. When incorporated as part of a (possibly hierarchical) stochastic model, the log-likelihood for observed $y_1, \ldots, y_n$ is

$$\ell_n(\mu, \sigma) = \sum_{i=1}^n \log f(y_i; \mu, \sigma).$$

This likelihood is even in $\mu$, so the sign of $\mu$ is not identified and the likelihood geometry is intricate, especially in hierarchical constructions where $\mu$ and $\sigma$ may themselves depend on latent variables or hyperparameters.
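As a concrete check on these formulas, the following minimal sketch (our code; names like `folded_normal_loglik` are illustrative, not from the paper) evaluates the log-likelihood through the two-component form, using `logaddexp` for numerical stability, and confirms the evenness in $\mu$:

```python
import numpy as np
from scipy.stats import norm

def folded_normal_loglik(y, mu, sigma):
    """Folded normal log-likelihood ell_n(mu, sigma).

    Uses log f(y) = logaddexp(log phi((y-mu)/sigma), log phi((y+mu)/sigma)) - log sigma,
    which stays stable even when |y * mu| / sigma^2 is large.
    """
    y = np.asarray(y)
    a = norm.logpdf((y - mu) / sigma)
    b = norm.logpdf((y + mu) / sigma)
    return float(np.sum(np.logaddexp(a, b) - np.log(sigma)))

rng = np.random.default_rng(0)
y = np.abs(rng.normal(1.0, 2.0, size=500))           # simulated folded normal data
assert np.isclose(folded_normal_loglik(y, 1.0, 2.0),  # likelihood is even in mu
                  folded_normal_loglik(y, -1.0, 2.0))
```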

2. Likelihood Geometry: Boundary Coercivity and Maximizers

The folded normal log-likelihood exhibits boundary coercivity: as $\sigma \to 0$ or $\sigma \to \infty$, $\ell_n(\mu, \sigma) \to -\infty$, provided the data have nonzero sample variance. Explicitly, for any $\sigma \in (0,1]$,

$$\sup_{\mu} \ell_n(\mu, \sigma) \leq -n \log \sigma - \frac{n s^2}{2 \sigma^2} + C_0,$$

where $s^2$ is the sample variance of the data. This property ensures that, within a hierarchical or penalized framework, likelihood maximization is not compromised by non-informative extrema at the domain boundaries.
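A quick numerical illustration of coercivity (ours, reusing the sketch above): with $\mu$ held at the sample mean rather than profiled, the log-likelihood still collapses at both ends of the $\sigma$ range:

```python
# Boundary coercivity check (illustrative): the log-likelihood diverges
# to -infinity as sigma -> 0 and as sigma -> infinity.
for s in [1e-3, 1e-2, 0.1, 1.0, 10.0, 100.0]:
    print(f"sigma = {s:8.3f}   loglik = {folded_normal_loglik(y, np.mean(y), s):14.1f}")
```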

For profile likelihood optimization: for each fixed $\sigma > 0$, the function $\mu \mapsto \ell_n(\mu, \sigma)$ has a unique maximizer $\hat{\mu}(\sigma)$ (on the $\mu \geq 0$ branch, given the sign symmetry), determined by the fixed-point equation

$$\sum_{i=1}^n y_i \tanh\!\left(\frac{y_i \mu}{\sigma^2}\right) = n\mu.$$

The solution path $\hat{\mu}(\sigma)$ is strictly decreasing and $C^1$ in $\sigma$:

$$\hat{\mu}'(\sigma) = -\frac{2 \hat{\mu} A/\sigma}{n - A/\sigma^2} < 0, \qquad A = \sum_i y_i^2 \,\mathrm{sech}^2\!\left(\frac{y_i \hat{\mu}}{\sigma^2}\right).$$

The profile likelihood in $\sigma$, $\ell_{n,p}(\sigma) = \ell_n(\hat{\mu}(\sigma), \sigma)$, is strictly unimodal, with exactly one maximizer in $(0, \infty)$.
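The fixed-point characterization yields a simple solver. In the sketch below (our implementation, continuing the code above), the iteration $\mu \leftarrow n^{-1}\sum_i y_i \tanh(y_i\mu/\sigma^2)$ converges monotonically from a positive start because the map is increasing and concave on $\mu > 0$; the strictly unimodal profile in $\sigma$ then makes a grid search safe:

```python
def profile_mu(y, sigma, tol=1e-10, max_iter=2000):
    """Profile maximizer mu_hat(sigma) on the mu >= 0 branch, via the
    fixed-point equation  sum_i y_i tanh(y_i mu / sigma^2) = n mu.
    When sigma**2 >= mean(y**2), the only fixed point is mu = 0."""
    mu = float(np.mean(y))  # positive starting value
    for _ in range(max_iter):
        mu_new = float(np.mean(y * np.tanh(y * mu / sigma**2)))
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def profile_loglik(y, sigma):
    """Profile log-likelihood ell_{n,p}(sigma) = ell_n(mu_hat(sigma), sigma)."""
    return folded_normal_loglik(y, profile_mu(y, sigma), sigma)

# Strict unimodality of the profile in sigma justifies a grid search.
sigmas = np.linspace(0.5, 4.0, 200)
sigma_hat = sigmas[int(np.argmax([profile_loglik(y, s) for s in sigmas]))]
mu_hat = profile_mu(y, sigma_hat)
print(f"MLE: mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")  # truth: (1.0, 2.0)
```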

3. Identification, Consistency, and Asymptotic Rates

Let $\theta_0 = (\mu_0, \sigma_0)$ denote the true parameters; the model is identifiable only up to the sign of $\mu$, so the identified set is $\Theta_0 = \{(\mu_0, \sigma_0), (-\mu_0, \sigma_0)\}$. The estimation procedures yield the following guarantees:

  • Hausdorff Consistency: The set of maximizers $\{(\pm\hat{\mu}_n, \hat{\sigma}_n)\}$ converges in the Hausdorff metric to the true set $\Theta_0$:

$$d_H\!\left(\{(\pm\hat{\mu}_n, \hat{\sigma}_n)\},\ \Theta_0\right) \to_p 0$$

  • Kullback-Leibler Separation: The expected log-likelihood is strictly maximized on $\Theta_0$, and the population KL divergence is strictly positive elsewhere.

Asymptotic Distribution:

| Property | $\mu_0 \neq 0$ (Regular) | $\mu_0 = 0$ (Nonregular) |
|---|---|---|
| MLE convergence rate | $n^{1/2}$ | $n^{1/4}$ |
| Limiting law | Gaussian | Argmax of quadratic-minus-quartic contrast |
| Fisher information | Nonsingular | Singular in $\mu$-direction |
| Confidence set geometry | Elliptical (Wald/LR) | Highly non-elliptical; Wald fails |
| Inference method | Classical (Wald/LR) | Nonstandard (LR, subsampling, etc.) |

In the regular case $(\mu_0 \neq 0)$, standard normal asymptotics apply. In the nonregular case $(\mu_0 = 0)$, the Fisher information matrix is singular in the $\mu$-direction, yielding an estimator that converges at the slower $n^{1/4}$ rate, with a limiting law described as the argmax of a quadratic-minus-quartic contrast:

$$\ell_n(\mu, \sigma_0) - \ell_n(0, \sigma_0) \approx \frac{\mu^2}{2\sigma_0^4} \sum_i (Y_i^2 - \sigma_0^2) - \frac{\mu^4}{12\sigma_0^8} \sum_i Y_i^4$$
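A small Monte Carlo sketch (ours; for speed, $\sigma$ is held at its true value and a bounded scalar optimizer stands in for the full joint MLE) illustrates the slow rate: since $\hat\mu_n = O_p(n^{-1/4})$ when $\mu_0 = 0$, the rescaled quantity $n^{1/4}\hat\mu_n$ should stabilize as $n$ grows by factors of 16:

```python
from scipy.optimize import minimize_scalar

def mu_mle_known_sigma(y, sigma):
    """Maximize ell_n(mu, sigma) over mu >= 0 with sigma known (illustration only)."""
    res = minimize_scalar(lambda m: -folded_normal_loglik(y, m, sigma),
                          bounds=(0.0, float(np.max(y))), method="bounded")
    return res.x

sigma0 = 1.0
for n in [100, 1_600, 25_600]:  # each step multiplies n by 16, halving n^(-1/4)
    draws = [mu_mle_known_sigma(np.abs(rng.normal(0.0, sigma0, size=n)), sigma0)
             for _ in range(200)]
    print(f"n = {n:6d}   mean mu_hat = {np.mean(draws):.4f}   "
          f"n^(1/4) * mean = {n**0.25 * np.mean(draws):.3f}")
```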

4. Uniform Laws, Envelope Bounds, and Finite Sample Control

The uniform law of large numbers (ULLN) for the folded normal log-likelihood is established via explicit envelope functions for the log-likelihood, score, and Hessian. This approach avoids covering-number and entropy-based machinery and supports finite-sample deviation bounds: over any parameter region bounded away from the identified set, the probability that uniform deviations exceed specified constants decreases rapidly with sample size.

Envelope bounds provide practical error control for both inference and hierarchical extensions, where parameter spaces can be large or grow with model complexity.

5. Implications for Hierarchical and Penalized Models

The properties established in the non-hierarchical case extend to hierarchical or multilevel models incorporating folded normal components. Key consequences:

  • Well-Behaved Posterior and Penalized Likelihoods: Boundary coercivity and unimodal profiling ensure posterior or penalized estimation procedures avoid pathological maxima.
  • Explicit Control in Empirical Bayes: Envelope bounds and uniform laws allow robust calculation of marginal likelihoods and empirical Bayes estimates, accounting for nonregularity, especially in "null signal" regimes.
  • Penalty Design: Boundary coercivity persists under suitable penalty functions. For mixture models and complex hierarchies, adding quadratic penalties in location and log-scale secures estimator existence and stability since the penalty dominates possible variance-collapse spikes.

A plausible implication is that sample-size-dependent penalty tuning can be calculated explicitly for consistency: if the penalty shrinks to zero slowly enough that the product of sample size and penalty still diverges, penalized estimators retain consistency.
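A minimal sketch of this penalty design (our construction, reusing `folded_normal_loglik` and `y` from above; the weight $\lambda_n = n^{-1/2}$ is one illustrative choice satisfying the condition above, since $\lambda_n \to 0$ while $n\lambda_n \to \infty$):

```python
def penalized_loglik(y, mu, sigma, lam):
    """Penalized objective with quadratic penalties in location and log-scale.
    The penalty preserves boundary coercivity and dominates variance-collapse
    spikes (sigma -> 0) that arise in mixtures and deeper hierarchies."""
    return folded_normal_loglik(y, mu, sigma) - lam * (mu**2 + np.log(sigma)**2)

lam_n = len(y) ** -0.5  # illustrative tuning: lam_n -> 0 while n * lam_n -> infinity
print(penalized_loglik(y, 1.0, 2.0, lam_n))
```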

6. Estimation Procedures and Practical Inference

  • MLE Computation: Profile likelihood in folded normal models has a unique maximizer; Newton-Raphson or similar algorithms are stable under mild starting conditions.
  • Confidence Interval Construction: For $\mu_0$ near zero, naive Wald intervals are inaccurate due to the $n^{1/4}$ rate and non-Gaussian limits. Likelihood-ratio or subsampling-based intervals are preferred (a profile likelihood-ratio sketch follows this list).
  • Variance Estimation: Inference for $\sigma$ is regular, but joint confidence regions with $\mu$ are highly non-elliptical in near-symmetry regimes.
  • Hierarchical/Bayesian Implementation: The explicit bounds and path regularity undergird robust prior and hyperparameter selection within hierarchical folded normal frameworks, especially in partially or wholly non-identifiable settings.
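As one concrete realization of the likelihood-ratio recommendation, the sketch below (ours, continuing the code above) inverts the profile LR statistic over a $\mu$ grid. The $\chi^2_1$ cutoff used here is the correct calibration only in the regular case $\mu_0 \neq 0$; near zero it should be replaced by the nonstandard calibration discussed in Section 3:

```python
from scipy.stats import chi2

def profile_lr_interval(y, level=0.95, n_grid=200):
    """Confidence set for mu (mu >= 0 branch) by inverting the profile LR test.
    chi2(1) calibration: valid in the regular case, approximate near mu = 0."""
    sigmas = np.linspace(0.2 * np.std(y), 3.0 * np.std(y), n_grid)
    ll_hat = max(profile_loglik(y, s) for s in sigmas)  # unrestricted maximum
    cutoff = chi2.ppf(level, df=1) / 2.0
    kept = [m for m in np.linspace(0.0, float(np.max(y)), n_grid)
            if ll_hat - max(folded_normal_loglik(y, m, s) for s in sigmas) <= cutoff]
    return (min(kept), max(kept))

print(profile_lr_interval(y))  # should cover the true mu_0 = 1.0
```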

7. Summary Table and Key Formulas

| Property | Regular ($\mu_0 \neq 0$) | Nonregular ($\mu_0 = 0$) |
|---|---|---|
| MLE rate | $n^{1/2}$ | $n^{1/4}$ in $\mu$ |
| Limiting law | Gaussian | Argmax of quadratic-minus-quartic contrast |
| Fisher information | Nonsingular | Singular in $\mu$-direction |
| Inference strategy | Wald/LR | Nonstandard (not Wald) |
| Boundary behavior | Coercive | Coercive |

Key formulas:

  • Folded normal density:

$$f(y;\mu,\sigma) = \frac{1}{\sigma}\left\{ \phi\!\left(\frac{y-\mu}{\sigma}\right) + \phi\!\left(\frac{y+\mu}{\sigma}\right) \right\}$$

  • Profile path derivative:

$$\hat{\mu}'(\sigma) = -\frac{2 \hat{\mu} A/\sigma}{n - A/\sigma^2}$$

  • Log-likelihood contrast expansion (nonregular):

$$\ell_n(\mu,\sigma_0)-\ell_n(0,\sigma_0) = \frac{\mu^2}{2\sigma_0^4} \sum_i (Y_i^2 - \sigma_0^2) - \frac{\mu^4}{12\sigma_0^8} \sum_i Y_i^4 + R_n(\mu)$$

  • Limiting contrast (nonregular):

$$\Psi(t) = \frac{t^2}{2\sigma_0^4}\,Z - \frac{E[Y^4]}{12\sigma_0^8}\,t^4$$

where $Z \sim N(0, \operatorname{Var}(Y^2))$.

The hierarchical folded normal model thus rests on an explicit characterization of likelihood geometry and nonregularity, with finite sample error control and guaranteed consistency—features directly translatable to complex hierarchical and penalized statistical models (Mallik, 25 Aug 2025).
