
Hierarchical Folded Normal Model

Updated 3 November 2025
  • Hierarchical Folded Normal Model is a complex probabilistic framework that embeds the folded normal distribution—obtained by taking the absolute value of a normal variate—into multilevel models.
  • The model features boundary coercivity and a unique profile likelihood maximizer, ensuring stable parameter estimation even under nonregular likelihood conditions.
  • Rigorous results establish Hausdorff consistency, nonstandard asymptotics (including $n^{1/4}$ convergence for $\mu = 0$), and practical implications for hierarchical and penalized inference.

The hierarchical folded normal model pertains to probabilistic models where the folded normal distribution is embedded as a prior, likelihood, or data-generating component within a broader, typically multilevel, statistical structure. The folded normal distribution arises by taking the absolute value of a normal variate, yielding a nonregular model with challenging likelihood geometry—features that propagate into hierarchical extensions. The most recent theoretical treatment establishes rigorous likelihood properties, full identification results, nonstandard asymptotics, and robust estimation principles for such models, addressing key boundary and uniform convergence issues (Mallik, 25 Aug 2025).

1. Folded Normal Distribution and Likelihood Structure

The folded normal distribution is defined as the distribution of $Y = |X|$, where $X \sim N(\mu, \sigma^2)$. Its density for $y \geq 0$ is

$$f(y; \mu, \sigma) = \frac{1}{\sigma}\left\{ \phi\!\left(\frac{y - \mu}{\sigma}\right) + \phi\!\left(\frac{y + \mu}{\sigma}\right) \right\} = \frac{2}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{y^2 + \mu^2}{2\sigma^2}\right) \cosh\!\left(\frac{y\mu}{\sigma^2}\right)$$

where $\phi$ is the standard normal density. When incorporated as part of a (possibly hierarchical) stochastic model, the log-likelihood for observed $y_1, \ldots, y_n$ is

$$\ell_n(\mu, \sigma) = \sum_{i=1}^n \log f(y_i; \mu, \sigma).$$

This likelihood is even in $\mu$, so the sign of $\mu$ is not identified and the likelihood geometry is intricate, especially in hierarchical constructions where $\mu$ and $\sigma$ may themselves depend on latent variables or hyperparameters.
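As a concrete check on these formulas, the following minimal sketch (our code; names like `folded_normal_loglik` are illustrative, not from the paper) evaluates the log-likelihood through the two-component form, using `logaddexp` for numerical stability, and confirms the evenness in $\mu$:

```python
import numpy as np
from scipy.stats import norm

def folded_normal_loglik(y, mu, sigma):
    """Folded normal log-likelihood ell_n(mu, sigma).

    Uses log f(y) = logaddexp(log phi((y-mu)/sigma), log phi((y+mu)/sigma)) - log sigma,
    which stays stable even when |y * mu| / sigma^2 is large.
    """
    y = np.asarray(y)
    a = norm.logpdf((y - mu) / sigma)
    b = norm.logpdf((y + mu) / sigma)
    return float(np.sum(np.logaddexp(a, b) - np.log(sigma)))

rng = np.random.default_rng(0)
y = np.abs(rng.normal(1.0, 2.0, size=500))           # simulated folded normal data
assert np.isclose(folded_normal_loglik(y, 1.0, 2.0),  # likelihood is even in mu
                  folded_normal_loglik(y, -1.0, 2.0))
```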

2. Likelihood Geometry: Boundary Coercivity and Maximizers

The folded normal log-likelihood exhibits boundary coercivity: as $\sigma \to 0$ or $\sigma \to \infty$, $\ell_n(\mu, \sigma) \to -\infty$, provided the data have nonzero sample variance. Explicitly, for any $\sigma \in (0,1]$,

$$\sup_{\mu} \ell_n(\mu, \sigma) \leq -n \log \sigma - \frac{n s^2}{2 \sigma^2} + C_0,$$

where $s^2$ is the sample variance of the data. This property ensures that, within a hierarchical or penalized framework, likelihood maximization is not compromised by non-informative extrema at the domain boundaries.
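A quick numerical illustration of coercivity (ours, reusing the sketch above): with $\mu$ held at the sample mean rather than profiled, the log-likelihood still collapses at both ends of the $\sigma$ range:

```python
# Boundary coercivity check (illustrative): the log-likelihood diverges
# to -infinity as sigma -> 0 and as sigma -> infinity.
for s in [1e-3, 1e-2, 0.1, 1.0, 10.0, 100.0]:
    print(f"sigma = {s:8.3f}   loglik = {folded_normal_loglik(y, np.mean(y), s):14.1f}")
```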

For profile likelihood optimization: for each fixed $\sigma > 0$, the function $\mu \mapsto \ell_n(\mu, \sigma)$ has a unique maximizer $\hat{\mu}(\sigma)$ (on the $\mu \geq 0$ branch, given the sign symmetry), determined by the fixed-point equation

$$\sum_{i=1}^n y_i \tanh\!\left(\frac{y_i \mu}{\sigma^2}\right) = n\mu.$$

The solution path $\hat{\mu}(\sigma)$ is strictly decreasing and $C^1$ in $\sigma$:

$$\hat{\mu}'(\sigma) = -\frac{2 \hat{\mu} A/\sigma}{n - A/\sigma^2} < 0, \qquad A = \sum_i y_i^2 \,\mathrm{sech}^2\!\left(\frac{y_i \hat{\mu}}{\sigma^2}\right).$$

The profile likelihood in $\sigma$, $\ell_{n,p}(\sigma) = \ell_n(\hat{\mu}(\sigma), \sigma)$, is strictly unimodal, with exactly one maximizer in $(0, \infty)$.
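The fixed-point characterization yields a simple solver. In the sketch below (our implementation, continuing the code above), the iteration $\mu \leftarrow n^{-1}\sum_i y_i \tanh(y_i\mu/\sigma^2)$ converges monotonically from a positive start because the map is increasing and concave on $\mu > 0$; the strictly unimodal profile in $\sigma$ then makes a grid search safe:

```python
def profile_mu(y, sigma, tol=1e-10, max_iter=2000):
    """Profile maximizer mu_hat(sigma) on the mu >= 0 branch, via the
    fixed-point equation  sum_i y_i tanh(y_i mu / sigma^2) = n mu.
    When sigma**2 >= mean(y**2), the only fixed point is mu = 0."""
    mu = float(np.mean(y))  # positive starting value
    for _ in range(max_iter):
        mu_new = float(np.mean(y * np.tanh(y * mu / sigma**2)))
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

def profile_loglik(y, sigma):
    """Profile log-likelihood ell_{n,p}(sigma) = ell_n(mu_hat(sigma), sigma)."""
    return folded_normal_loglik(y, profile_mu(y, sigma), sigma)

# Strict unimodality of the profile in sigma justifies a grid search.
sigmas = np.linspace(0.5, 4.0, 200)
sigma_hat = sigmas[int(np.argmax([profile_loglik(y, s) for s in sigmas]))]
mu_hat = profile_mu(y, sigma_hat)
print(f"MLE: mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")  # truth: (1.0, 2.0)
```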

3. Identification, Consistency, and Asymptotic Rates

Let $\theta_0 = (\mu_0, \sigma_0)$ denote the true parameters; the model is identifiable only up to the sign of $\mu$, so the identified set is $\Theta_0 = \{(\mu_0, \sigma_0), (-\mu_0, \sigma_0)\}$. The estimation procedures yield the following guarantees:

  • Hausdorff Consistency: The set of maximizers $\{(\pm\hat{\mu}_n, \hat{\sigma}_n)\}$ converges in the Hausdorff metric to the true set $\Theta_0$:

$$d_H\!\left(\{(\pm\hat{\mu}_n, \hat{\sigma}_n)\},\ \Theta_0\right) \to_p 0$$

  • Kullback-Leibler Separation: The expected log-likelihood is strictly maximized on $\Theta_0$, and the population KL divergence is strictly positive elsewhere.

Asymptotic Distribution:

| Property | $\mu_0 \neq 0$ (Regular) | $\mu_0 = 0$ (Nonregular) |
|---|---|---|
| MLE convergence rate | $n^{1/2}$ | $n^{1/4}$ |
| Limiting law | Gaussian | Argmax of quadratic-minus-quartic contrast |
| Fisher information | Nonsingular | Singular in $\mu$-direction |
| Confidence set geometry | Elliptical (Wald/LR) | Highly non-elliptical; Wald fails |
| Inference method | Classical (Wald/LR) | Nonstandard (LR, subsampling, etc.) |

In the regular case $(\mu_0 \neq 0)$, standard normal asymptotics apply. In the nonregular case $(\mu_0 = 0)$, the Fisher information matrix is singular in the $\mu$-direction, yielding an estimator that converges at the slower $n^{1/4}$ rate, with a limiting law described as the argmax of a quadratic-minus-quartic contrast:

$$\ell_n(\mu, \sigma_0) - \ell_n(0, \sigma_0) \approx \frac{\mu^2}{2\sigma_0^4} \sum_i (Y_i^2 - \sigma_0^2) - \frac{\mu^4}{12\sigma_0^8} \sum_i Y_i^4$$
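A small Monte Carlo sketch (ours; for speed, $\sigma$ is held at its true value and a bounded scalar optimizer stands in for the full joint MLE) illustrates the slow rate: since $\hat\mu_n = O_p(n^{-1/4})$ when $\mu_0 = 0$, the rescaled quantity $n^{1/4}\hat\mu_n$ should stabilize as $n$ grows by factors of 16:

```python
from scipy.optimize import minimize_scalar

def mu_mle_known_sigma(y, sigma):
    """Maximize ell_n(mu, sigma) over mu >= 0 with sigma known (illustration only)."""
    res = minimize_scalar(lambda m: -folded_normal_loglik(y, m, sigma),
                          bounds=(0.0, float(np.max(y))), method="bounded")
    return res.x

sigma0 = 1.0
for n in [100, 1_600, 25_600]:  # each step multiplies n by 16, halving n^(-1/4)
    draws = [mu_mle_known_sigma(np.abs(rng.normal(0.0, sigma0, size=n)), sigma0)
             for _ in range(200)]
    print(f"n = {n:6d}   mean mu_hat = {np.mean(draws):.4f}   "
          f"n^(1/4) * mean = {n**0.25 * np.mean(draws):.3f}")
```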

4. Uniform Laws, Envelope Bounds, and Finite Sample Control

The uniform law of large numbers (ULLN) for the folded normal log-likelihood is established via explicit envelope functions for the log-likelihood, score, and Hessian. This approach avoids covering-number and entropy-based machinery and supports finite-sample deviation bounds: over any parameter region bounded away from the identified set, the probability that uniform deviations exceed specified constants decreases rapidly with sample size.

Envelope bounds provide practical error control for both inference and hierarchical extensions, where parameter spaces can be large or grow with model complexity.

5. Implications for Hierarchical and Penalized Models

The properties established in the non-hierarchical case extend to hierarchical or multilevel models incorporating folded normal components. Key consequences:

  • Well-Behaved Posterior and Penalized Likelihoods: Boundary coercivity and unimodal profiling ensure posterior or penalized estimation procedures avoid pathological maxima.
  • Explicit Control in Empirical Bayes: Envelope bounds and uniform laws allow robust calculation of marginal likelihoods and empirical Bayes estimates, accounting for nonregularity, especially in "null signal" regimes.
  • Penalty Design: Boundary coercivity persists under suitable penalty functions. For mixture models and complex hierarchies, adding quadratic penalties in location and log-scale secures estimator existence and stability since the penalty dominates possible variance-collapse spikes.

A plausible implication is that sample-size-dependent penalty tuning can be calculated explicitly for consistency: if the penalty shrinks to zero slowly enough that the product of sample size and penalty still diverges, penalized estimators retain consistency.
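A minimal sketch of this penalty design (our construction, reusing `folded_normal_loglik` and `y` from above; the weight $\lambda_n = n^{-1/2}$ is one illustrative choice satisfying the condition above, since $\lambda_n \to 0$ while $n\lambda_n \to \infty$):

```python
def penalized_loglik(y, mu, sigma, lam):
    """Penalized objective with quadratic penalties in location and log-scale.
    The penalty preserves boundary coercivity and dominates variance-collapse
    spikes (sigma -> 0) that arise in mixtures and deeper hierarchies."""
    return folded_normal_loglik(y, mu, sigma) - lam * (mu**2 + np.log(sigma)**2)

lam_n = len(y) ** -0.5  # illustrative tuning: lam_n -> 0 while n * lam_n -> infinity
print(penalized_loglik(y, 1.0, 2.0, lam_n))
```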

6. Estimation Procedures and Practical Inference

  • MLE Computation: Profile likelihood in folded normal models has a unique maximizer; Newton-Raphson or similar algorithms are stable under mild starting conditions.
  • Confidence Interval Construction: For $\mu_0$ near zero, naive Wald intervals are inaccurate due to the $n^{1/4}$ rate and non-Gaussian limits. Likelihood-ratio or subsampling-based intervals are preferred (a profile likelihood-ratio sketch follows this list).
  • Variance Estimation: Inference for $\sigma$ is regular, but joint confidence regions with $\mu$ are highly non-elliptical in near-symmetry regimes.
  • Hierarchical/Bayesian Implementation: The explicit bounds and path regularity undergird robust prior and hyperparameter selection within hierarchical folded normal frameworks, especially in partially or wholly non-identifiable settings.
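As one concrete realization of the likelihood-ratio recommendation, the sketch below (ours, continuing the code above) inverts the profile LR statistic over a $\mu$ grid. The $\chi^2_1$ cutoff used here is the correct calibration only in the regular case $\mu_0 \neq 0$; near zero it should be replaced by the nonstandard calibration discussed in Section 3:

```python
from scipy.stats import chi2

def profile_lr_interval(y, level=0.95, n_grid=200):
    """Confidence set for mu (mu >= 0 branch) by inverting the profile LR test.
    chi2(1) calibration: valid in the regular case, approximate near mu = 0."""
    sigmas = np.linspace(0.2 * np.std(y), 3.0 * np.std(y), n_grid)
    ll_hat = max(profile_loglik(y, s) for s in sigmas)  # unrestricted maximum
    cutoff = chi2.ppf(level, df=1) / 2.0
    kept = [m for m in np.linspace(0.0, float(np.max(y)), n_grid)
            if ll_hat - max(folded_normal_loglik(y, m, s) for s in sigmas) <= cutoff]
    return (min(kept), max(kept))

print(profile_lr_interval(y))  # should cover the true mu_0 = 1.0
```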

7. Summary Table and Key Formulas

| Property | Regular ($\mu_0 \neq 0$) | Nonregular ($\mu_0 = 0$) |
|---|---|---|
| MLE rate | $n^{1/2}$ | $n^{1/4}$ in $\mu$ |
| Limiting law | Gaussian | Argmax of quadratic-minus-quartic contrast |
| Fisher information | Nonsingular | Singular in $\mu$-direction |
| Inference strategy | Wald/LR | Nonstandard (not Wald) |
| Boundary behavior | Coercive | Coercive |

Key formulas:

  • Folded normal density:

$$f(y;\mu,\sigma) = \frac{1}{\sigma}\left\{ \phi\!\left(\frac{y-\mu}{\sigma}\right) + \phi\!\left(\frac{y+\mu}{\sigma}\right) \right\}$$

  • Profile path derivative:

$$\hat{\mu}'(\sigma) = -\frac{2 \hat{\mu} A/\sigma}{n - A/\sigma^2}$$

  • Log-likelihood contrast expansion (nonregular):

$$\ell_n(\mu,\sigma_0)-\ell_n(0,\sigma_0) = \frac{\mu^2}{2\sigma_0^4} \sum_i (Y_i^2 - \sigma_0^2) - \frac{\mu^4}{12\sigma_0^8} \sum_i Y_i^4 + R_n(\mu)$$

  • Limiting contrast (nonregular):

$$\Psi(t) = \frac{t^2}{2\sigma_0^4}\,Z - \frac{E[Y^4]}{12\sigma_0^8}\,t^4$$

where $Z \sim N(0, \operatorname{Var}(Y^2))$.

The hierarchical folded normal model thus rests on an explicit characterization of likelihood geometry and nonregularity, with finite sample error control and guaranteed consistency—features directly translatable to complex hierarchical and penalized statistical models (Mallik, 25 Aug 2025).
