Mean Square Optimal Estimation
- Mean square optimal estimation is a paradigm that minimizes the mean squared error; the resulting MMSE estimator is the conditional mean of the signal given the observation.
- Key results show that the MMSE estimator is linear exactly when a matching condition between the source and noise characteristic functions holds, with the Gaussian case uniquely permitting linearity at multiple SNRs.
- Extensions to the vector case require a diagonalizing linear transformation, coordinate-wise matching, and independence conditions across coordinates.
Mean square optimal estimation is a central paradigm in statistical inference, signal processing, and information theory, in which one seeks an estimator that minimizes the expected squared error (mean squared error, MSE) between an unknown parameter or signal and its estimate, typically under a physical observation or system model with noise or other uncertainty. The classical minimum mean square error (MMSE) estimator is the conditional mean. However, the precise properties, structure, and (especially) linearity of the mean square optimal estimator depend intricately on the joint distribution of the signal and noise. The question of when the optimal estimator is linear—and, more generally, the conditions and consequences of MSE-optimality—has profound implications for theory and practice.
1. Conditions for Linearity of the Mean Square Optimal Estimator
Let $X$ (source) and $N$ (noise) be independent random variables, with $Y = X + N$ observed. The MMSE estimator is $h(Y) = \mathbb{E}[X \mid Y]$. It is well known that if $X$ and $N$ are jointly Gaussian (and, say, zero mean), the MMSE estimator is linear:

$$\mathbb{E}[X \mid Y] = kY, \qquad k = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}.$$
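As a quick sanity check (a sketch, not taken from the paper; the variances, bin grid, and sample size are arbitrary choices), the Gaussian case can be verified by Monte Carlo: binned conditional means of $X$ given $Y$ should track the line $kY$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_x, sigma_n = 2.0, 1.0
n = 500_000
x = rng.normal(0.0, sigma_x, n)        # Gaussian source X
y = x + rng.normal(0.0, sigma_n, n)    # observation Y = X + N, independent Gaussian noise

k = sigma_x**2 / (sigma_x**2 + sigma_n**2)   # linear MMSE gain

# Estimate E[X | Y] by averaging X within narrow bins of Y,
# then compare against the linear prediction k * Y.
bins = np.linspace(-3.0, 3.0, 31)
idx = np.digitize(y, bins)
devs = [abs(x[idx == b].mean() - k * y[idx == b].mean())
        for b in range(1, len(bins)) if (idx == b).sum() > 1000]
max_dev = max(devs)
print(f"k = {k}, max deviation from kY: {max_dev:.4f}")
```

The deviation is pure sampling noise here; for non-Gaussian pairs that violate the matching condition below, the same diagnostic exposes curvature in the conditional mean.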
The general question—when is $\mathbb{E}[X \mid Y]$ linear?—is addressed by deriving a necessary and sufficient condition using the characteristic functions $F_X(\omega)$ and $F_N(\omega)$. For distortion measures given by even-order error moments, $\mathbb{E}[(X - h(Y))^{2m}]$, Theorem 1 characterizes optimality of the linear estimator $h(y) = ky$ through a differential equation in $F_X$ and $F_N$.

For mean square error ($m = 1$), this reduces to the "matching condition"

$$F_X(\omega) = \left[F_N(\omega)\right]^{\gamma}, \qquad \gamma = \frac{\sigma_X^2}{\sigma_N^2},$$

i.e., the source characteristic function must be a positive (possibly fractional) power of the noise characteristic function, with exponent equal to the SNR $\gamma$. This is both necessary and sufficient for the linearity of $\mathbb{E}[X \mid Y]$.
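A minimal numerical illustration (my sketch, with arbitrary variances): the Gaussian family satisfies the matching condition exactly, since $F_X(\omega) = e^{-\sigma_X^2 \omega^2/2}$ equals $F_N(\omega)^{\gamma}$ when $\gamma = \sigma_X^2/\sigma_N^2$.

```python
import numpy as np

sigma_x, sigma_n = 3.0, 1.5
gamma = sigma_x**2 / sigma_n**2          # SNR exponent in the matching condition

omega = np.linspace(-5.0, 5.0, 201)
F_x = np.exp(-0.5 * sigma_x**2 * omega**2)   # char. function of N(0, sigma_x^2)
F_n = np.exp(-0.5 * sigma_n**2 * omega**2)   # char. function of N(0, sigma_n^2)

mismatch = np.max(np.abs(F_x - F_n**gamma))  # should vanish up to float rounding
print(f"gamma = {gamma}, max |F_X - F_N^gamma| = {mismatch:.2e}")
```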
2. Existence and Uniqueness of Matching Distributions
The matching condition has several profound consequences:
- If $\gamma$ is a natural number, any noise $N$ yields a valid matching source: $X$ is distributed as the sum of $\gamma$ independent copies of $N$.
- More generally, $[F_N]^{\gamma}$ is a characteristic function for every $\gamma > 0$ if and only if $N$ is infinitely divisible (e.g., Gaussian, Poisson, stable laws).
- Uniqueness: If $F_N$ is analytic (in particular, the distribution has moments of all orders), the matching source is unique and determined by the moments.
| Parameter regime | Matching existence | Matching uniqueness |
|---|---|---|
| $\gamma \in \mathbb{N}$ | Always | By construction |
| $N$ infinitely divisible | For all $\gamma > 0$ | If $F_N$ analytic |
| $\gamma = 1$ (equal variances) | Only $X$, $N$ identically distributed | Unique |
If $X$ and $N$ have equal variances ($\gamma = 1$), the only way $\mathbb{E}[X \mid Y]$ is linear is if $X$ and $N$ are identically distributed, i.e., $F_X = F_N$.
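The natural-number case can be illustrated with an assumed example (mine, not the paper's): for $N \sim \mathrm{Exp}(1)$, the matching source at $\gamma = 3$ is $X \sim \mathrm{Gamma}(3, 1)$, a sum of three independent copies of $N$, and the binned conditional mean of $X$ given $Y$ stays on the line $kY$ with $k = \gamma/(\gamma + 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
g = 3                                      # natural-number exponent gamma
n = 400_000
x = rng.gamma(g, 1.0, n)                   # X ~ Gamma(3,1): sum of 3 indep. Exp(1) copies
y = x + rng.exponential(1.0, n)            # Y = X + N with N ~ Exp(1)

k = g / (g + 1)                            # sigma_X^2/(sigma_X^2 + sigma_N^2) = 3/4 here
bins = np.linspace(1.0, 8.0, 29)
idx = np.digitize(y, bins)
devs = [abs(x[idx == b].mean() - k * y[idx == b].mean())
        for b in range(1, len(bins)) if (idx == b).sum() > 500]
max_dev = max(devs)
print(f"k = {k}, max deviation from kY: {max_dev:.4f}")
```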
3. The Uniqueness of the Gaussian Source–Noise Pair
A key result (Theorem 5) is that the Gaussian source–noise pair is uniquely characterized by the property that linearity of $\mathbb{E}[X \mid Y]$ holds for more than one signal-to-noise ratio (SNR). If the matching condition can be satisfied at two distinct SNRs $\gamma_1 \neq \gamma_2$ (obtained, for instance, by rescaling the noise), then $\log F_N$ must be quadratic in $\omega$, so $N$—and hence $X$—is Gaussian.
Thus, for any non-Gaussian pair, the linear estimator can be optimal at most at a single SNR; the jointly Gaussian case is the only one for which $\mathbb{E}[X \mid Y]$ is linear at every SNR.
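A hedged numerical illustration (my construction, with arbitrary scales): take a Gamma source matched to exponential noise, then rescale the noise. The rescaling changes the SNR, breaks the matching condition, and makes the binned conditional mean visibly depart from any straight line.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
x = rng.gamma(3, 1.0, n)          # source X ~ Gamma(3,1); matches N ~ Exp(1) with gamma=3

def nonlinearity(noise_scale):
    """Max residual of binned conditional means E[X|Y] from their best-fit line."""
    y = x + rng.exponential(noise_scale, n)
    bins = np.linspace(np.quantile(y, 0.05), np.quantile(y, 0.95), 25)
    idx = np.digitize(y, bins)
    pts = np.array([(y[idx == b].mean(), x[idx == b].mean())
                    for b in range(1, len(bins)) if (idx == b).sum() > 500])
    slope, intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return np.max(np.abs(pts[:, 1] - (slope * pts[:, 0] + intercept)))

r_matched = nonlinearity(1.0)     # matching holds: conditional mean is a straight line
r_scaled = nonlinearity(2.0)      # noise rescaled (different SNR): matching fails
print(f"residual, matched SNR: {r_matched:.4f}; rescaled noise: {r_scaled:.4f}")
```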
4. Asymptotic Linearity at Low and High SNR
For general (not necessarily matching) source–noise pairs:
- As $\mathrm{SNR} \to 0$ (low SNR, i.e., the noise dominates), if the noise $N$ is Gaussian, $\mathbb{E}[X \mid Y]$ becomes asymptotically linear for any source.
- As $\mathrm{SNR} \to \infty$ (high SNR), if the source $X$ is Gaussian, $\mathbb{E}[X \mid Y]$ becomes asymptotically linear for any noise.
This "asymptotic robustness" explains the empirical success of linear estimators such as the Wiener filter in diverse regimes, even when the exact matching condition fails.
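The low-SNR claim can be sketched with a deliberately non-Gaussian source (an assumed example, not from the paper): for binary $X \in \{-1, +1\}$ in Gaussian noise of variance $\sigma^2$, a standard Bayes computation gives $\mathbb{E}[X \mid Y = y] = \tanh(y/\sigma^2)$, which approaches the line $y/\sigma^2$ over the typical range of $Y$ as the noise grows.

```python
import numpy as np

def mmse_estimator(y, sigma):
    # For X uniform on {-1,+1} and independent noise N ~ N(0, sigma^2),
    # the posterior odds give E[X | Y=y] = tanh(y / sigma^2).
    return np.tanh(y / sigma**2)

def linearity_gap(sigma):
    """Max gap between E[X|Y] and its linearization over typical Y values."""
    y_std = np.sqrt(1.0 + sigma**2)               # std of Y = X + N
    y = np.linspace(-3 * y_std, 3 * y_std, 1001)
    return np.max(np.abs(mmse_estimator(y, sigma) - y / sigma**2))

print([round(linearity_gap(s), 4) for s in (1.0, 3.0, 10.0)])
```

The gap shrinks monotonically as $\sigma$ grows (SNR falls), even though the estimator is strongly nonlinear at high SNR, where $\tanh$ saturates toward a hard decision.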
5. Vector Case: Transformation and Coordinate-wise Matching
In the vector observation setting ($\mathbf{Y} = \mathbf{X} + \mathbf{N}$, with $\mathbf{X}$, $\mathbf{N}$ independent random vectors), optimal linearity is more restrictive. The necessary and sufficient condition is that, after a linear transformation that simultaneously diagonalizes the source and noise covariance matrices, the transformed components $\tilde{X}_i$ and $\tilde{N}_i$ satisfy

$$F_{\tilde{X}_i}(\omega) = \left[F_{\tilde{N}_i}(\omega)\right]^{\gamma_i}$$

for each $i$, where the $\gamma_i$ (the per-coordinate SNRs) are the eigenvalues arising from the diagonalization.
Moreover, the transformed source and noise coordinates must satisfy certain independence or conditional independence conditions: in the case of distinct eigenvalues, optimality of a linear estimator requires the coordinates to be independent, with no spurious dependencies across dimensions.
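A minimal sketch of the diagonalizing transformation (the covariance values are arbitrary assumptions): whiten the noise with a Cholesky factor, then rotate with the eigenvectors of the whitened source covariance. Both covariances become diagonal, and the per-coordinate SNRs $\gamma_i$ appear as eigenvalues.

```python
import numpy as np

# Simultaneous diagonalization: find T with T Sigma_N T^T = I and
# T Sigma_X T^T diagonal, so estimation decouples into scalar problems.
Sigma_X = np.array([[4.0, 1.0], [1.0, 3.0]])   # assumed source covariance
Sigma_N = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed noise covariance

L = np.linalg.cholesky(Sigma_N)
W = np.linalg.inv(L)                       # whitens the noise: W Sigma_N W^T = I
M = W @ Sigma_X @ W.T                      # source covariance in whitened coordinates
eigvals, U = np.linalg.eigh(M)             # eigvals = per-coordinate SNRs gamma_i
T = U.T @ W                                # full diagonalizing transform

print(np.round(T @ Sigma_N @ T.T, 10))     # identity matrix
print(np.round(T @ Sigma_X @ T.T, 10))     # diag(gamma_1, gamma_2)
```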
6. Consequences and Broader Implications
- The mean square optimal (MMSE) estimator is linear if and only if the source and noise distributions satisfy the matching condition $F_X(\omega) = [F_N(\omega)]^{\gamma}$ with $\gamma = \sigma_X^2/\sigma_N^2$.
- The only source–noise pair for which MMSE linearity persists for all SNRs is the Gaussian–Gaussian pair.
- Linear estimators are asymptotically optimal at extreme SNRs, provided either the source or noise is Gaussian.
- In practical estimation, the use of linear estimators is justified either when the matching condition holds or when operating in extreme SNR regimes.
- In the vector case, these results extend but require block-wise (coordinate-wise) matching—after appropriate diagonalization—and further require conditional independence conditions not needed in the scalar case.
7. Summary Table of Linearity Conditions
| Scenario | Matching/Optimality Condition | Estimator |
|---|---|---|
| Scalar MSE ($Y = X + N$) | $F_X(\omega) = [F_N(\omega)]^{\gamma}$, $\gamma = \sigma_X^2/\sigma_N^2$ | $h(y) = \frac{\gamma}{1+\gamma}\, y$ |
| Equal variances ($\gamma = 1$) | $F_X = F_N$ (identical distributions) | $h(y) = y/2$ |
| Multiple-SNR linearity | Gaussian pair only | Linear at all SNRs |
| Low SNR, Gaussian noise | — | Asymptotically linear, any source |
| High SNR, Gaussian source | — | Asymptotically linear, any noise |
| Vector case | $F_{\tilde{X}_i} = [F_{\tilde{N}_i}]^{\gamma_i}$ for each $i$; coordinate-wise independence required | Linear in transformed coordinates |
These results precisely characterize the conditions under which mean square optimal estimation is linear and establish the Gaussian case as uniquely linear at all SNRs, situating the Wiener filter and related techniques on a rigorous foundation (Akyol et al., 2011).
References:
- (Akyol et al., 2011) On Conditions for Linearity of Optimal Estimation