Conditional Expectation Matching (CEM)
- Conditional Expectation Matching (CEM) is a suite of methods that matches conditional moments using quadratic optimization, kernel embeddings, and Bayesian inference.
- In remote sensing it takes the form of constrained quadratic filters, which, after mean-centering and the addition of a constant band, reduce to the optimal matched filter.
- CEM extends into RKHS via conditional mean embeddings estimated by vector-valued kernel ridge regression, and into approximate Bayesian inference via conditional expectation propagation, offering scalable, accurate inference even in non-conjugate models.
Conditional Expectation Matching (CEM) encompasses a collection of methods and algorithms that match conditional moments in direct target detection, kernel-based statistical inference, and approximate Bayesian message passing. Initial conceptions of CEM appeared in remote sensing, where it referred to the design of constrained quadratic filters for anomaly and target detection; further extensions emerged in kernel conditional mean embeddings, and algorithmic variants include conditional expectation propagation in probabilistic inference. With formal roots in constrained quadratic optimization and Banach-space conditional expectation, CEM is now encountered in diverse mathematical and algorithmic disciplines.
1. Quadratic Optimization Formulation of CEM in Target Detection
In remote sensing, Conditional Expectation Matching refers to a class of algorithms derived from the linearly constrained minimum-variance (LCMV) beamformer and encompasses classical constrained energy minimization (CEM). Given a dataset of pixel vectors $\mathbf{r}_1,\ldots,\mathbf{r}_N \in \mathbb{R}^{L}$ assembled into a data matrix $X$ and a known target signature $\mathbf{d} \in \mathbb{R}^{L}$, CEM seeks a filter $\mathbf{w} \in \mathbb{R}^{L}$ that minimizes the average energy of the filter output over the background pixels, $\frac{1}{N}\sum_{i=1}^{N}(\mathbf{w}^{\top}\mathbf{r}_i)^2 = \mathbf{w}^{\top}R\,\mathbf{w}$, under the constraint $\mathbf{w}^{\top}\mathbf{d} = 1$. The sample correlation matrix is $R = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i\mathbf{r}_i^{\top}$. The optimization reads:

$$\min_{\mathbf{w}} \; \mathbf{w}^{\top} R\, \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^{\top}\mathbf{d} = 1.$$
No distributional assumption is required beyond $R$ being nonsingular. The closed-form solution is

$$\mathbf{w}_{\mathrm{CEM}} = \frac{R^{-1}\mathbf{d}}{\mathbf{d}^{\top}R^{-1}\mathbf{d}},$$

where the denominator rescales $R^{-1}\mathbf{d}$ to ensure unit response on the target signature. CEM thus matches the conditional expectation of the filter output for the target signature under the second-moment statistics of the background pixels (Geng et al., 2016).
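The closed-form solution translates directly into a few lines of linear algebra. The following is a minimal NumPy sketch (the function name and synthetic scene are illustrative, not taken from the cited work) that builds the sample correlation matrix, solves for the CEM filter, and scores every pixel.

```python
import numpy as np

def cem_filter(X, d):
    """CEM detector: X is an (N, L) pixel matrix, d is an (L,) target signature.

    Returns the filter w = R^{-1} d / (d^T R^{-1} d) and the per-pixel output X @ w.
    """
    N = X.shape[0]
    R = X.T @ X / N                      # sample correlation matrix (L, L)
    Rinv_d = np.linalg.solve(R, d)       # R^{-1} d without forming an explicit inverse
    w = Rinv_d / (d @ Rinv_d)            # rescale so that w^T d = 1
    return w, X @ w

# Illustrative synthetic scene: background pixels plus a few target-bearing pixels.
rng = np.random.default_rng(0)
L, N = 20, 500
background = rng.normal(size=(N, L)) + 5.0
d = rng.uniform(0.5, 1.5, size=L)        # assumed target signature
X = background.copy()
X[:10] += 2.0 * d                        # implant the target in 10 pixels
w, scores = cem_filter(X, d)
print(np.isclose(w @ d, 1.0), scores[:10].mean() > scores[10:].mean())
```

Using `np.linalg.solve` instead of an explicit matrix inverse is the usual numerically stable way to apply $R^{-1}\mathbf{d}$.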
2. Augmented CEM and Reduction to Matched Filter
CEM can be systematically improved by adding linearly independent bands, most notably an all-ones band. The augmented form, called Augmented CEM (ACEM), appends a constant band to the data and augments the target signature accordingly. The augmented correlation matrix and target vector are

$$\tilde{R} = \frac{1}{N}\sum_{i=1}^{N}\begin{pmatrix}\mathbf{r}_i\\ 1\end{pmatrix}\begin{pmatrix}\mathbf{r}_i\\ 1\end{pmatrix}^{\!\top} = \begin{pmatrix} R & \mathbf{m} \\ \mathbf{m}^{\top} & 1 \end{pmatrix}, \qquad \tilde{\mathbf{d}} = \begin{pmatrix}\mathbf{d}\\ 1\end{pmatrix},$$

where $\mathbf{m} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i$ is the sample mean. ACEM solves the analogous quadratic minimization and yields a filter whose first $L$ components
are proven, via block-matrix inversion and the Sherman–Morrison formula, to be algebraically proportional to the classical matched filter (MF) vector $C^{-1}(\mathbf{d}-\mathbf{m})$ on mean-centered data, where $C = R - \mathbf{m}\mathbf{m}^{\top}$ is the sample covariance. MF is optimal under Neyman–Pearson theory for Gaussian data and is always at least as good as CEM, both in theory and practice. This result demonstrates the superiority of MF over CEM for energy-criterion target detection (Geng et al., 2016).
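The reduction can be checked numerically. The sketch below is a toy verification (synthetic data and variable names are illustrative, not code from Geng et al., 2016): it forms the augmented correlation matrix with an all-ones band, solves the ACEM program in closed form, and confirms that the first $L$ filter components are proportional to the matched-filter direction $C^{-1}(\mathbf{d}-\mathbf{m})$.

```python
import numpy as np

rng = np.random.default_rng(1)
L, N = 8, 2000
X = rng.normal(size=(N, L)) @ rng.normal(size=(L, L)) + rng.uniform(1, 3, size=L)
d = rng.uniform(0.5, 1.5, size=L)        # target signature
m = X.mean(axis=0)                       # sample mean

# ACEM: append a constant all-ones band and solve min w^T R~ w  s.t.  w^T d~ = 1.
Xa = np.hstack([X, np.ones((N, 1))])
da = np.append(d, 1.0)
Ra = Xa.T @ Xa / N                       # augmented correlation matrix
wa = np.linalg.solve(Ra, da)
wa /= da @ wa                            # unit response on the augmented target

# Matched-filter direction on mean-centered data: C^{-1} (d - m).
C = np.cov(X, rowvar=False, bias=True)   # sample covariance R - m m^T
w_mf = np.linalg.solve(C, d - m)

# The first L components of the ACEM filter are proportional to w_mf.
ratio = wa[:L] / w_mf
print(np.allclose(ratio, ratio[0]))      # True: equal up to a scalar factor
```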
3. Measure-Theoretic Conditional Expectation Matching in RKHS
CEM methods also arise in statistical learning as the conditional mean embedding (CME) of random variables into reproducing kernel Hilbert spaces (RKHS). Let a probability space $(\Omega,\mathcal{F},P)$ carry random variables $X$ and $Y$ taking values in measurable spaces $\mathcal{X}$ and $\mathcal{Y}$, and let $k:\mathcal{Y}\times\mathcal{Y}\to\mathbb{R}$ be a positive-definite kernel with RKHS $\mathcal{H}_k$. The measure-theoretic conditional mean embedding is

$$\mu_{Y\mid X} := \mathbb{E}\bigl[k(Y,\cdot)\mid \sigma(X)\bigr].$$
This is a unique (up to almost-sure equality), $\sigma(X)$-measurable, Bochner-integrable, $\mathcal{H}_k$-valued random variable satisfying

$$\int_{A}\mu_{Y\mid X}\,\mathrm{d}P = \int_{A}k(Y,\cdot)\,\mathrm{d}P \quad \text{for all } A \in \sigma(X).$$
The interchange property holds: $\langle f, \mu_{Y\mid X}\rangle_{\mathcal{H}_k} = \mathbb{E}[f(Y)\mid X]$ almost surely for every $f \in \mathcal{H}_k$. This construction underpins modern nonparametric conditional moment matching (Park et al., 2020).
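As a small worked consequence (spelled out here for clarity rather than quoted from the source), combining the interchange property with the reproducing property shows that evaluating the embedding at a point recovers a conditional kernel expectation, and the tower property then recovers the unconditional mean embedding:

$$\mu_{Y\mid X}(y_0) = \langle k(y_0,\cdot),\,\mu_{Y\mid X}\rangle_{\mathcal{H}_k} = \mathbb{E}\bigl[k(y_0,Y)\mid X\bigr] \ \text{a.s.}, \qquad \mathbb{E}\bigl[\mu_{Y\mid X}(y_0)\bigr] = \mathbb{E}\bigl[k(y_0,Y)\bigr] = \mu_{Y}(y_0).$$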
4. Empirical Estimation via Vector-Valued Kernel Ridge Regression
To estimate the conditional mean embedding, one applies vector-valued kernel ridge regression. Given samples $(x_1,y_1),\ldots,(x_n,y_n)$, a positive-definite kernel $l$ on $\mathcal{X}$ with RKHS $\mathcal{H}_l$, and a regularization parameter $\lambda>0$, define the empirical objective over $\mathcal{H}_k$-valued functions $F$:

$$\hat{F} = \arg\min_{F}\;\frac{1}{n}\sum_{i=1}^{n}\bigl\|k(y_i,\cdot) - F(x_i)\bigr\|_{\mathcal{H}_k}^{2} + \lambda\|F\|^{2}.$$
The representer theorem yields a solution in the span of the kernel evaluations, leading to the empirical CME

$$\hat{\mu}_{Y\mid X=x} = \sum_{i=1}^{n}\alpha_i(x)\,k(y_i,\cdot), \qquad \boldsymbol{\alpha}(x) = (L + n\lambda I)^{-1}\mathbf{l}_x,$$

where $L = \bigl(l(x_i,x_j)\bigr)_{i,j=1}^{n}$ is the Gram matrix on the inputs and $\mathbf{l}_x = \bigl(l(x_1,x),\ldots,l(x_n,x)\bigr)^{\top}$. Consistency and rates are established under boundedness, $c_0$-universality, and appropriate regularization scaling, with universal consistency and convergence rates in well-specified cases (Park et al., 2020).
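As a concrete illustration, the following sketch (with an assumed RBF kernel, illustrative hyperparameters, and synthetic data, not code from the cited work) computes the ridge weights $\boldsymbol{\alpha}(x)$ and uses the empirical CME to estimate a conditional expectation as $\sum_i \alpha_i(x)\,f(y_i)$; with $f(y)=y$ this is the usual plug-in estimate of the conditional mean.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF Gram matrix k(a, b) = exp(-gamma * |a - b|^2) for 1-D inputs."""
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)   # Y | X = x centered at sin(x)

lam, gamma_x = 1e-3, 1.0
Lx = rbf(x, x, gamma_x)                    # input Gram matrix, L_ij = l(x_i, x_j)

def cme_weights(x_new):
    """alpha(x) = (L + n*lam*I)^{-1} l_x, the empirical CME coefficients."""
    lx = rbf(x, np.atleast_1d(x_new), gamma_x)[:, 0]
    return np.linalg.solve(Lx + n * lam * np.eye(n), lx)

# Estimate E[f(Y) | X = x*] as sum_i alpha_i(x*) f(y_i) (interchange property);
# with f(y) = y this is the common plug-in estimate of the conditional mean.
for x_star in (-2.0, 0.0, 1.5):
    alpha = cme_weights(x_star)
    print(x_star, alpha @ y, np.sin(x_star))   # estimate vs. ground truth sin(x*)
```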
5. CEM in Approximate Bayesian Inference: Conditional Expectation Propagation
Conditional Expectation Propagation (CEP) implements CEM in expectation propagation (EP) for scalable Bayesian inference. CEP replaces global moment-matching with two-stage conditional matching:
- Conditional Matching: Partition the latent variables of each factor into blocks, and match the moments of the tilted distribution conditioned on the remaining (external) variables, which is often analytic or low-dimensional.
- Marginal Expectation: Take expectation w.r.t. the current approximate posterior on external variables, often via Taylor expansions or low-dimensional quadrature.
The update step matches conditional sufficient statistics, then aggregates via fixed-point iterations, yielding:
- Fully analytic message updates (often closed-form)
- Linear cost per sweep in the number of messages and block size
- Empirical performance: Near-identical inference quality to standard EP, with substantial speed gains—10× over Laplace propagation, 100× over importance sampling, and competitive with variational message passing in large-scale non-conjugate models (Wang et al., 2019).
CEP applies to Bayesian logistic and probit regression, Bayesian tensor decomposition, and other models with intractable EP updates, yielding efficient, accurate, and scalable inference.
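The two-stage pattern can be made concrete on a toy factor. The sketch below is a minimal illustration under assumed Gaussian approximations (it is not the algorithm or code of Wang et al., 2019): for a factor $N(y;\, x z,\, \sigma^2)$ coupling a block variable $x$ to an external variable $z$, the tilted moments of $x$ given $z$ are analytic by conjugacy (conditional matching), and the outer expectation over the current approximation $q(z)$ is taken with one-dimensional Gauss–Hermite quadrature (marginal expectation).

```python
import numpy as np

# Current approximate (cavity) posteriors on the block variable x and the
# external variable z, plus one non-conjugate factor N(y; x*z, s2).
mx, vx = 0.0, 4.0          # q(x) = N(mx, vx)
mz, vz = 1.0, 0.5          # q(z) = N(mz, vz)
y, s2 = 2.0, 0.25          # observation and factor noise variance

def conditional_moments(z):
    """Stage 1: moments of the tilted distribution of x, conditioned on z.

    For fixed z the factor is Gaussian in x, so q(x) * N(y; x*z, s2) is
    conjugate and the conditional tilted moments are analytic.
    """
    prec = 1.0 / vx + z * z / s2
    v = 1.0 / prec
    m = v * (mx / vx + z * y / s2)
    return m, v

# Stage 2: expectation over q(z) via one-dimensional Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite.hermgauss(30)
zs = mz + np.sqrt(2.0 * vz) * nodes          # change of variables for N(mz, vz)
m_z, v_z = conditional_moments(zs)
Ex = np.sum(weights * m_z) / np.sqrt(np.pi)                  # E_q(z)[m(z)]
Ex2 = np.sum(weights * (v_z + m_z ** 2)) / np.sqrt(np.pi)    # E_q(z)[v(z) + m(z)^2]
print("matched moments of x:", Ex, Ex2 - Ex ** 2)
```

The matched mean and variance of $x$ would then drive the usual EP message update; the point of the example is only that each stage is a cheap, analytic or one-dimensional computation.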
6. Conditional MMD and HSIC: Extensions of CEM in Kernel Inference
Conditional Expectation Matching induces conditional discrepancy measures:
- Maximum Conditional Mean Discrepancy (MCMD): $\mathrm{MCMD}(x) = \bigl\|\mu_{Y\mid X=x} - \mu_{Y'\mid X'=x}\bigr\|_{\mathcal{H}_k}$, which, with characteristic kernels, vanishes for ($P_X$-almost) all $x$ iff the conditional distributions coincide (Park et al., 2020).
- Hilbert–Schmidt Conditional Independence Criterion (HSCIC): $\mathrm{HSCIC}(X,Y\mid Z) = \bigl\|\mu_{(X,Y)\mid Z} - \mu_{X\mid Z}\otimes\mu_{Y\mid Z}\bigr\|$, which vanishes almost surely iff $X \perp\!\!\!\perp Y \mid Z$ under characteristic tensor-product kernels.
Plug-in estimators use empirical CMEs to yield closed-form Gram-matrix expressions for population-level conditional hypothesis testing.
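As an illustration of these closed-form expressions, the sketch below (illustrative RBF kernels and synthetic data, not code from Park et al., 2020) expands $\|\hat{\mu}^{(1)}_{Y\mid X=x} - \hat{\mu}^{(2)}_{Y\mid X=x}\|_{\mathcal{H}_k}^{2}$ with the RKHS inner product, so the squared plug-in MCMD reduces to Gram matrices on the outputs and the ridge weights of each sample.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

def cme_alpha(x_tr, x_new, lam=1e-3):
    """Ridge weights alpha(x) = (L + n*lam*I)^{-1} l_x for one sample."""
    n = len(x_tr)
    L = rbf(x_tr, x_tr)
    return np.linalg.solve(L + n * lam * np.eye(n),
                           rbf(x_tr, np.atleast_1d(x_new))[:, 0])

def mcmd_sq(x1, y1, x2, y2, x_new):
    """Squared plug-in MCMD at x_new between two conditional samples."""
    a1, a2 = cme_alpha(x1, x_new), cme_alpha(x2, x_new)
    K11, K22, K12 = rbf(y1, y1), rbf(y2, y2), rbf(y1, y2)
    return a1 @ K11 @ a1 - 2.0 * a1 @ K12 @ a2 + a2 @ K22 @ a2

rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(-2, 2, n); y1 = np.sin(x1) + 0.1 * rng.normal(size=n)
x2 = rng.uniform(-2, 2, n)
y2_same = np.sin(x2) + 0.1 * rng.normal(size=n)
y2_diff = np.cos(x2) + 0.1 * rng.normal(size=n)

# Small when the conditional laws coincide, larger when they differ.
print(mcmd_sq(x1, y1, x2, y2_same, 1.0), mcmd_sq(x1, y1, x2, y2_diff, 1.0))
```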
7. Practical Implications and Limitations
CEM-based filters in remote sensing (classical and augmented forms) differ from the matched filter primarily in mean-centering and susceptibility to paradoxes (e.g., improvement with irrelevant constant bands). MF circumvents these via explicit mean subtraction and achieves Neyman–Pearson optimality under Gaussianity, whereas CEM's optimization is purely second-order and can be misled by irrelevant features (Geng et al., 2016). In kernel-based CME, conditional expectation matching provides theoretically sound, consistent embeddings for nonparametric conditional analysis, with practical plug-in estimators. In approximate Bayesian inference, CEM-based CEP provides computational advantages and analytic tractability, especially in high-dimensional and non-conjugate settings.
Simulation studies confirm theoretical predictions: conditional MMD and HSIC both vanish under null hypotheses and detect departures when underlying conditional distributions or dependencies differ (Park et al., 2020). Practical CEP implementations yield fast, robust inference across diverse Bayesian models, matching EP accuracy while avoiding expensive numerical optimization and sampling (Wang et al., 2019).
CEM's technical formulations, analytic tractability, and universality render it foundational in both classical signal processing and contemporary kernel/statistical learning, with further methodological extensions continuing to attract research interest.