
Conditional Expectation Matching (CEM)

Updated 3 December 2025
  • Conditional Expectation Matching (CEM) is a suite of methods that matches conditional moments using quadratic optimization, kernel embeddings, and Bayesian inference.
  • In remote sensing it takes the form of constrained quadratic filters, whose augmented variants (mean-centering, added constant bands) reduce to the optimal matched filter.
  • CEM extends into RKHS via conditional mean embeddings and vector-valued kernel ridge regression, and into approximate Bayesian inference via conditional expectation propagation, offering scalable, accurate inference even in non-conjugate models.

Conditional Expectation Matching (CEM) encompasses a collection of methods and algorithms that match conditional moments in direct target detection, kernel-based statistical inference, or approximate Bayesian message passing. Initial conceptions of CEM appeared in remote sensing, where it referred to the design of constrained quadratic filters for anomaly and target detection; further extensions emerged in kernel conditional mean embeddings, and algorithmic variants include conditional expectation propagation in probabilistic inference. With formal roots in constrained quadratic optimization and Banach-space conditional expectation, CEM is now encountered in diverse mathematical and algorithmic disciplines.

1. Quadratic Optimization Formulation of CEM in Target Detection

In remote sensing, Conditional Expectation Matching refers to a class of algorithms derived from the linearly-constrained minimum-variance (LCMV) beamformer and encompasses the classical constrained energy minimization (CEM). Given a dataset of $N$ pixel vectors $r_i \in \mathbb{R}^L$ assembled into a data matrix $X \in \mathbb{R}^{L \times N}$ and a known target signature $d \in \mathbb{R}^L$, CEM seeks a filter $w \in \mathbb{R}^L$ that minimizes the energy of the filter output over the background pixels, under the constraint $d^T w = 1$. The sample correlation matrix is $R = \frac{1}{N}\sum_{i=1}^N r_i r_i^T$. The optimization reads:

$$\min_{w \in \mathbb{R}^L} w^T R w \quad \text{subject to} \quad d^T w = 1$$

No distributional assumption is required beyond $R$ being nonsingular. The closed-form solution is:

$$w_{\mathrm{CEM}} = \frac{R^{-1} d}{d^T R^{-1} d}$$

where $R^{-1} d$ is rescaled to ensure unit response on the target signature. CEM thus matches the conditional expectation of the filter output for the target signature $d$ under the second-moment statistics of the background pixels (Geng et al., 2016).
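This closed form is a few lines of linear algebra in practice. Below is a minimal sketch (Python/NumPy; the synthetic scene, dimensions, and the `cem_filter` helper are illustrative, not from the cited work) that builds the filter and checks the unit-response constraint:

```python
import numpy as np

def cem_filter(X, d):
    """CEM filter w = R^{-1} d / (d^T R^{-1} d) for data X (L x N) and signature d (L,)."""
    N = X.shape[1]
    R = X @ X.T / N                       # sample correlation matrix, L x L
    Rinv_d = np.linalg.solve(R, d)        # avoids forming R^{-1} explicitly
    return Rinv_d / (d @ Rinv_d)

# Illustrative synthetic scene: background pixels as columns, plus a known signature.
rng = np.random.default_rng(0)
L, N = 10, 5000
X = rng.normal(size=(L, N))               # background pixel vectors
d = rng.normal(size=L)                    # target spectral signature

w = cem_filter(X, d)
print(d @ w)                              # 1.0: unit response on the target
print(w @ (X @ X.T / N) @ w)              # minimized output energy w^T R w
```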

2. Augmented CEM and Reduction to Matched Filter

CEM can be systematically improved by adding linearly independent bands, most notably an all-ones band. The augmented form, called Augmented CEM (ACEM), appends a constant band to the data and augments $d$ accordingly. The augmented data matrix, target vector, and correlation matrix are:

$$\tilde{X} = \begin{bmatrix} X \\ \mathbf{1}^T \end{bmatrix}, \quad \tilde{d} = \begin{bmatrix} d \\ 1 \end{bmatrix}$$

$$\tilde{R} = \frac{1}{N}\tilde{X}\tilde{X}^T = \begin{pmatrix} R & m \\ m^T & 1 \end{pmatrix}$$

where $m$ is the sample mean. ACEM solves the analogous quadratic minimization and yields a filter whose first $L$ components,

$$w_{\mathrm{ACEM}(1:L)} = \frac{1}{\tilde{d}^T \tilde{R}^{-1} \tilde{d}} \left( R^{-1} d + b_1 (m^T R^{-1} d - 1) R^{-1} m \right)$$

are proven, via block-matrix inversion and the Sherman–Morrison formula, to be algebraically proportional to the classical matched filter (MF) vector on mean-centered data. MF is optimal under Neyman–Pearson theory for Gaussian data and is always at least as good as CEM, both in theory and in practice; this establishes the superiority of MF over CEM for energy-criterion target detection (Geng et al., 2016).
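The stated proportionality can be checked numerically. The following sketch (Python/NumPy; the synthetic data and all names are illustrative) constructs ACEM by appending an all-ones band and compares its first $L$ components to the matched filter $C^{-1}(d - m)$ computed from the mean-centered covariance $C = R - m m^T$; the componentwise ratio comes out constant:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 5, 2000
X = rng.normal(size=(L, N)) + rng.normal(size=(L, 1))   # background with nonzero mean
d = rng.normal(size=L)                                   # target signature

# ACEM: append an all-ones band to the data, a 1 to d, and apply the CEM closed form.
X_aug = np.vstack([X, np.ones((1, N))])
d_aug = np.append(d, 1.0)
R_aug = X_aug @ X_aug.T / N
Rinv_d = np.linalg.solve(R_aug, d_aug)
w_acem = (Rinv_d / (d_aug @ Rinv_d))[:L]                 # keep the first L components

# Matched filter on mean-centered data: w_MF proportional to C^{-1} (d - m).
m = X.mean(axis=1)
C = np.cov(X, bias=True)                                 # C = R - m m^T
w_mf = np.linalg.solve(C, d - m)

print(np.round(w_acem / w_mf, 8))                        # constant vector: ACEM ∝ MF
```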

3. Measure-Theoretic Conditional Expectation Matching in RKHS

CEM methods also arise in statistical learning as the conditional mean embedding (CME) of random variables into Reproducing Kernel Hilbert Spaces (RKHS). Let $(\Omega, \mathcal{F}, P)$ carry $X:\Omega\to\mathcal{X}$ and $Y:\Omega\to\mathcal{Y}$, and let $k$ be a positive-definite kernel on $\mathcal{Y}$ with RKHS $\mathcal{H}$. The measure-theoretic conditional mean embedding is

$$\mu_{Y|X} := \mathbb{E}[k(Y, \cdot) \mid \sigma(X)]$$

This is a unique, $\sigma(X)$-measurable, Bochner-integrable $\mathcal{H}$-valued random variable, satisfying

$$\forall h \in \mathcal{H}, \quad \int_A \langle h, \mu_{Y|X} \rangle_{\mathcal{H}} \, dP = \int_A h(Y(\omega)) \, dP(\omega) \quad \forall A \in \sigma(X)$$

The interchange property holds: $\mathbb{E}[h(Y) \mid X] = \langle h, \mu_{Y|X} \rangle_{\mathcal{H}}$ almost surely. This construction underpins modern nonparametric conditional moment matching (Park et al., 2020).
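For illustration, taking $h = k(y, \cdot)$ for fixed $y \in \mathcal{Y}$ and combining the reproducing property with the interchange property gives

$$\mu_{Y|X}(y) = \langle k(y, \cdot), \mu_{Y|X} \rangle_{\mathcal{H}} = \mathbb{E}[k(Y, y) \mid \sigma(X)] \quad \text{a.s.},$$

so pointwise evaluation of the embedding recovers the conditional expectation of the kernel itself.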

4. Empirical Estimation via Vector-Valued Kernel Ridge Regression

To estimate the conditional mean embedding, one applies vector-valued kernel ridge regression. For samples $\{(x_i, y_i)\}_{i=1}^n$ and a vector-valued RKHS $\mathcal{G}$ of $\mathcal{H}$-valued functions on $\mathcal{X}$, define the empirical objective:

$$\widehat{\mathcal{E}}_{n,\lambda}(F) = \frac{1}{n}\sum_{i=1}^n \| k(y_i, \cdot) - F(x_i) \|^2_{\mathcal{H}} + \lambda \| F \|^2_{\mathcal{G}}$$

The representer theorem yields a solution in the span of kernel evaluations, leading to the empirical CME

$$\widehat{\mu}_{Y|X=x} = \sum_{i=1}^n \beta_i(x)\, k(y_i, \cdot)$$

where $\boldsymbol{\beta}(x) = (K_X + n\lambda I_n)^{-1}\,(k_{\mathcal{X}}(x_1,x),\dotsc,k_{\mathcal{X}}(x_n,x))^T$ and $K_X$ is the Gram matrix of the inputs. Consistency and rates are established under boundedness, $C_0$-universality, and appropriate regularization scaling, with universal consistency and convergence rates $\mathcal{O}_p(n^{-1/4})$ in well-specified cases (Park et al., 2020).
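In code, $\boldsymbol{\beta}(x)$ is a standard kernel ridge solve. The sketch below (Python/NumPy with Gaussian kernels; the toy dataset, bandwidth, and regularization values are assumptions for illustration) computes the weights and uses the interchange property to approximate $\mathbb{E}[h(Y) \mid X = x]$ as $\sum_i \beta_i(x)\, h(y_i)$:

```python
import numpy as np

def gauss_kernel(a, b, sigma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 sigma^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Toy 1-D data: Y = sin(X) + noise.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + 0.1 * rng.normal(size=n)

lam, sigma_x = 1e-3, 0.5
K_X = gauss_kernel(x, x, sigma_x)

def beta(x_query):
    """CME weights beta(x) = (K_X + n*lam*I_n)^{-1} (k(x_1, x), ..., k(x_n, x))^T."""
    k_x = gauss_kernel(x, np.atleast_1d(x_query), sigma_x)      # (n, m)
    return np.linalg.solve(K_X + n * lam * np.eye(n), k_x)      # (n, m)

# Interchange property: E[h(Y) | X = x] is approximated by sum_i beta_i(x) h(y_i).
x_test = np.array([0.0, 1.0, 2.0])
B = beta(x_test)
print(np.round(B.T @ np.cos(y), 3))          # estimate of E[cos(Y) | X = x_test]
print(np.round(np.cos(np.sin(x_test)), 3))   # reference value for low-noise data
```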

5. CEM in Approximate Bayesian Inference: Conditional Expectation Propagation

Conditional Expectation Propagation (CEP) implements CEM in expectation propagation (EP) for scalable Bayesian inference. CEP replaces global moment-matching with two-stage conditional matching:

  1. Conditional Matching: Factor $x_i$ into blocks, match moments of the conditional tilted distribution (often analytic or low-dimensional).
  2. Marginal Expectation: Take expectation w.r.t. the current approximate posterior on external variables, often via Taylor expansions or low-dimensional quadrature.

The update step matches conditional sufficient statistics, then aggregates via fixed-point iterations, yielding:

  • Fully analytic message updates (often closed-form)
  • Linear cost per sweep in the number of messages and block size
  • Empirical performance: Near-identical inference quality to standard EP, with substantial speed gains—10× over Laplace propagation, 100× over importance sampling, and competitive with variational message passing in large-scale non-conjugate models (Wang et al., 2019).

CEP applies to Bayesian logistic and probit regression, Bayesian tensor decomposition, and other models with intractable EP updates, yielding efficient, accurate, and scalable inference.
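As a rough illustration of the two-stage idea only (not the CEP algorithm of Wang et al., 2019), the sketch below assumes a toy product factor $f(\theta_1, \theta_2) = \mathcal{N}(y; \theta_1 \theta_2, \sigma^2)$, loosely reminiscent of a tensor-decomposition likelihood: the conditional tilted mean of $\theta_1$ given $\theta_2$ is available in closed form, the outer expectation over $\theta_2$ is approximated by plugging in the mean of the current $q(\theta_2)$ (a zeroth-order Taylor/delta step), and a brute-force grid integral serves as ground truth:

```python
import numpy as np

# Toy tilted distribution t(t1, t2) ∝ q(t1) q(t2) N(y; t1 * t2, s2).
# All names and numerical values here are illustrative assumptions.
y, s2 = 1.5, 0.3 ** 2
m1, v1 = 0.0, 1.0      # current q(t1) = N(m1, v1)
m2, v2 = 1.0, 0.1      # current q(t2) = N(m2, v2)

def cond_mean_t1(t2):
    """Closed-form conditional tilted mean E_t[t1 | t2] (a Gaussian update in t1)."""
    prec = 1.0 / v1 + t2 ** 2 / s2
    return (m1 / v1 + t2 * y / s2) / prec

# Two-stage estimate: analytic conditional moment, then a delta approximation
# of the outer expectation over t2 under q(t2).
cep_style_estimate = cond_mean_t1(m2)

# Brute-force ground truth: E_t[t1] by 2-D grid integration over the tilted density.
t1 = np.linspace(-5.0, 5.0, 801)[:, None]
t2 = np.linspace(-2.0, 4.0, 801)[None, :]
log_t = (-(t1 - m1) ** 2 / (2 * v1)
         - (t2 - m2) ** 2 / (2 * v2)
         - (y - t1 * t2) ** 2 / (2 * s2))
w = np.exp(log_t - log_t.max())
ground_truth = float((t1 * w).sum() / w.sum())

print(cep_style_estimate, ground_truth)   # comparable values; the delta step trades
                                          # some accuracy for a cheap closed form
```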

6. Conditional MMD and HSIC: Extensions of CEM in Kernel Inference

Conditional Expectation Matching induces conditional discrepancy measures:

  • Maximum Conditional Mean Discrepancy (conditional MMD):

$$\mathrm{MMD}_{\mathrm{cond}}(P, Q; x) = \left\| \mathbb{E}^P[k(Y, \cdot) \mid X = x] - \mathbb{E}^Q[k(Y, \cdot) \mid X = x] \right\|_{\mathcal{H}}$$

with characteristic kernels, vanishing iff conditionals coincide (Park et al., 2020).

  • Hilbert-Schmidt Conditional Independence Criterion (HSIC):

$$\mathrm{HSIC}_{\mathrm{cond}}(X, Y \mid Z) = \left\| \mu_{XY|Z} - \mu_{X|Z} \otimes \mu_{Y|Z} \right\|^2_{\mathcal{H}_{\mathcal{X}} \otimes \mathcal{H}_{\mathcal{Y}}}$$

vanishes iff $X \perp Y \mid Z$ under characteristic tensor kernels.

Plug-in estimators use empirical CMEs to yield closed-form Gram-matrix expressions for population-level conditional hypothesis testing.
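The following sketch (Python/NumPy with Gaussian kernels; the toy data, bandwidths, and regularization are assumed for illustration) forms such a plug-in estimate of the conditional MMD from empirical CME weights and the Gram matrices of the two samples:

```python
import numpy as np

def gauss_kernel(a, b, sigma):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def cme_weights(x_train, x_query, sigma, lam):
    """Ridge weights beta(x) for the empirical conditional mean embedding."""
    n = len(x_train)
    K = gauss_kernel(x_train, x_train, sigma)
    k = gauss_kernel(x_train, np.atleast_1d(x_query), sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), k).ravel()

def conditional_mmd(xp, yp, xq, yq, x0, sigma_x=0.5, sigma_y=0.5, lam=1e-3):
    """Plug-in estimate of || mu^P_{Y|X=x0} - mu^Q_{Y|X=x0} ||_H via Gram matrices."""
    bp = cme_weights(xp, x0, sigma_x, lam)
    bq = cme_weights(xq, x0, sigma_x, lam)
    Kpp = gauss_kernel(yp, yp, sigma_y)
    Kqq = gauss_kernel(yq, yq, sigma_y)
    Kpq = gauss_kernel(yp, yq, sigma_y)
    sq = bp @ Kpp @ bp - 2 * bp @ Kpq @ bq + bq @ Kqq @ bq
    return np.sqrt(max(sq, 0.0))

# Toy check: two samples with the same conditional law vs. a shifted conditional.
rng = np.random.default_rng(1)
xp = rng.uniform(-3, 3, 400); yp = np.sin(xp) + 0.1 * rng.normal(size=400)
xq = rng.uniform(-3, 3, 400); yq = np.sin(xq) + 0.1 * rng.normal(size=400)
xs = rng.uniform(-3, 3, 400); ys = np.sin(xs) + 1.0 + 0.1 * rng.normal(size=400)

print(conditional_mmd(xp, yp, xq, yq, x0=1.0))   # small: same conditional law
print(conditional_mmd(xp, yp, xs, ys, x0=1.0))   # larger: conditionals differ
```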

7. Practical Implications and Limitations

CEM-based filters in remote sensing (classical and augmented forms) differ from the matched filter primarily in mean-centering and susceptibility to paradoxes (e.g., improvement with irrelevant constant bands). MF circumvents these via explicit mean subtraction and achieves Neyman–Pearson optimality under Gaussianity, whereas CEM's optimization is purely second-order and can be misled by irrelevant features (Geng et al., 2016). In kernel-based CME, conditional expectation matching provides theoretically sound, consistent embeddings for nonparametric conditional analysis, with practical plug-in estimators. In approximate Bayesian inference, CEM-based CEP provides computational advantages and analytic tractability, especially in high-dimensional and non-conjugate settings.

Simulation studies confirm theoretical predictions: conditional MMD and HSIC both vanish under null hypotheses and detect departures when underlying conditional distributions or dependencies differ (Park et al., 2020). Practical CEP implementations yield fast, robust inference across diverse Bayesian models, matching EP accuracy while avoiding expensive numerical optimization and sampling (Wang et al., 2019).

CEM's technical formulations, analytic tractability, and universality render it foundational in both classical signal processing and contemporary kernel/statistical learning, with further methodological extensions continuing to attract research interest.
