
Expectation–Conditional Maximisation Algorithm

Updated 5 December 2025
  • The ECM algorithm is a variant of EM that replaces the difficult M-step with a series of simpler conditional maximization steps, ensuring tractable updates in latent variable models.
  • In sparse Gaussian graphical model estimation, ECM employs adaptive spike-and-slab penalties to efficiently select network structures while mitigating bias in high-dimensional settings.
  • For rigid and articulated point registration, ECMPR integrates robust outlier handling and precise rotation updates via SVD or SDP, improving registration accuracy over traditional methods.

The Expectation–Conditional Maximisation (ECM) algorithm is a deterministic generalisation of Expectation–Maximisation (EM), designed for maximum-likelihood and posterior-mode estimation in problems with latent variables and complex conditional structure. ECM refines the classic EM paradigm by decomposing the maximization (M) step into a sequence of simpler conditional maximization (CM) sub-steps, each of which is often analytically or computationally tractable. The ECM framework is particularly salient in high-dimensional settings and latent graphical models, where a full joint M-step is infeasible, and has produced state-of-the-art methods for both sparse graphical model selection and robust mixture-based rigid/articulated point registration.

1. Algorithmic Principle and Framework

In ECM, each iteration consists of an expectation (E) step, followed by one or more CM steps. The E-step computes the expectation of the complete-data log-likelihood (or log-posterior in Bayesian settings), conditioning on the current parameter estimates and observed data. Each CM step then conditionally maximizes this expected criterion with respect to a subset (or block) of the parameters, given the current values of the other blocks. The process cycles through the parameter blocks until all are updated.

Let $X$ denote observed data, $Z$ latent variables, and $\theta = (\theta_1, \dots, \theta_K)$ the parameter blocks. At iteration $\ell$:

  • E-step: Compute $Q(\theta \mid \theta^{(\ell)}) = \mathbb{E}_{Z \mid X,\theta^{(\ell)}}[\log p(X,Z;\theta)]$
  • CM-steps, for $k = 1,\dots,K$: $\theta_k^{(\ell+1)} = \arg\max_{\theta_k} Q(\theta_1^{(\ell+1)},\ldots,\theta_{k-1}^{(\ell+1)}, \theta_k, \theta_{k+1}^{(\ell)}, \ldots, \theta_K^{(\ell)} \mid \theta^{(\ell)})$

Standard ECM theory guarantees a monotonic increase in the observed-data likelihood (or posterior) and, under mild regularity conditions, convergence to a stationary point.
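
To make the cycle concrete, the following is a minimal, self-contained sketch (illustrative only, not drawn from the cited papers): maximum-likelihood fitting of a univariate Student-t location/scale/degrees-of-freedom model, a standard ECM use case. The E-step computes expected latent precision weights, CM-step 1 updates $(\mu, \sigma^2)$ in closed form given $\nu$, and CM-step 2 updates $\nu$ by a one-dimensional numerical maximization of $Q$ given the other block.

# Minimal ECM sketch (illustrative; not from the cited papers): ML fitting of a
# univariate Student-t location/scale/df model.
import numpy as np
from scipy.special import digamma, gammaln
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.standard_t(df=4.0, size=2000) * 1.5 + 2.0      # synthetic heavy-tailed data

mu, sigma2, nu = np.median(x), np.var(x), 10.0          # crude initialisation
for _ in range(200):
    # E-step: posterior moments of the latent Gamma weights w_i given current parameters
    delta = (x - mu) ** 2 / sigma2
    w = (nu + 1.0) / (nu + delta)                                   # E[w_i | x_i, theta]
    logw = digamma((nu + 1.0) / 2.0) - np.log((nu + delta) / 2.0)   # E[log w_i | x_i]

    # CM-step 1: maximise Q over (mu, sigma^2) with nu held fixed (closed form)
    mu = np.sum(w * x) / np.sum(w)
    sigma2 = np.mean(w * (x - mu) ** 2)

    # CM-step 2: maximise Q over nu with (mu, sigma^2) held at their new values
    c = np.mean(logw - w)
    neg_q = lambda v: -(0.5 * v * np.log(0.5 * v) - gammaln(0.5 * v) + 0.5 * v * c)
    nu = minimize_scalar(neg_q, bounds=(0.1, 200.0), method="bounded").x

print(f"mu={mu:.3f}  sigma^2={sigma2:.3f}  nu={nu:.2f}")

Because every CM step increases the same Q-function, the observed-data log-likelihood is non-decreasing across iterations, mirroring the guarantee above.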

2. ECM for Sparse Gaussian Graphical Model Estimation

The EM Graph Selection (EMGS) algorithm of Li & McCormick applies ECM to Gaussian graphical models with a spike-and-slab prior on the precision matrix $\Omega$ (Li et al., 2017). The observed data $X \in \mathbb{R}^{n \times p}$ are modeled as independent samples from $\mathcal{N}(0, \Omega^{-1})$. The prior is specified as follows:

  • Off-diagonal: $\omega_{jk} \mid \delta_{jk} \sim \mathcal{N}(0, v_{\delta_{jk}}^2)$ with "spike" ($v_0$) and "slab" ($v_1$) variances, $v_0 \ll v_1$
  • Diagonal: $\omega_{jj} \sim \mathrm{Exp}(\lambda/2)$
  • Latent indicators: $\delta_{jk} \sim \mathrm{Bernoulli}(\pi)$, $\pi \sim \mathrm{Beta}(a,b)$

The complete-data log-posterior (up to constants) is:

$$\log p(\Omega, \delta, \pi \mid X) = \frac{n}{2}\log|\Omega| - \frac{1}{2}\mathrm{tr}(S\Omega) - \sum_{j < k} \frac{\omega_{jk}^2}{2v_{\delta_{jk}}^2} - \frac{\lambda}{2} \sum_j \omega_{jj} + \sum_{j < k} \left[\delta_{jk} \log \pi + (1-\delta_{jk})\log(1-\pi)\right] + (a-1)\log \pi + (b-1)\log(1-\pi)$$

where $S = X^\top X$.

Algorithmic Steps

  1. E-step: Compute $p^*_{jk} = \mathbb{E}[\delta_{jk} \mid \Omega^{(\ell)}, \pi^{(\ell)}, X]$ and adaptive penalties $d^*_{jk} = \mathbb{E}[1/v_{\delta_{jk}}^2 \mid \cdot]$:

$$p^*_{jk} = \frac{\pi^{(\ell)}\, \phi(\omega_{jk}^{(\ell)}/v_1)/v_1}{\pi^{(\ell)}\, \phi(\omega_{jk}^{(\ell)}/v_1)/v_1 + (1-\pi^{(\ell)})\, \phi(\omega_{jk}^{(\ell)}/v_0)/v_0}$$

$$d^*_{jk} = \frac{1-p^*_{jk}}{v_0^2} + \frac{p^*_{jk}}{v_1^2}$$

  2. CM-step ($\pi$ update): Closed form,

$$\pi^{(\ell+1)} = \frac{a-1 + \sum_{j<k} p^*_{jk}}{a + b - 2 + p(p-1)/2}$$

  3. CM-step ($\Omega$ update): Update column by column, analogous to block-coordinate descent:

$$\omega_{12}^{\mathrm{new}} = -\left[(s_{22}+\lambda)\,\Omega_{11}^{-1} + \mathrm{Diag}(d^*_{\cdot k})\right]^{-1} s_{12}, \qquad \omega_{22}^{\mathrm{new}} = (\omega_{12}^{\mathrm{new}})^\top \Omega_{11}^{-1}\, \omega_{12}^{\mathrm{new}} + \frac{n}{s_{22}+\lambda}$$

Repeat for each column $k=1,\dots,p$, as in the sketch below.
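
For concreteness, a minimal NumPy sketch of one such iteration follows. It is illustrative only: the function name ecm_iteration is ours, the formulas are transcribed from the steps above, and a naive matrix inverse stands in for the cheaper rank-one updates used in the paper.

# Sketch of one EMGS-style ECM iteration (illustrative; not the reference code).
import numpy as np
from scipy.stats import norm

def ecm_iteration(Omega, pi, S, n, v0, v1, lam, a, b):
    p = Omega.shape[0]

    # E-step: posterior inclusion probabilities p* and adaptive penalties d*
    slab = norm.pdf(Omega / v1) / v1                    # elementwise slab density
    spike = norm.pdf(Omega / v0) / v0                   # elementwise spike density
    p_star = pi * slab / (pi * slab + (1 - pi) * spike)
    d_star = (1 - p_star) / v0**2 + p_star / v1**2

    # CM-step for pi: closed-form Beta-Bernoulli update over the p(p-1)/2 edges
    pi = (a - 1 + p_star[np.triu_indices(p, 1)].sum()) / (a + b - 2 + p * (p - 1) / 2)

    # CM-step for Omega: cycle through columns, block-coordinate style
    for k in range(p):
        idx = np.delete(np.arange(p), k)
        Omega11_inv = np.linalg.inv(Omega[np.ix_(idx, idx)])   # paper uses rank-one updates
        s12, s22 = S[idx, k], S[k, k]
        A = (s22 + lam) * Omega11_inv + np.diag(d_star[idx, k])
        w12 = -np.linalg.solve(A, s12)
        w22 = w12 @ Omega11_inv @ w12 + n / (s22 + lam)
        Omega[idx, k] = Omega[k, idx] = w12
        Omega[k, k] = w22
    return Omega, pi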

Adaptive Penalization

The mixture prior induces elementwise adaptive ridge penalties, derived from the E-step expectation: large $|\omega_{jk}|$ receive weak shrinkage (penalty close to $1/v_1^2$), while small $|\omega_{jk}|$ incur strong shrinkage (penalty close to $1/v_0^2$).
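
A small numeric illustration with hypothetical values ($v_0 = 0.02$, $v_1 = 1$, $\pi = 0.5$) shows the two regimes:

# Hypothetical spike/slab settings; d* interpolates between 1/v1^2 and 1/v0^2.
from scipy.stats import norm
v0, v1, pi = 0.02, 1.0, 0.5
for w in (0.5, 0.01):                       # a large and a small precision entry
    slab = pi * norm.pdf(w / v1) / v1
    spike = (1 - pi) * norm.pdf(w / v0) / v0
    p_star = slab / (slab + spike)
    d_star = (1 - p_star) / v0**2 + p_star / v1**2
    print(f"omega={w}: p*={p_star:.3f}, adaptive penalty d*={d_star:.1f}")
# omega=0.5 gives d* near 1 (weak shrinkage); omega=0.01 gives d* around 2400 (strong shrinkage).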

Pseudocode

initialize Ω = I_p, π = a/(a + b)
while change > tol and iter < maxit:
    # E-step
    for all j < k:
        compute p_star_jk and d_star_jk
    if X has missing:
        compute S <- E[X^T X | Ω, X_obs]
    # CM-step
    update π via Beta formula
    for k in 1...p:
        partition Ω, S; update ω_12 and ω_22
        reassemble Ω
    iter += 1

Empirically, tens of ECM iterations suffice; overall complexity per $v_0$ grid value is comparable to a single graphical lasso solution, but without nested inner optimizations (Li et al., 2017).
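
A schematic warm-started sweep over a $v_0$ grid, reusing the ecm_iteration sketch above, can look as follows (the helper name, hyperparameter defaults, grid ordering, and stopping rule are illustrative assumptions, not the paper's settings):

# Hypothetical warm-started regularisation path over a grid of spike variances v0.
import numpy as np

def emgs_path(X, v0_grid, v1=1.0, lam=2.0, a=1.0, b=1.0, tol=1e-4, maxit=100):
    n, p = X.shape
    S = X.T @ X
    Omega, pi = np.eye(p), a / (a + b)                 # initialisation as in the pseudocode
    path = []
    for v0 in sorted(v0_grid):                         # each fit warm-starts from the last
        for _ in range(maxit):
            Omega_prev = Omega.copy()
            Omega, pi = ecm_iteration(Omega, pi, S, n, v0, v1, lam, a, b)
            if np.max(np.abs(Omega - Omega_prev)) < tol:
                break
        path.append((v0, Omega.copy(), pi))
    return path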

3. ECM for Rigid and Articulated Point Registration

The ECM for Point Registration (ECMPR) algorithm addresses rigid and articulated point-set matching by framing the unknown correspondences as missing data in a mixture-model framework (Horaud et al., 2020). Model points $\mathcal{X} = \{X_i\}_{i=1}^n$ are linked to data points $\mathcal{Y} = \{Y_j\}_{j=1}^m$ via latent assignments $Z_j \in \{1, \dots, n, n+1\}$, with $n+1$ denoting the uniform outlier class.

Mixture Model Formulation

  • $P(Z_j = i) = p_{\text{in}}$ for $i \le n$, and $P(Z_j = n+1) = p_{\text{out}}$
  • $P(Y_j \mid Z_j = i) = \mathcal{N}(Y_j \mid \mu(X_i; \Theta), \Sigma_i)$ for $i \le n$
  • Outlier model: $P(Y_j \mid Z_j = n+1) = 1/V$

Iterative Steps

  1. E-step: Compute soft assignments (posteriors):

$$\alpha_{ji} = \frac{p_i\, \mathcal{N}(Y_j \mid \mu(X_i; \Theta), \Sigma_i)}{\sum_{k=1}^n p_k\, \mathcal{N}(Y_j \mid \mu(X_k; \Theta), \Sigma_k) + p_{n+1}/V}$$

Outlier posterior: $\alpha_{j, n+1} = 1 - \sum_{i=1}^n \alpha_{ji}$

  2. CM-step 1 (Registration Parameters): For the rigid map $\mu(X_i;\Theta) = RX_i + t$, define:

$$\lambda_i = \sum_{j=1}^m \alpha_{ji}, \qquad W_i = \frac{1}{\lambda_i} \sum_{j=1}^m \alpha_{ji}\, Y_j$$

Minimize:

$$\mathcal{E}(R,t) = \frac{1}{2} \sum_{i=1}^n \lambda_i\, \|W_i - R X_i - t\|^2_{\Sigma_i}$$

  • $t$ has a closed-form update
  • $R$ is solved via Procrustes/SVD for isotropic $\Sigma_i$, or via a semidefinite program (SDP) for general $\Sigma_i$
  3. CM-step 2 (Covariance Updates):

$$\Sigma_i = \frac{\sum_{j=1}^m \alpha_{ji}\, (Y_j - \mu(X_i; \Theta))(Y_j - \mu(X_i; \Theta))^\top}{\lambda_i} + \varepsilon I$$

  4. Articulated Registration: The kinematic chain is decomposed into rigid parts, and each rigid group is registered incrementally along the chain. A sketch of the rigid case appears after this list.
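
The rigid, isotropic case can be sketched compactly. The snippet below is illustrative only: it uses a single shared variance sigma^2 in place of the per-point covariances $\Sigma_i$, assumes equal inlier priors $p_{\text{in}} = (1 - p_{\text{out}})/n$, and performs the SVD-based Procrustes update for $R$; the anisotropic SDP branch and the articulated decomposition are omitted.

# One ECMPR-style rigid iteration, isotropic covariance (illustrative sketch).
import numpy as np

def ecmpr_iteration(X, Y, R, t, sigma2, p_out=0.1, V=1.0):
    n, d = X.shape
    p_in = (1.0 - p_out) / n                            # assumed equal inlier priors

    # E-step: soft assignments alpha[j, i] of data point j to model point i
    mu = X @ R.T + t                                    # transformed model points
    sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(-1)            # (m, n) squared distances
    gauss = p_in * np.exp(-0.5 * sq / sigma2) / (2 * np.pi * sigma2) ** (d / 2)
    alpha = gauss / (gauss.sum(axis=1, keepdims=True) + p_out / V)  # inlier posteriors
    # leftover mass 1 - alpha.sum(axis=1) is the outlier posterior alpha_{j, n+1}

    # CM-step 1: weighted Procrustes update of (R, t) from the virtual points W_i
    lam = alpha.sum(axis=0)                             # lambda_i
    W = (alpha.T @ Y) / np.maximum(lam, 1e-12)[:, None]
    xbar = (lam @ X) / lam.sum()
    wbar = (lam @ W) / lam.sum()
    C = (W - wbar).T @ (lam[:, None] * (X - xbar))      # weighted cross-covariance
    U, _, Vt = np.linalg.svd(C)
    D = np.diag([1.0] * (d - 1) + [np.linalg.det(U @ Vt)])          # enforce det(R) = +1
    R = U @ D @ Vt
    t = wbar - R @ xbar

    # CM-step 2: isotropic variance update from weighted residuals (plus a small ridge)
    resid = Y[:, None, :] - (X @ R.T + t)[None, :, :]
    sigma2 = (alpha * (resid ** 2).sum(-1)).sum() / (d * lam.sum()) + 1e-9
    return R, t, sigma2, alpha

Iterating this to convergence and classifying each $Y_j$ by its largest posterior responsibility reproduces the loop in the pseudocode below.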

Robustness and Outlier Handling

A uniform component in the mixture ensures that points poorly explained by any Gaussian component are classified as outliers, providing automatic robustification without the need for hand-tuned thresholds.

Pseudocode

Initialize R ← I, t ← 0, Σ_i ← large*I
repeat
    E-step: compute α_{ji} for i = 1...n+1
    form λ_i, W_i
    CM-step 1: estimate (R,t) (Procrustes or SDP)
    CM-step 2: update Σ_i via weighted covariance
until convergence
classify each j by argmax_i α_{ji}

Each ECMPR iteration costs $O(mn)$ for the E-step and, in the anisotropic case, requires solving a small SDP (of dimension 9) for the rotation update (Horaud et al., 2020).

4. Adaptive Penalization and Structured Priors

In EMGS, latent indicators $\delta_{jk}$ select between a "spike" and a "slab" prior variance, translating into strongly or weakly penalized connections in the estimated graph. This confers adaptivity and reduces bias on large interactions, a notable advance relative to uniform penalties in standard glasso.

Structured priors can be incorporated by grouping the $(j,k)$ edges into blocks, with each block sharing a group-specific slab variance $\tau_{gg'}$. These group scales can themselves be given hyperpriors and updated in closed form within the CM step:

$$\tau_{gg'} \leftarrow \frac{a_\tau - 1 + \frac{1}{2} \sum_{j<k} \mathbf{1}\{(g_j, g_k) = (g, g')\}}{b_\tau + \frac{1}{2} \sum_{j<k} \omega_{jk}^2\, d^*_{jk}\, \mathbf{1}\{(g_j, g_k) = (g, g')\}}$$

This facilitates both flexible penalization and the infusion of external prior knowledge (Li et al., 2017).

5. Extensions: Missing Data and Mixed/Discrete Data

Both ECM-based algorithms naturally address missing data:

  • For EMGS, missing entries in $X$ are imputed at each E-step using the conditional expectation of $X_i X_i^\top$ under the current $\Omega$.
  • For ECMPR, the likelihood formulation allows unassigned (outlier) points, and unobserved correspondences are latent.

EMGS extends to mixed/binary data via a Gaussian copula approach: latent variables $Z_i \sim \mathcal{N}(0, \Omega^{-1})$ are truncated to be compatible with the observed data ranks, and the required expectations use stochastic or MCEM approximations, leaving the CM-steps unchanged (Li et al., 2017).

6. Computational Complexity and Convergence Properties

  • EMGS (Graph Selection): Each full cycle of $p$ block updates is $O(p^3)$ thanks to efficient rank-one updates, with empirical convergence in tens of iterations. Total cost over a hyperparameter grid is competitive with glasso, while requiring fewer total system solves owing to deterministic convergence and the absence of nested inner routines.
  • ECMPR (Point Registration): The E-step is $O(mn)$, with the main computational overhead in the rotation update: an SVD for the isotropic case ($O(n)$), or a small SDP (dimension 9) for the fully anisotropic case. Each iteration does not decrease the log-likelihood, and the iterates converge to a stationary point, as inherited from the general ECM framework (Horaud et al., 2020).
  • EMGS offers adaptive elementwise penalization, warm-started regularization paths, and flexibility for structured priors and copula extensions, in contrast with uniform-penalty glasso and with MCMC stochastic search, which becomes computationally intractable in high dimensions.
  • ECMPR generalizes classical EM and ICP (Iterative Closest Point) methods by supporting full covariance modeling (yielding more robust assignments) and by integrating outlier rejection natively via a uniform mixture component. Its rotation update via SDP constitutes an exact relaxation, improving over heuristic/annealing procedures.

These advances highlight ECM as a principled framework for tractable, robust inference in challenging latent-variable models, with demonstrated efficacy in both sparse graphical model selection and robust geometric registration (Li et al., 2017, Horaud et al., 2020).
