
Expectation–Conditional Maximisation Algorithm

Updated 5 December 2025
  • The ECM algorithm is a variant of EM that replaces the difficult M-step with a series of simpler conditional maximization steps, ensuring tractable updates in latent variable models.
  • In sparse Gaussian graphical model estimation, ECM employs adaptive spike-and-slab penalties to efficiently select network structures while mitigating bias in high-dimensional settings.
  • For rigid and articulated point registration, ECMPR integrates robust outlier handling and precise rotation updates via SVD or SDP, improving registration accuracy over traditional methods.

The Expectation–Conditional Maximisation (ECM) algorithm is a deterministic alternative to traditional Expectation–Maximisation (EM), designed to address maximum-likelihood and posterior-mode estimation problems with latent variables and complex conditional structure. ECM refines the classic EM paradigm by decomposing the maximization (M) step into a sequence of simpler conditional maximization (CM) sub-steps, each of which is often analytically or computationally tractable. The ECM framework is particularly salient in high-dimensional settings and latent graphical models, where full conditional maximization is infeasible, and has produced state-of-the-art methods for both sparse graphical model selection and robust mixture-based rigid/articulated point registration.

1. Algorithmic Principle and Framework

In ECM, each iteration consists of an expectation (E) step, followed by one or more CM steps. The E-step computes the expectation of the complete-data log-likelihood (or log-posterior in Bayesian settings), conditioning on the current parameter estimates and observed data. Each CM step then conditionally maximizes this expected criterion with respect to a subset (or block) of the parameters, given the current values of the other blocks. The process cycles through the parameter blocks until all are updated.

Let $X$ denote the observed data, $Z$ the latent variables, and $\theta = (\theta_1,\ldots,\theta_K)$ the parameter blocks. At iteration $\ell$:

  • E-step: Compute $Q(\theta \mid \theta^{(\ell)}) = \mathbb{E}_{Z \mid X,\theta^{(\ell)}}[\log p(X,Z;\theta)]$
  • CM-steps: For $k = 1,\ldots,K$, set $\theta_k^{(\ell+1)} = \arg\max_{\theta_k} Q(\theta_1^{(\ell+1)},\ldots,\theta_{k-1}^{(\ell+1)},\, \theta_k,\, \theta_{k+1}^{(\ell)},\ldots,\theta_K^{(\ell)})$

Standard ECM theory guarantees monotonic increase in the observed-data likelihood (or Q-function) and convergence to a stationary point.
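As a concrete illustration, the following minimal sketch runs ECM on a two-component 1-D Gaussian mixture, splitting the M-step into two CM sub-steps: first the weights and means are maximized with the variances held fixed, then the variances with the updated weights and means held fixed. The data and initial values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data from two well-separated Gaussians (illustrative only)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

mu = np.array([-1.0, 1.0])      # initial means
sig2 = np.array([1.0, 1.0])     # initial variances
w = np.array([0.5, 0.5])        # initial mixture weights

def normpdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(100):
    # E-step: responsibilities gamma[i, k] = P(z_i = k | x_i, theta)
    dens = w * normpdf(x[:, None], mu, sig2)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    nk = gamma.sum(axis=0)
    # CM-step 1: maximise Q over (w, mu) with sig2 held fixed
    w = nk / nk.sum()
    mu = (gamma * x[:, None]).sum(axis=0) / nk
    # CM-step 2: maximise Q over sig2 with the updated (w, mu) held fixed
    sig2 = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Because each CM sub-step conditionally maximises the same Q-function, the observed-data log-likelihood is non-decreasing across iterations, exactly as the standard theory guarantees.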

2. ECM for Sparse Gaussian Graphical Model Estimation

The ECM Graph Selection (EMGS) algorithm of Li & McCormick applies ECM to Gaussian graphical models with a spike-and-slab prior on the precision matrix $\Omega$ (Li et al., 2017). The observed data $X \in \mathbb{R}^{n \times p}$ are modeled as independent samples from $\mathcal{N}(0, \Omega^{-1})$. The prior is specified as follows:

  • Off-diagonal: $\omega_{ij} \mid \delta_{ij} \sim (1-\delta_{ij})\,\mathcal{N}(0, v_0^2) + \delta_{ij}\,\mathcal{N}(0, v_1^2)$ with "spike" ($v_0$) and "slab" ($v_1$) variances ($v_0 \ll v_1$)
  • Diagonal: $\omega_{jj} \sim \operatorname{Exp}(\tau/2)$
  • Latent indicators: $\delta_{ij} \in \{0,1\}$, $\delta_{ij} \mid \pi \sim \operatorname{Bernoulli}(\pi)$

The complete-data log-posterior (up to constants) is:

$$\log p(\Omega, \delta, \pi \mid X) = \frac{n}{2}\log\det\Omega - \frac{1}{2}\operatorname{tr}(S\Omega) - \sum_{i<j}\frac{\omega_{ij}^2}{2 v_{\delta_{ij}}^2} - \frac{\tau}{2}\sum_{j}\omega_{jj} + \sum_{i<j}\bigl[\delta_{ij}\log\pi + (1-\delta_{ij})\log(1-\pi)\bigr] + \log p(\pi)$$

where $S = X^\top X$ and $v_{\delta_{ij}}^2 = v_0^2$ if $\delta_{ij} = 0$ and $v_1^2$ otherwise.

Algorithmic Steps

  1. E-step: Compute posterior inclusion probabilities $p_{ij}^*$ and adaptive penalties $d_{ij}^*$:

$$p_{ij}^* = \frac{\pi\,\mathcal{N}(\omega_{ij}; 0, v_1^2)}{\pi\,\mathcal{N}(\omega_{ij}; 0, v_1^2) + (1-\pi)\,\mathcal{N}(\omega_{ij}; 0, v_0^2)}$$

$$d_{ij}^* = \mathbb{E}\bigl[v_{\delta_{ij}}^{-2}\bigr] = \frac{1-p_{ij}^*}{v_0^2} + \frac{p_{ij}^*}{v_1^2}$$

  2. CM-step (π update): Closed form under a $\operatorname{Beta}(a,b)$ hyperprior,

$$\pi^{(\ell+1)} = \frac{\sum_{i<j} p_{ij}^* + a - 1}{\binom{p}{2} + a + b - 2}$$

  3. CM-step (Ω update): Update by columns, analogous to block-coordinate descent. With the usual column partition of $\Omega$ and $S$ (current column entries $\omega_{12}$, $\omega_{22}$, $s_{12}$, $s_{22}$; remaining block $\Omega_{11}$),

$$\omega_{12}^{(\ell+1)} = -\bigl[(s_{22}+\tau)\,\Omega_{11}^{-1} + \operatorname{diag}(d_{12}^*)\bigr]^{-1} s_{12}, \qquad \omega_{22}^{(\ell+1)} = \omega_{12}^\top \Omega_{11}^{-1}\,\omega_{12} + \frac{n}{s_{22}+\tau}.$$

Repeat for each column $j = 1,\ldots,p$.

Adaptive Penalization

The mixture prior imparts elementwise adaptive ridge penalties, derived from the E-step expectation. Entries with large $|\omega_{ij}|$ receive weak shrinkage (penalty close to $1/v_1^2$), while entries with small $|\omega_{ij}|$ incur strong shrinkage (penalty close to $1/v_0^2$).
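This adaptive penalty computation takes only a few lines; the function name and the specific numbers below are illustrative, not from the reference implementation.

```python
import numpy as np

def estep_penalties(omega, pi, v0, v1):
    """E-step of a spike-and-slab ECM: posterior slab probabilities p*
    and elementwise adaptive ridge penalties d* = E[1/v_delta^2]."""
    def normpdf(w, v):
        return np.exp(-w ** 2 / (2 * v ** 2)) / (np.sqrt(2 * np.pi) * v)
    slab = pi * normpdf(omega, v1)         # evidence for the slab (edge present)
    spike = (1 - pi) * normpdf(omega, v0)  # evidence for the spike (edge absent)
    p = slab / (slab + spike)
    d = (1 - p) / v0 ** 2 + p / v1 ** 2    # large |omega| -> weak shrinkage
    return p, d
```

For example, with $v_0 = 0.02$ and $v_1 = 1$, an entry of 0.5 is assigned to the slab almost certainly, while an entry of 0.001 keeps a penalty near $1/v_0^2$.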

Pseudocode

One full EMGS cycle can be summarised as:

  initialize Ω ← I_p, π ← π_0
  repeat until the change in Ω falls below tolerance:
      E-step:   compute p*_ij and d*_ij for all i < j
      CM-step:  update π in closed form
      CM-step:  for j = 1,…,p, update column j of Ω
  return Ω and the inclusion probabilities p*_ij

Empirically, tens of ECM iterations suffice; the overall cost per value on the regularization ($v_0$) grid is comparable to a single graphical-lasso solve, but without nested inner optimizations (Li et al., 2017).

3. ECM for Rigid and Articulated Point Registration

The ECM for Point Registration (ECMPR) algorithm addresses rigid and articulated point-set matching by framing unknown correspondences as missing data in a mixture-model framework (Horaud et al., 2020). Model points $x_1,\ldots,x_n$ are linked to data points $y_1,\ldots,y_m$ via latent assignments $z_j$, with $z_j = n+1$ denoting a uniform outlier class.

Mixture Model Formulation

  • Inlier components: $p(y_j \mid z_j = i) = \mathcal{N}(y_j;\, \mu(x_i;\theta), \Sigma_i)$ ($1 \le i \le n$), with mixing proportions $p_i$
  • $\mu(x_i;\theta) = R\,x_i + t$ for rigid registration
  • Outlier model: $p(y_j \mid z_j = n{+}1) = 1/V$, uniform over the observation volume $V$

Iterative Steps

  1. E-step: Compute soft assignments (posteriors):

$$\alpha_{ij} = \frac{p_i\,\mathcal{N}(y_j;\, \mu(x_i;\theta), \Sigma_i)}{\sum_{k=1}^{n} p_k\,\mathcal{N}(y_j;\, \mu(x_k;\theta), \Sigma_k) + p_{n+1}/V}$$

Outlier: $\alpha_{n+1,j} = 1 - \sum_{i=1}^{n} \alpha_{ij}$

  2. CM-step 1 (Registration Parameters): For $\mu(x_i;\theta) = R\,x_i + t$ (rigid), define:

$$\lambda_i = \sum_{j=1}^{m} \alpha_{ij}, \qquad w_i = \frac{1}{\lambda_i}\sum_{j=1}^{m} \alpha_{ij}\, y_j$$

Minimize:

$$\sum_{i=1}^{n} \lambda_i\,(w_i - R\,x_i - t)^\top \Sigma_i^{-1} (w_i - R\,x_i - t)$$

  • $t$ has a closed-form update
  • $R$ is solved via Procrustes/SVD for isotropic $\Sigma_i = \sigma_i^2 I$, or via a semidefinite program for general $\Sigma_i$
  3. CM-step 2 (Covariance Updates):

$$\Sigma_i^{(\ell+1)} = \frac{\sum_{j=1}^{m} \alpha_{ij}\,\bigl(y_j - \mu(x_i;\theta)\bigr)\bigl(y_j - \mu(x_i;\theta)\bigr)^\top}{\sum_{j=1}^{m} \alpha_{ij}}$$

  4. Articulated Registration: The kinematic chain is decomposed part-wise, registering each rigid group incrementally.

Robustness and Outlier Handling

A uniform component in the mixture ensures that points poorly explained by any Gaussian component are classified as outliers, providing automatic robustification without the need for hand-tuned thresholds.
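A toy computation makes this mechanism concrete; the component locations, widths, mixing weight, and volume $V$ below are invented for illustration.

```python
import numpy as np

def posteriors(y, mus, sigma, w_out, V):
    """Posterior class probabilities for one 1-D data point under equal-weight
    Gaussian inlier components plus a uniform outlier component of density 1/V."""
    dens = np.exp(-(y - mus) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    dens = (1 - w_out) / len(mus) * dens   # inlier responsibilities (unnormalised)
    out = w_out / V                        # outlier responsibility (unnormalised)
    Z = dens.sum() + out
    return dens / Z, out / Z

mus = np.array([2.0, 3.0])
# a point far from every Gaussian is absorbed by the uniform component
_, far_out = posteriors(8.0, mus, sigma=0.5, w_out=0.1, V=10.0)
# a point on top of a component stays an inlier
_, near_out = posteriors(2.0, mus, sigma=0.5, w_out=0.1, V=10.0)
```

The far point's Gaussian densities underflow toward zero, so nearly all posterior mass lands on the uniform class without any explicit distance threshold.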

Pseudocode

One ECMPR iteration can be summarised as:

  initialize R ← I, t ← 0, Σ_i ← σ_0² I
  repeat until convergence:
      E-step:   compute α_ij for all model–data pairs (and outlier posteriors)
      CM-step:  update t in closed form, then R via SVD (or SDP)
      CM-step:  update each Σ_i from the weighted residuals
  return R, t, {Σ_i}, and the assignments α_ij

Each ECMPR iteration costs $O(nm)$ for the E-step and, in the anisotropic case, requires solving a small SDP (dimension 9) for the rotation update (Horaud et al., 2020).
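In the isotropic rigid case the CM-steps reduce to a weighted Procrustes loop. The self-contained sketch below (the function name and defaults are assumptions, and the outlier component is omitted for brevity) recovers a planted rotation and translation from noiseless point sets.

```python
import numpy as np

def ecmpr_rigid(X, Y, iters=100, sigma2=1.0):
    """Sketch of ECMPR-style rigid registration with an isotropic covariance.
    X: (n, d) model points, Y: (m, d) data points. Returns (R, t) such that
    Y is approximately X @ R.T + t. Outlier handling omitted for brevity."""
    n, d = X.shape
    R, t = np.eye(d), np.zeros(d)
    for _ in range(iters):
        # E-step: soft assignments alpha[j, i] with equal mixing weights
        TX = X @ R.T + t
        d2 = ((Y[:, None, :] - TX[None, :, :]) ** 2).sum(-1)   # (m, n) sq. dists
        a = np.exp(-d2 / (2 * sigma2))
        a /= a.sum(axis=1, keepdims=True)
        # CM-step 1: weighted Procrustes for (R, t)
        lam = a.sum(axis=0)                    # lambda_i = sum_j alpha_ij
        W = (a.T @ Y) / lam[:, None]           # virtual observations w_i
        xbar = lam @ X / lam.sum()
        wbar = lam @ W / lam.sum()
        H = (X - xbar).T @ (lam[:, None] * (W - wbar))   # d x d cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0] * (d - 1) + [np.sign(np.linalg.det(U @ Vt))])
        R = Vt.T @ D @ U.T                     # proper rotation (det = +1)
        t = wbar - R @ xbar
        # CM-step 2: isotropic variance from the current residuals
        sigma2 = (a * d2).sum() / (d * a.sum()) + 1e-12
    return R, t
```

The shrinking variance acts as an automatic annealing schedule: early iterations use diffuse assignments, and the correspondences sharpen as the fit improves.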

4. Adaptive Penalization and Structured Priors

In EMGS, latent indicators $\delta_{ij}$ select between a "spike" and a "slab" prior variance, translating into strongly or weakly penalized connections in the estimated graph. This confers adaptivity and reduces bias on large interactions, a notable advance relative to the uniform penalty of standard glasso.

Structured priors can be incorporated by grouping the $\binom{p}{2}$ edges into blocks, each block sharing a group-specific slab variance $v_{1,g}^2$. These group scales can themselves be endowed with hyperpriors and updated in closed form in the CM step; for example, under an $\operatorname{Inv\text{-}Gamma}(a_g, b_g)$ hyperprior,

$$v_{1,g}^{2\,(\ell+1)} = \frac{\sum_{(i,j)\in g} p_{ij}^*\,\omega_{ij}^2 + 2b_g}{\sum_{(i,j)\in g} p_{ij}^* + 2a_g + 2}.$$

This facilitates both flexible penalization and the infusion of external prior knowledge (Li et al., 2017).

5. Extensions: Missing Data and Mixed/Discrete Data

Both ECM-based algorithms naturally address missing data:

  • For EMGS, missing entries in $X$ are imputed at each E-step using the conditional expectation of the missing coordinates given the observed ones under the current $\Omega^{(\ell)}$.
  • For ECMPR, the likelihood formulation allows unassigned (outlier) points, and unobserved correspondences are latent.
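For the EMGS bullet above, the conditional-expectation imputation has a simple closed form under $X \sim \mathcal{N}(0, \Omega^{-1})$; the helper below is an illustrative sketch (the function name is ours, not from the paper).

```python
import numpy as np

def impute_row(x, Omega):
    """Conditional-mean imputation of the NaN entries of one observation x,
    assuming x ~ N(0, Omega^{-1}): E[x_mis | x_obs] = -Omega_mm^{-1} Omega_mo x_obs."""
    m = np.isnan(x)
    if not m.any():
        return x
    x = x.copy()
    x[m] = -np.linalg.solve(Omega[np.ix_(m, m)], Omega[np.ix_(m, ~m)] @ x[~m])
    return x
```

For a bivariate normal with correlation 0.8, observing $x_1 = 1$ imputes $x_2 = 0.8$, matching the familiar regression formula.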

EMGS extends to mixed/binary data via a Gaussian copula approach: latent Gaussian variables are truncated to be compatible with the observed data ranks, and the E-step expectations use stochastic or MCEM approximations, leaving the CM-steps unchanged (Li et al., 2017).

6. Computational Complexity and Convergence Properties

  • EMGS (Graph Selection): Each full cycle over the $p$ column-block updates costs $O(p^3)$ thanks to efficient rank-one updates of $\Omega_{11}^{-1}$, with empirical convergence in tens of iterations. Total cost over a hyperparameter grid is competitive with glasso, but requires fewer total system solves due to deterministic convergence and the absence of inner routines.
  • ECMPR (Point Registration): The E-step is $O(nm)$, with the main computational overhead in the rotation update: an SVD of a $3 \times 3$ cross-covariance for isotropic covariances, or a small SDP (dimension 9) for the fully anisotropic case. Each iteration non-decreases the log-likelihood and converges to a stationary point, as inherited from the general ECM framework (Horaud et al., 2020).
  • EMGS offers adaptive elementwise penalization, warm-started regularization paths, and flexibility for structured priors and copula-extensions, in contrast with uniform-penalty glasso and MCMC stochastic search, which become computationally intractable in high dimensions.
  • ECMPR generalizes classical EM and ICP (Iterative Closest Point) methods by supporting full covariance modeling (yielding more robust assignments) and by integrating outlier rejection natively via a uniform mixture component. Its rotation update via SDP constitutes an exact relaxation, improving over heuristic/annealing procedures.

These advances highlight ECM as a principled framework for tractable, robust inference in challenging latent-variable models, with demonstrated efficacy in both sparse graphical model selection and robust geometric registration (Li et al., 2017, Horaud et al., 2020).

