Expectation–Conditional Maximisation Algorithm
- The ECM algorithm is a variant of EM that replaces the difficult M-step with a series of simpler conditional maximization steps, ensuring tractable updates in latent variable models.
- In sparse Gaussian graphical model estimation, ECM employs adaptive spike-and-slab penalties to efficiently select network structures while mitigating bias in high-dimensional settings.
- For rigid and articulated point registration, ECMPR integrates robust outlier handling and precise rotation updates via SVD or SDP, improving registration accuracy over traditional methods.
The Expectation–Conditional Maximisation (ECM) algorithm is a deterministic extension of traditional Expectation–Maximisation (EM), designed to address maximum-likelihood and posterior-mode estimation problems with latent variables and complex conditional structure. ECM refines the classic EM paradigm by decomposing the maximization (M) step into a sequence of simpler conditional maximization (CM) sub-steps, each of which is often analytically or computationally tractable. The ECM framework is particularly salient in high-dimensional settings and latent graphical models, where a full joint maximization over all parameters is infeasible, and has produced state-of-the-art methods for both sparse graphical model selection and robust mixture-based rigid/articulated point registration.
1. Algorithmic Principle and Framework
In ECM, each iteration consists of an expectation (E) step, followed by one or more CM steps. The E-step computes the expectation of the complete-data log-likelihood (or log-posterior in Bayesian settings), conditioning on the current parameter estimates and observed data. Each CM step then conditionally maximizes this expected criterion with respect to a subset (or block) of the parameters, given the current values of the other blocks. The process cycles through the parameter blocks until all are updated.
Let $X$ denote observed data, $Z$ latent variables, and $\theta = (\theta_1, \dots, \theta_S)$ a partition of the parameters into $S$ blocks. At iteration $t$:
- E-step: Compute $Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\bigl[\log p(X, Z \mid \theta)\bigr]$.
- For CM-steps $s = 1, \dots, S$:
$$\theta_s^{(t+1)} = \arg\max_{\theta_s}\; Q\bigl(\theta_1^{(t+1)}, \dots, \theta_{s-1}^{(t+1)},\, \theta_s,\, \theta_{s+1}^{(t)}, \dots, \theta_S^{(t)} \,\big|\, \theta^{(t)}\bigr).$$
Standard ECM theory guarantees monotonic increase in the observed-data likelihood (or Q-function) and convergence to a stationary point.
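To make the E/CM cycle concrete, the following minimal Python sketch applies ECM to a two-component univariate Gaussian mixture (an illustrative toy example, not drawn from the cited papers): the single M-step is deliberately split into three CM sub-steps, each maximizing the expected complete-data log-likelihood over one parameter block with the other blocks held fixed.

```python
import numpy as np

def ecm_gmm(x, n_iter=100, tol=1e-8):
    """ECM for a 2-component univariate Gaussian mixture.

    The M-step is split into three CM sub-steps (mixing weight, means,
    variances), each maximizing Q with the other blocks held fixed.
    """
    rng = np.random.default_rng(0)
    pi, mu, var = 0.5, rng.normal(size=2), np.ones(2)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters
        dens = np.stack([np.exp(-(x - mu[k])**2 / (2 * var[k])) /
                         np.sqrt(2 * np.pi * var[k]) for k in range(2)], axis=1)
        w = dens * np.array([pi, 1 - pi])
        ll = np.log(w.sum(axis=1)).sum()
        r = w / w.sum(axis=1, keepdims=True)
        # CM-step 1: mixing weight given responsibilities
        pi = r[:, 0].mean()
        # CM-step 2: component means given responsibilities (variances fixed)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        # CM-step 3: variances given the freshly updated means
        var = (r * (x[:, None] - mu)**2).sum(axis=0) / r.sum(axis=0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var

# usage: x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 300)])
```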
2. ECM for Sparse Gaussian Graphical Model Estimation
The ECM Graph Selection (EMGS) algorithm of Li & McCormick applies ECM to Gaussian graphical models with a spike-and-slab prior on the precision matrix (Li et al., 2017). The observed data $X = (x_1, \dots, x_n)^\top$ are modeled as independent samples from $\mathcal{N}_p(0, \Omega^{-1})$. The prior is specified as follows:
- Off-diagonal: $\omega_{jk} \mid \delta_{jk} \sim \mathcal{N}\bigl(0,\, v_{\delta_{jk}}^2\bigr)$ with "spike" ($v_0^2$) and "slab" ($v_1^2$) variances ($v_0 \ll v_1$)
- Diagonal: $\omega_{kk} \sim \operatorname{Exp}(\lambda/2)$
- Latent indicators: $\delta_{jk} \mid \pi \sim \operatorname{Bernoulli}(\pi)$, $\pi \sim \operatorname{Beta}(a, b)$
The complete-data log-posterior (up to constants) is:
$$\log p(\Omega, \delta, \pi \mid X) \;\propto\; \frac{n}{2}\log|\Omega| - \frac{1}{2}\operatorname{tr}(S\Omega) - \sum_{j<k}\frac{\omega_{jk}^2}{2 v_{\delta_{jk}}^2} - \frac{\lambda}{2}\sum_{k}\omega_{kk} + \sum_{j<k}\bigl[\delta_{jk}\log\pi + (1-\delta_{jk})\log(1-\pi)\bigr] + (a-1)\log\pi + (b-1)\log(1-\pi),$$
where $S = X^\top X$ and $v_{\delta_{jk}} = v_1$ if $\delta_{jk} = 1$ and $v_0$ otherwise.
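Monitoring this objective is a convenient way to verify monotone ascent in practice; the following numpy sketch (function and variable names are illustrative) evaluates the expression above up to an additive constant.

```python
import numpy as np

def complete_log_posterior(Omega, delta, pi, S, n, v0, v1, lam, a, b):
    """Complete-data log-posterior (up to an additive constant).

    Omega : (p, p) precision matrix; delta : (p, p) 0/1 indicators
    (only the upper triangle is used); S = X^T X; n = sample size.
    """
    p = Omega.shape[0]
    iu = np.triu_indices(p, k=1)                      # upper-triangular (j < k) index pairs
    v = np.where(delta[iu] == 1, v1, v0)              # slab or spike scale per edge
    _, logdet = np.linalg.slogdet(Omega)
    lp = 0.5 * n * logdet - 0.5 * np.trace(S @ Omega)      # Gaussian likelihood part
    lp -= np.sum(Omega[iu] ** 2 / (2.0 * v ** 2))          # spike-and-slab ridge part
    lp -= 0.5 * lam * np.sum(np.diag(Omega))               # exponential prior on diagonal
    lp += np.sum(delta[iu] * np.log(pi) + (1 - delta[iu]) * np.log(1 - pi))
    lp += (a - 1) * np.log(pi) + (b - 1) * np.log(1 - pi)  # Beta hyperprior on pi
    return lp
```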
Algorithmic Steps
- E-step: Compute the posterior inclusion probabilities $p^*_{jk}$ and adaptive penalties $d^*_{jk}$:
$$p^*_{jk} = \frac{\pi\,\mathcal{N}(\omega_{jk};\,0,\,v_1^2)}{\pi\,\mathcal{N}(\omega_{jk};\,0,\,v_1^2) + (1-\pi)\,\mathcal{N}(\omega_{jk};\,0,\,v_0^2)}, \qquad d^*_{jk} = \frac{p^*_{jk}}{v_1^2} + \frac{1-p^*_{jk}}{v_0^2}.$$
- CM-step ($\pi$ update): Closed form,
$$\pi \leftarrow \frac{\sum_{j<k} p^*_{jk} + a - 1}{\binom{p}{2} + a + b - 2}.$$
- CM-step ($\Omega$ update): Update $\Omega$ one column at a time, analogous to block-coordinate descent. Partitioning $\Omega$ and $S$ so that column $k$ appears last, with blocks $(\Omega_{11}, \omega_{12}, \omega_{22})$ and $(S_{11}, s_{12}, s_{22})$,
$$\omega_{12} \leftarrow -\bigl[(s_{22}+\lambda)\,\Omega_{11}^{-1} + \operatorname{diag}(d^*_{12})\bigr]^{-1} s_{12}, \qquad \omega_{22} \leftarrow \omega_{12}^\top \Omega_{11}^{-1}\,\omega_{12} + \frac{n}{s_{22}+\lambda}.$$
Repeat for each column $k = 1, \dots, p$.
Adaptive Penalization
The mixture prior imparts elementwise adaptive ridge penalties, derived from the E-step expectation. Entries with large $|\omega_{jk}|$ receive weak shrinkage ($d^*_{jk}$ close to $1/v_1^2$), while entries with small $|\omega_{jk}|$ incur strong shrinkage ($d^*_{jk}$ close to $1/v_0^2$).
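A minimal numpy sketch of these E-step quantities, transcribing the expressions above (the name `e_step` is illustrative):

```python
import numpy as np
from scipy.stats import norm

def e_step(Omega, pi, v0, v1):
    """Posterior inclusion probabilities and adaptive ridge penalties.

    Returns p_star (slab membership probability) and d_star (expected
    inverse prior variance) for the entries of Omega; in practice only
    the off-diagonal entries are used downstream.
    """
    slab = pi * norm.pdf(Omega, scale=v1)         # density under the slab
    spike = (1 - pi) * norm.pdf(Omega, scale=v0)  # density under the spike
    p_star = slab / (slab + spike)
    d_star = p_star / v1**2 + (1 - p_star) / v0**2
    return p_star, d_star
```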
Pseudocode
```
initialize Ω = I_p, π = a / (a + b)
while change > tol and iter < maxit:
    # E-step
    for all j < k: compute p_star_jk and d_star_jk
    if X has missing entries: compute S <- E[X^T X | Ω, X_obs]
    # CM-step
    update π via the closed-form Beta expression
    for k in 1..p:
        partition Ω, S; update ω_12 and ω_22
    reassemble Ω
    iter += 1
```
Empirically, tens of ECM iterations suffice; overall complexity per grid value is comparable to a single graphical lasso solution, but without nested inner optimizations (Li et al., 2017).
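For concreteness, the column-wise CM update can be transcribed roughly as follows (a sketch based on the block partition given earlier, with illustrative names; the naive matrix inversion should be replaced by rank-one updates in an efficient implementation):

```python
import numpy as np

def cm_step_omega(Omega, S, d_star, n, lam):
    """One sweep of column-wise CM updates for the precision matrix.

    Follows the block partition sketched above: for each column k, the
    off-diagonal block solves a ridge-type linear system with elementwise
    penalties d_star, and the diagonal entry has a scalar closed form.
    """
    p = Omega.shape[0]
    for k in range(p):
        idx = np.array([j for j in range(p) if j != k])
        Omega11 = Omega[np.ix_(idx, idx)]
        inv11 = np.linalg.inv(Omega11)                 # O(p^3) here; rank-one updates reduce this
        s12, s22 = S[idx, k], S[k, k]
        A = (s22 + lam) * inv11 + np.diag(d_star[idx, k])
        w12 = -np.linalg.solve(A, s12)                 # updated off-diagonal column
        w22 = w12 @ inv11 @ w12 + n / (s22 + lam)      # updated diagonal entry
        Omega[idx, k] = Omega[k, idx] = w12
        Omega[k, k] = w22
    return Omega
```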
3. ECM for Rigid and Articulated Point Registration
The ECM for Point Registration (ECMPR) algorithm addresses rigid and articulated point-set matching by framing unknown correspondences as missing data in a mixture-model framework (Horaud et al., 2020). Data points $y_j$ ($j = 1, \dots, m$) are linked to model points $x_i$ ($i = 1, \dots, n$) via latent assignments $z_j$, with $z_j = n+1$ denoting the uniform outlier class.
Mixture Model Formulation
- Inlier components: $P(y_j \mid z_j = i) = \mathcal{N}\bigl(y_j;\, \mu_i(\theta), \Sigma_i\bigr)$ ($1 \le i \le n$), with $\mu_i(\theta) = R x_i + t$ in the rigid case
- Priors: $P(z_j = i) = p_i$ for $i = 1, \dots, n+1$
- Outlier model: $P(y_j \mid z_j = n+1) = 1/V$, uniform over the observation volume $V$
Iterative Steps
- E-step: Compute soft assignments (posteriors):
$$\alpha_{ji} = \frac{p_i\,\mathcal{N}(y_j;\, R x_i + t,\, \Sigma_i)}{\sum_{i'=1}^{n} p_{i'}\,\mathcal{N}(y_j;\, R x_{i'} + t,\, \Sigma_{i'}) + p_{n+1}/V}, \qquad i = 1, \dots, n.$$
Outlier: $\alpha_{j,n+1} = 1 - \sum_{i=1}^{n} \alpha_{ji}$.
- CM-step 1 (Registration Parameters): For $\theta = (R, t)$ (rigid), define weights and virtual observations
$$\lambda_i = \sum_{j=1}^{m} \alpha_{ji}, \qquad w_i = \frac{1}{\lambda_i}\sum_{j=1}^{m} \alpha_{ji}\, y_j.$$
Minimize:
$$\sum_{i=1}^{n} \lambda_i\,(w_i - R x_i - t)^\top \Sigma_i^{-1} (w_i - R x_i - t)$$
- $t$ has a closed-form update given $R$
- $R$ is solved via Procrustes/SVD for isotropic $\Sigma_i$, or via a semidefinite program for general (anisotropic) $\Sigma_i$
- CM-step 2 (Covariance Updates):
$$\Sigma_i \leftarrow \frac{1}{\lambda_i}\sum_{j=1}^{m} \alpha_{ji}\,(y_j - R x_i - t)(y_j - R x_i - t)^\top.$$
- Articulated Registration: The kinematic chain is decomposed part by part, each rigid group being registered incrementally; the sketch after this list illustrates the rigid core of one iteration.
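The following numpy sketch outlines one ECMPR-style iteration for the rigid, isotropic-covariance case (notation and the constant outlier density are assumptions for illustration; the anisotropic case would replace the Procrustes/SVD step with the SDP mentioned above):

```python
import numpy as np

def ecmpr_iteration(Y, X, R, t, sigma2, outlier_density):
    """One ECM iteration for rigid registration with isotropic covariances.

    Y : (m, d) data points; X : (n, d) model points.
    sigma2 : (n,) per-component isotropic variances.
    outlier_density : constant density 1/V of the uniform outlier class.
    Returns updated (R, t, sigma2) and posteriors alpha of shape (m, n+1).
    """
    m, d = Y.shape
    n = X.shape[0]
    mu = X @ R.T + t                                      # transformed model points, (n, d)
    sq = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances, (m, n)
    dens = np.exp(-0.5 * sq / sigma2) / (2 * np.pi * sigma2) ** (d / 2)
    # E-step: posteriors over the n Gaussian components and the outlier class
    alpha = np.empty((m, n + 1))
    denom = dens.sum(axis=1) + outlier_density
    alpha[:, :n] = dens / denom[:, None]
    alpha[:, n] = outlier_density / denom
    # Virtual observations and weights
    lam = alpha[:, :n].sum(axis=0)                        # (n,)
    W = (alpha[:, :n].T @ Y) / lam[:, None]               # (n, d)
    # CM-step 1: weighted Procrustes (isotropic case) for (R, t)
    wts = lam / sigma2
    xbar = (wts[:, None] * X).sum(0) / wts.sum()
    wbar = (wts[:, None] * W).sum(0) / wts.sum()
    C = ((W - wbar) * wts[:, None]).T @ (X - xbar)        # weighted cross-covariance, (d, d)
    U, _, Vt = np.linalg.svd(C)
    D = np.diag([1.0] * (d - 1) + [np.linalg.det(U @ Vt)])  # enforce det(R) = +1
    R = U @ D @ Vt
    t = wbar - R @ xbar
    # CM-step 2: per-component isotropic variance update
    resid = ((Y[:, None, :] - (X @ R.T + t)[None, :, :]) ** 2).sum(-1)
    sigma2 = (alpha[:, :n] * resid).sum(axis=0) / (d * lam)
    return R, t, sigma2, alpha
```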
Robustness and Outlier Handling
A uniform component in the mixture ensures that points poorly explained by any Gaussian component are classified as outliers, providing automatic robustification without the need for hand-tuned thresholds.
Pseudocode
```
Initialize R ← I, t ← 0, Σ_i ← large*I
repeat
    E-step: compute α_{ji} for i = 1...n+1
    form λ_i, W_i
    CM-step 1: estimate (R, t) (Procrustes or SDP)
    CM-step 2: update Σ_i via weighted covariance
until convergence
classify each j by argmax_i α_{ji}
```
Each ECMPR iteration costs $O(mn)$ for the E-step (with $m$ data and $n$ model points) and, in the anisotropic case, requires solving a small SDP (dimension 9) for the rotation update (Horaud et al., 2020).
4. Adaptive Penalization and Structured Priors
In EMGS, latent indicators select between a "spike" and "slab" prior variance, translating into strongly or weakly penalized connections in the estimated graph. This confers adaptivity and reduces bias on large interactions, a notable advance relative to uniform penalties in standard glasso.
Structured priors can be incorporated by grouping the edges into blocks, each block $g$ sharing a group-specific slab variance $v_{1g}^2$. These group scales can themselves be endowed with conjugate hyperpriors and updated in closed form in the CM step.
This facilitates both flexible penalization and the infusion of external prior knowledge (Li et al., 2017).
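As an illustration of how group-specific slab scales might be re-estimated in a CM step (a hypothetical sketch assuming an inverse-gamma hyperprior; not the paper's exact formula):

```python
import numpy as np

def group_slab_update(Omega, p_star, groups, alpha0, beta0):
    """Re-estimate group-specific slab variances v_{1g}^2 in a CM step.

    Hypothetical sketch: assumes an inverse-gamma IG(alpha0, beta0) hyperprior
    on each v_{1g}^2 and returns its conditional posterior mode given the
    E-step inclusion probabilities p_star. Not the paper's exact formula.

    groups : (p, p) integer array assigning each edge (j, k) to a block g.
    """
    p = Omega.shape[0]
    iu = np.triu_indices(p, k=1)            # use each edge (j < k) exactly once
    w, om, g = p_star[iu], Omega[iu], groups[iu]
    v1_sq = {}
    for block in np.unique(g):
        m = (g == block)
        ss = np.sum(w[m] * om[m] ** 2)      # slab-weighted sum of squared edges
        v1_sq[block] = (beta0 + 0.5 * ss) / (alpha0 + 1.0 + 0.5 * np.sum(w[m]))
    return v1_sq
```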
5. Extensions: Missing Data and Mixed/Discrete Data
Both ECM-based algorithms naturally address missing data:
- For EMGS, missing entries in $X$ are imputed at each E-step by replacing $S = X^\top X$ with its conditional expectation given the observed entries and the current $\Omega$ (see the sketch after this list).
- For ECMPR, the likelihood formulation allows unassigned (outlier) points, and unobserved correspondences are latent.
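For the EMGS case, this conditional-expectation imputation reduces to standard zero-mean Gaussian identities; a minimal numpy sketch (illustrative names) is:

```python
import numpy as np

def expected_scatter(X, Omega):
    """E[X^T X | X_obs, Omega] for rows of X with NaN-coded missing entries.

    Uses the standard zero-mean Gaussian identities: for a row with observed
    part o and missing part m, E[x_m | x_o] = -Omega_mm^{-1} Omega_mo x_o and
    Cov(x_m | x_o) = Omega_mm^{-1}.
    """
    n, p = X.shape
    S = np.zeros((p, p))
    for row in X:
        m = np.isnan(row)
        o = ~m
        x = row.copy()
        if m.any():
            Omm_inv = np.linalg.inv(Omega[np.ix_(m, m)])
            x[m] = -Omm_inv @ Omega[np.ix_(m, o)] @ row[o]   # conditional mean
        xx = np.outer(x, x)
        if m.any():
            xx[np.ix_(m, m)] += Omm_inv                      # add conditional covariance
        S += xx
    return S
```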
EMGS extends to mixed/binary data via a Gaussian copula approach: latent variables are truncated to be compatible with observed data ranks, and expectations use stochastic or MCEM approximations, leaving the CM-steps unchanged (Li et al., 2017).
6. Computational Complexity and Convergence Properties
- EMGS (Graph Selection): Each full cycle of block updates is $O(p^3)$, aided by efficient rank-one updates of the required inverses, with empirical convergence in tens of iterations. Total cost over a hyperparameter grid is competitive with glasso, but requires fewer total system solves due to deterministic convergence and the absence of nested inner routines.
- ECMPR (Point Registration): The E-step is $O(mn)$ for $m$ data and $n$ model points, with the main computational overhead in the rotation update: an SVD of a small cross-covariance matrix for isotropic covariances, or a small SDP (dimension 9) for the fully anisotropic case. Each iteration does not decrease the log-likelihood, and the sequence converges to a stationary point, as inherited from the general ECM framework (Horaud et al., 2020).
7. Comparison to Related Algorithms and Innovations
- EMGS offers adaptive elementwise penalization, warm-started regularization paths, and flexibility for structured priors and copula-extensions, in contrast with uniform-penalty glasso and MCMC stochastic search, which become computationally intractable in high dimensions.
- ECMPR generalizes classical EM and ICP (Iterative Closest Point) methods by supporting full covariance modeling (yielding more robust assignments) and by integrating outlier rejection natively via a uniform mixture component. Its rotation update via SDP constitutes an exact relaxation, improving over heuristic/annealing procedures.
These advances highlight ECM as a principled framework for tractable, robust inference in challenging latent-variable models, with demonstrated efficacy in both sparse graphical model selection and robust geometric registration (Li et al., 2017, Horaud et al., 2020).