Minorization-Maximization (MM)
- Minorization-Maximization (MM) is a framework that iteratively optimizes complex objectives by constructing surrogate functions with guaranteed monotonic ascent.
- It employs techniques such as Taylor approximations, Jensen’s inequality, and quadratic bounds to transform intractable problems into tractable subproblems.
- MM underlies various algorithms in machine learning and signal processing, providing a unified derivation of methods such as EM, IRLS, and matrix-optimization algorithms.
Minorization-Maximization (MM) is a general-purpose framework for maximizing (or minimizing) complex objective functions by iteratively constructing and optimizing simple surrogate functions. The MM paradigm encompasses a broad class of algorithms in which, at each iteration, a carefully crafted surrogate is constructed to be tangent to the true objective at the current iterate and to minorize it (for maximization) everywhere else. This approach delivers guaranteed monotonic ascent of the objective function and enables the solution of otherwise intractable or nonconvex problems by breaking them into a sequence of tractable subproblems.
1. Formal Definition and Core Principles
The MM algorithm operates on an objective function $f(\theta)$ to solve
$$\max_{\theta \in \Theta} f(\theta).$$
At each iteration $t$, it constructs a surrogate (minorizing) function $g(\theta \mid \theta^{(t)})$ satisfying:
- Tangency: $g(\theta^{(t)} \mid \theta^{(t)}) = f(\theta^{(t)})$,
- Minorization: $g(\theta \mid \theta^{(t)}) \le f(\theta)$ for all $\theta \in \Theta$.
The next iterate is then defined by solving
$$\theta^{(t+1)} = \arg\max_{\theta \in \Theta} g(\theta \mid \theta^{(t)}).$$
By construction,
$$f(\theta^{(t+1)}) \ge g(\theta^{(t+1)} \mid \theta^{(t)}) \ge g(\theta^{(t)} \mid \theta^{(t)}) = f(\theta^{(t)}),$$
so the sequence $\{f(\theta^{(t)})\}$ is nondecreasing (Wu et al., 2011, Nguyen, 2016).
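As a concrete sketch of these definitions (a toy example, not drawn from the cited works), consider maximizing $f(\theta) = \log\theta - \theta$ over $\theta > 0$. The elementary inequality $\log x \ge 1 - 1/x$ (equality at $x = 1$) applied to $x = \theta/\theta^{(t)}$ yields a valid minorizer whose maximizer is available in closed form:

```python
import math

def f(theta):
    # Objective f(theta) = log(theta) - theta, uniquely maximized at theta = 1.
    return math.log(theta) - theta

def surrogate(theta, theta_t):
    # Minorizer from log(x) >= 1 - 1/x with x = theta/theta_t:
    #   g(theta | theta_t) = log(theta_t) + 1 - theta_t/theta - theta
    # Tangency holds at theta = theta_t; minorization holds for all theta > 0.
    return math.log(theta_t) + 1.0 - theta_t / theta - theta

def mm_update(theta_t):
    # argmax of the surrogate: dg/dtheta = theta_t/theta**2 - 1 = 0
    # gives theta = sqrt(theta_t) in closed form.
    return math.sqrt(theta_t)

theta = 0.1
values = [f(theta)]
for _ in range(30):
    assert abs(surrogate(theta, theta) - f(theta)) < 1e-12   # tangency
    assert surrogate(2.0, theta) <= f(2.0) + 1e-12           # minorization spot-check
    theta = mm_update(theta)
    values.append(f(theta))

# Guaranteed monotone ascent toward the maximizer theta* = 1.
assert all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
print(round(theta, 6))
```

The update $\theta^{(t+1)} = \sqrt{\theta^{(t)}}$ never needs the original objective, only the surrogate, yet every step provably increases $f$.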
2. Construction of Surrogate Functions
The surrogate function in MM is obtained using global tangency and lower-bounding arguments derived from convexity, concavity, supporting hyperplanes, or classical inequalities such as Jensen's inequality and first-order Taylor expansions. Core approaches include:
- Linearization (supporting hyperplane): For convex/concave objectives, a first-order Taylor approximation yields a global lower/upper bound.
- Quadratic minorization: For more intricate curvature, global quadratic lower bounds can be used.
- Jensen's inequality: Especially relevant for log-sum-exp terms, as in mixture models.
- Auxiliary variable transformations: For sum-of-ratios or matrix-fractional objectives, auxiliary variables (e.g., matrix quadratic forms) can effect decoupling and yield separable surrogates.
Table 1: Common MM Surrogate Construction Methods
| Principle | Typical Setting | Example |
|---|---|---|
| Supporting hyperplane | Convex (min) / concave (max) objectives | $f(\theta) \ge f(\theta_0) + \nabla f(\theta_0)^\top (\theta - \theta_0)$ for convex $f$ |
| Jensen's inequality | Mixture models, log-sum-exp | $\log \sum_k \pi_k p_k \ge \sum_k r_k \log \frac{\pi_k p_k}{r_k}$ for weights $r_k$ |
| Quadratic bound | Smooth but nonconvex losses | SVM hinge loss |
For Gaussian mixture models (GMMs), for example, the MM surrogate is given by the tangent plane to the convex log-sum-exp (Sahu et al., 2020), producing iterates that match the classical EM algorithm but are derived entirely from convex analysis, not latent variable integration.
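This equivalence can be verified numerically; the sketch below runs the MM/EM updates for a two-component one-dimensional GMM with known unit variances (the data-generating parameters are illustrative assumptions, not taken from the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from two unit-variance Gaussians (illustrative parameters).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

def log_likelihood(x, pi, mu):
    # log p(x) = sum_i log sum_k pi_k N(x_i; mu_k, 1)
    comp = -0.5 * (x[:, None] - mu[None, :]) ** 2 - 0.5 * np.log(2 * np.pi)
    return np.logaddexp.reduce(np.log(pi)[None, :] + comp, axis=1).sum()

def mm_step(x, pi, mu):
    # The Jensen / tangent-plane surrogate yields the familiar EM responsibilities.
    log_w = np.log(pi)[None, :] - 0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(log_w - np.logaddexp.reduce(log_w, axis=1, keepdims=True))
    # Maximizing the separable surrogate gives the EM updates in closed form.
    nk = r.sum(axis=0)
    return nk / len(x), (r * x[:, None]).sum(axis=0) / nk

pi, mu = np.array([0.5, 0.5]), np.array([-1.0, 1.0])
lls = [log_likelihood(x, pi, mu)]
for _ in range(50):
    pi, mu = mm_step(x, pi, mu)
    lls.append(log_likelihood(x, pi, mu))

assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))  # monotone ascent
print(np.round(mu, 2))  # component means near the true (-2, 3)
```

Although derived purely from the convexity of log-sum-exp, each MM step computes the same responsibility-weighted updates as the classical E- and M-steps.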
3. Theoretical Properties: Monotonicity and Convergence
MM algorithms guarantee monotonic ascent (or descent, in the minimization variant) of the target objective. The monotonicity proof follows directly from the construction of the surrogate. Under mild conditions—continuity of the surrogate, compactness of sublevel sets, and directional-derivative matching—cluster points of the iterates are stationary points of the original objective function (Wu et al., 2011, Nguyen, 2016, Naghsh et al., 2019). For strongly concave (maximization) or convex (minimization) objectives, global convergence is guaranteed; otherwise, the method converges to a local optimum or saddle (Wu et al., 2011).
The local rate of convergence depends on the curvature matching between the surrogate and the target objective: the closer the local Hessians, the faster the convergence. In general, the method is first-order; acceleration techniques (e.g., quasi-Newton overlays, step-doubling, SQUAREM) are often used to enhance the practical rate (Nguyen, 2016).
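The effect of curvature matching can be seen on a toy construction (an illustration, not drawn from the cited works): maximize $f(\theta) = -\theta^2/2 + \theta$ with the quadratic minorizer $f(\theta) \ge f(\theta^{(t)}) + f'(\theta^{(t)})(\theta - \theta^{(t)}) - \tfrac{L}{2}(\theta - \theta^{(t)})^2$, valid for any $L \ge 1$. The surrogate maximizer is the gradient step $\theta^{(t+1)} = \theta^{(t)} + f'(\theta^{(t)})/L$, and a loose bound $L$ contracts the error by only $(1 - 1/L)$ per iteration:

```python
def iterations_to_converge(L, theta0=10.0, tol=1e-8, max_iter=10_000):
    """MM with the quadratic minorizer
    f(t) >= f(t_k) + f'(t_k)(t - t_k) - (L/2)(t - t_k)^2,
    valid for f(t) = -t**2/2 + t whenever L >= 1 (the true curvature)."""
    theta = theta0
    for k in range(max_iter):
        grad = 1.0 - theta          # f'(theta)
        if abs(grad) < tol:         # stationary point reached
            return k
        theta += grad / L           # closed-form surrogate maximizer
    return max_iter

tight = iterations_to_converge(L=1.0)   # curvature matched exactly
loose = iterations_to_converge(L=10.0)  # valid but loose bound
print(tight, loose)
assert tight < loose  # tighter surrogate converges in fewer iterations
```

Both runs ascend monotonically; only the iteration count differs, which is exactly the trade-off acceleration schemes target.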
4. Applications Across Machine Learning and Signal Processing
The MM framework underlies a spectrum of algorithms in statistics, signal processing, and machine learning:
- Mixture models: MM yields the same parameter updates as expectation-maximization (EM), but uses convexity rather than hidden-variable marginalization (Sahu et al., 2020).
- Variance component models: MM produces block-coordinate updates separating fixed effects and variance components, with guaranteed monotonicity and faster convergence than EM in over-parameterized settings (Zhou et al., 2015).
- Heteroscedastic and quantile regression: MM provides IRLS updates for models with heavy-tailed or heteroscedastic noise, outperforming Newton-type methods by decoupling the parameter blocks and eliminating full Hessian computation (Nguyen et al., 2016, Cheng et al., 2024).
- Matrix optimization in wireless communications: Weighted sum-rate, fairness, and beamforming problems in MIMO systems are solved via surrogates constructed using Lagrangian duality and matrix fractional programming, leading to algorithms with per-iteration complexity governed by convex optimization subroutines (Zhang et al., 2023, Amor et al., 2024, Shen et al., 2018, Kim et al., 2016).
- Fair and robust PCA: MM transforms challenging max-min and sparsity-constrained subspace problems into a sequence of surrogate SDPs or QPs with closed-form (or efficiently solvable) inner steps (Babu et al., 2023).
- Nonparametric mixture estimation and copula models: MM enables monotonic maximization for penalized density estimation with dependent marginal structures (Levine, 22 May 2025).
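The IRLS connection above admits a minimal illustration: for least-absolute-deviations location estimation, the quadratic majorizer $|r| \le r^2/(2|r^{(t)}|) + |r^{(t)}|/2$ turns each MM (here, majorize-minimize) step into a weighted mean. The small-$\varepsilon$ guard is an implementation convenience, not part of the cited derivations:

```python
def lad_location(x, iters=100, eps=1e-9):
    """Minimize sum_i |x_i - theta| by majorize-minimize:
    the bound |r| <= r**2 / (2|r_t|) + |r_t| / 2 (tangent at r = r_t)
    gives an IRLS update with weights w_i = 1 / |x_i - theta_t|."""
    theta = sum(x) / len(x)  # initialize at the mean
    for _ in range(iters):
        # eps guards against division by zero when a residual vanishes.
        w = [1.0 / max(abs(xi - theta), eps) for xi in x]
        theta = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
    return theta

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # the outlier pulls the mean, not the median
print(lad_location(data))  # approaches the sample median, 3.0
```

No Hessian is ever formed: each iteration solves a weighted least-squares problem in closed form, which is the pattern the MM-derived IRLS updates for heteroscedastic and quantile regression generalize.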
5. Relationship to Expectation-Maximization and Other Paradigms
EM is a special case of MM, with its surrogate arising from Jensen’s inequality applied to the expected complete-data log-likelihood. In EM, the surrogate (E-step) relies on conditional expectations with respect to latent variables, whereas MM can exploit convex-analytic surrogates even in models with no latent-variable structure (Wu et al., 2011, Sahu et al., 2020, Nguyen, 2016). This distinction is significant in high-dimensional or non-convex models where conditional expectation is infeasible or ill-defined.
Moreover, MM generalizes to block-wise (coordinate) and alternating minimization/maximization schemes, providing a unifying lens on block coordinate ascent, block coordinate descent, and alternating projections in optimization (Naghsh et al., 2019, Babu et al., 2023, Zhang et al., 2023).
6. Surrogate Design: Practical Guidelines and Algorithmic Variants
When designing MM algorithms, several practical considerations arise (Nguyen, 2016, Wu et al., 2011, Zhang et al., 2023):
- Surrogate Tightness vs. Tractability: Tighter surrogates yield faster local convergence but may require harder maximizations. Simpler surrogates deliver closed-form or fast numerical solutions at the expense of more iterations.
- Block or Coordinate Updates: Separability in the surrogate can be exploited for parallelization and distributed computation (e.g., MapReduce for heteroscedastic regression (Nguyen et al., 2016)).
- Stochastic and Online MM: Large-scale or streaming problems can employ mini-batch minorization, ensuring expected monotonic improvement.
- Constraint Handling: MM accommodates various constraints as long as the surrogate optimization respects the feasible set; standard projection or Lagrangian methods are directly applicable (Zhang et al., 2023, Babu et al., 2023).
- Initialization and Stopping: MM is a local algorithm; initialization influences the stationary point reached, motivating heuristic or problem-informed starting points (e.g., spectral or random for mixture models).
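The initialization caveat can be demonstrated on a toy nonconvex objective (the function and curvature bound below are illustrative assumptions, not from the cited works): each run ascends monotonically and stops on a relative-change criterion, but different starts reach different stationary points:

```python
def f(theta):
    # Nonconvex objective with two global maxima, at theta = -1 and theta = +1.
    return -(theta**2 - 1.0) ** 2

def mm_ascent(theta, L=44.0, rtol=1e-10, max_iter=100_000):
    """MM via the quadratic minorizer
    f(t) >= f(t_k) + f'(t_k)(t - t_k) - (L/2)(t - t_k)^2,
    valid on [-2, 2] since f''(t) = -12 t**2 + 4 >= -44 there."""
    val = f(theta)
    for _ in range(max_iter):
        grad = -4.0 * theta * (theta**2 - 1.0)  # f'(theta)
        theta += grad / L                       # closed-form surrogate maximizer
        new_val = f(theta)
        assert new_val >= val - 1e-12           # monotone-ascent safeguard
        if abs(new_val - val) <= rtol * (abs(val) + 1.0):  # relative-change stop
            break
        val = new_val
    return theta

# Different initializations reach different stationary points.
print(round(mm_ascent(0.5), 4), round(mm_ascent(-0.5), 4))
```

The monotonicity assertion doubles as a cheap correctness check on the surrogate: if the curvature bound were invalid, ascent would fail and the assertion would fire.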
Table 2: Variants and Extensions of MM
| Variant | Description / Application | Reference |
|---|---|---|
| Block-coordinate MM | Sequential update of parameter blocks | (Nguyen et al., 2016, Zhou et al., 2015) |
| Accelerated MM (e.g. aMM) | Quasi-Newton and step-doubling overlays | (Zhou et al., 2015, Nguyen, 2016) |
| Stochastic/online MM | Data sampled in mini-batches | (Nguyen, 2016) |
| Penalized/regularized MM | Surrogates for $\ell_1$, nuclear-norm, entropy penalties | (Babu et al., 2023, Zhou et al., 2015) |
7. Impact, Limitations, and Contemporary Research Directions
Minorization-maximization offers a meta-algorithmic structure for deriving robust, monotonic, and scalable optimization schemes in a vast range of statistical inference and signal processing contexts. Its unifying perspective clarifies the geometric and analytic underpinnings of incumbent methods (EM, IRLS, WMMSE, fractional programming) (Zhang et al., 2023, Shen et al., 2018, Kim et al., 2016). The framework is also foundational for modern fair, robust, and high-dimensional inference methods (Babu et al., 2023, Cheng et al., 2024, Levine, 22 May 2025).
Persistent limitations include potentially slow convergence under loose surrogates, local optima in nonconvex landscapes, and the need for bespoke surrogate construction in complex models. Current research explores acceleration (Anderson, quasi-Newton), surrogate adaptivity, streaming MM, and theoretical properties in non-Euclidean geometries and manifold optimization.
In sum, MM is a cornerstone of modern large-scale statistical optimization, with deep links to convex analysis, algorithmic monotonicity, and the broad theory of surrogate-based iterative methods (Wu et al., 2011, Nguyen, 2016, Sahu et al., 2020, Zhang et al., 2023, Babu et al., 2023).