
Majorization-Minimization Algorithms

Updated 10 November 2025
  • Majorization-Minimization (MM) algorithms are iterative optimization methods that replace complex objectives with simpler surrogate functions, guaranteeing monotone descent of the objective.
  • They construct surrogates that majorize the original objective, enabling efficient handling of nonconvex, nonsmooth, and large-scale problems.
  • Variants such as higher-order, stochastic, and variance-reduced MM offer convergence rates from sublinear to superlinear, broadening their applicability in modern optimization.

Majorization-Minimization (MM) algorithms are a broad family of iterative optimization schemes that replace a difficult objective function with a succession of easier surrogates—majorizers—that upper-bound the objective everywhere and touch it at the current iterate. By design, each MM step monotonically reduces the objective, and the method is applicable to a wide range of nonconvex, nonsmooth, constrained, or large-scale settings. MM algorithms are foundational to modern optimization for statistics, signal processing, and machine learning, with theoretical guarantees and rich algorithmic variants.

1. Core Principles and Formal Definition

The classical MM framework seeks to minimize a given function $f(x)$ by generating a sequence $\{x_k\}$ such that at each iteration:

  • A majorizing surrogate $U(x; x_k)$ is constructed satisfying:

    1. $U(x; x_k) \ge f(x)$ for all $x$ (global upper bound)
    2. $U(x_k; x_k) = f(x_k)$ (tangency at the current iterate)
  • The next iterate is obtained by:

$$x_{k+1} = \arg\min_x U(x; x_k)$$

  • The descent property is immediate:

$$f(x_{k+1}) \le U(x_{k+1}; x_k) \le U(x_k; x_k) = f(x_k)$$

Such a setup ensures monotonic decrease of $f(x_k)$ over iterations. The classical MM can be interpreted as a meta-algorithm—turning a hard minimization into a series of simpler subproblems, each easier due to the structure of $U(x; x_k)$.
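To make the meta-algorithm concrete, here is a minimal generic MM loop in Python/NumPy. This is an illustrative sketch, not an implementation from any cited paper; `surrogate_argmin` stands for a user-supplied routine that exactly minimizes the majorizer $U(\cdot; x_k)$.

```python
import numpy as np

def mm_minimize(f, surrogate_argmin, x0, max_iter=100, tol=1e-10):
    """Generic MM loop: repeatedly minimize a majorizing surrogate of f.

    surrogate_argmin(x_k) must return argmin_x U(x; x_k), where
    U(x; x_k) >= f(x) for all x and U(x_k; x_k) = f(x_k).
    """
    x = np.asarray(x0, dtype=float)
    values = [f(x)]
    for _ in range(max_iter):
        x = surrogate_argmin(x)      # solve the (easier) surrogate subproblem
        values.append(f(x))          # guaranteed <= previous value by the MM argument
        if values[-2] - values[-1] < tol:
            break
    return x, values
```

With the quadratic majorizer of Section 2.1, `surrogate_argmin` reduces to a gradient step of length $1/L$, and the returned `values` trace a monotonically decreasing objective.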

2. Construction and Classes of Surrogates

2.1 Traditional Surrogate Construction

Surrogates are often created using convexity (Jensen's inequality), Taylor expansions with upper bounding of the remainder, EM-style conditional expectations, or quadratic upper bounds exploiting Lipschitz continuity of the gradient.

Examples include:

  • Jensen-based bound for log-sum-exp in EM
  • Quadratic majorization of smooth terms: $f(x) \le f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{L}{2}\|x - x_k\|^2$ for $L$-smooth $f$ (see the sketch after this list)
  • Linearization of nondifferentiable penalties, custom bounding for DC (difference of convex) problems
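As a hedged illustration of the quadratic bound above: minimizing the surrogate $U(x; x_k) = f(x_k) + \nabla f(x_k)^T(x - x_k) + \frac{L}{2}\|x - x_k\|^2$ in closed form gives $x_{k+1} = x_k - \frac{1}{L}\nabla f(x_k)$, so gradient descent with step size $1/L$ is itself an MM method. The sketch below applies this to least squares; the function name and setup are illustrative only.

```python
import numpy as np

def quadratic_majorization_mm(A, b, n_iter=200):
    """MM with a quadratic upper bound for f(x) = 0.5 * ||A x - b||^2.

    f is L-smooth with L = ||A||_2^2 (largest eigenvalue of A^T A), so
    U(x; x_k) = f(x_k) + grad f(x_k)^T (x - x_k) + (L/2) ||x - x_k||^2
    majorizes f and its minimizer is a gradient step of size 1/L.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = x - grad / L             # argmin of the quadratic surrogate
    return x

# Usage: objective values along the iterates decrease monotonically.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
x_hat = quadratic_majorization_mm(A, b)
```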

2.2 Higher-Order and Adaptive Majorization

Recent advances include higher-order MM, where the surrogate matches the function and its derivatives up to order $p$ at $x_k$, and the error $h(y; x_k) = U(y; x_k) - f(y)$ is $p$-times differentiable with Lipschitz $p$th derivative. Implementation requires constructing $U$ such that $h(y; x_k) \ge 0$ for all $y$, and $h(x_k; x_k) = \nabla h(x_k; x_k) = \dots = \nabla^p h(x_k; x_k) = 0$ (Necoara et al., 2020; Lupu et al., 2021).
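For intuition, here is a one-dimensional $p = 2$ sketch: if $|f'''| \le M$ everywhere, Taylor's theorem implies that the second-order expansion plus the cubic remainder bound $\frac{M}{6}|y - x_k|^3$ majorizes $f$ and matches it, together with its first two derivatives, at $x_k$. The objective, the constant $M$, and the use of a scalar solver are illustrative assumptions, not the algorithms of the cited papers.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative objective: softplus plus a quadratic. Its third derivative is
# sigma(x)(1 - sigma(x))(1 - 2*sigma(x)), bounded in magnitude by about 0.096,
# so M = 0.1 is a valid global bound (the quadratic part contributes nothing).
def f(x):
    return np.logaddexp(0.0, x) + 0.5 * (x - 1.0) ** 2

def f1(x):  # first derivative
    s = 1.0 / (1.0 + np.exp(-x))
    return s + (x - 1.0)

def f2(x):  # second derivative
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s) + 1.0

M = 0.1  # assumed bound on |f'''|

def second_order_mm(x0, n_iter=20):
    x = x0
    for _ in range(n_iter):
        # p = 2 surrogate: Taylor model at x plus cubic remainder bound.
        surrogate = lambda y, x=x: (f(x) + f1(x) * (y - x)
                                    + 0.5 * f2(x) * (y - x) ** 2
                                    + (M / 6.0) * abs(y - x) ** 3)
        x = minimize_scalar(surrogate).x   # exact surrogate minimization
    return x

x_star = second_order_mm(5.0)
```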

Automatic surrogate generation has emerged, e.g., "Universal MM" algorithms leverage Taylor-mode automatic differentiation and interval bounding on Taylor remainders to construct tight surrogate upper bounds programmatically for arbitrary (smooth) $f$ (Streeter, 2023). This allows "black-box" MM for user-supplied objectives and eliminates the need for hand-designed surrogates.

3. Convergence Theory and Rate Results

Assuming the surrogates satisfy the classical MM properties, the domain is closed, and the level sets of $f$ are compact, the MM sequence:

  • Produces non-increasing objective values $f(x_k)$
  • Has limit points that are stationary points of $f$, even in the nonconvex case
  • For convex $f$ with strongly convex surrogates, achieves global linear convergence:

$$f(x_k) - f^* \le \beta^k \bigl(f(x_0) - f^*\bigr) \quad \text{for some } \beta \in (0, 1)$$

  • For $p$th-order MM, global sublinear rates $O(1/k^p)$ in convex settings and local superlinear rates under uniform convexity are established (Necoara et al., 2020; Lupu et al., 2021)
  • For objectives with the Kurdyka–Łojasiewicz property, local rates ranging from sublinear through linear to superlinear can be shown, depending on the exponent in the KL inequality (Necoara et al., 2020; Lupu et al., 2021)

Stochastic variants, including stochastic higher-order MM and stochastic majorization-minimization (SMM), feature sublinear $O(1/\sqrt{n})$ rates for convex problems, $O(1/n)$ rates for strongly convex problems, and almost-sure convergence to stationary points for broad classes of nonconvex problems (Mairal, 2013; Lupu et al., 2021).

Variance-reduced MM algorithms (incorporating SAGA, SVRG, or SARAH estimators) further reduce gradient sample complexity, with optimal rates $\tilde{O}(n^{1/2}/\epsilon^2)$ for nonconvex finite-sum composite problems (Phan et al., 2023).

4. Algorithmic Variants and Extensions

| Variant | Distinguishing Feature | Typical Application/Advantage |
| --- | --- | --- |
| Universal MM (Streeter, 2023) | Automatic, derivative-based bounds | Black-box optimization without tuning |
| Incremental MM (MISO) (Mairal, 2014) | Surrogates updated per sample | Large-scale sum structure; linear rates |
| Stochastic MM (Mairal, 2013) | Surrogate per minibatch or sample | Online/streaming; memory- and compute-tractable |
| Variance-reduced MM (Phan et al., 2023) | SVRG/SAGA in MM subproblems | Best-known gradient complexities for finite sums |
| Higher-order MM (Necoara et al., 2020; Lupu et al., 2021) | $p$th-order Taylor bounding | Superlinear local convergence under regularity |
| Bregman MM (Martin et al., 13 Jan 2025) | Adaptive, potentially non-Euclidean surrogates | Accelerated convergence for composite objectives |
| MM for matrix means (Zhang, 2013) | Manifold optimization; closed-form updates | Riemannian means in SPD geometry |
| Min-max MM (MM4MM) (Saini et al., 12 Nov 2024) | Surrogates for min-max reformulations | Nonconvex constrained signal processing |
| Generalized MM (G-MM) (Parizi et al., 2015) | Relaxes "touching" to "progress" | Robustness to initialization; application-specific bias |

Algorithmic building blocks

  • SafeRate and SafeCombination (Universal MM): hyperparameter-free, use local Taylor majorizers, and adapt step sizes for arbitrary smooth $f$ (Streeter, 2023)
  • MISO: incremental update of per-sample surrogates, with the global (averaged) surrogate minimized at each iteration; achieves linear rates on large finite sums (Mairal, 2014); see the incremental sketch after this list
  • Stochastic Proximal MM: the running surrogate is a weighted average of per-sample surrogates, updated at every stochastic sample; the weight schedule is critical for the convergence rate (Mairal, 2013)
  • Variance reduction: MM-SAGA/SVRG/SARAH—MM subproblems solved with variance-reduced first-order estimators (Phan et al., 2023)
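A minimal incremental-MM sketch in the spirit of MISO follows, assuming quadratic per-sample surrogates for $L$-smooth components $f_i$; the function names and the least-squares example are illustrative and omit the refinements of (Mairal, 2014). Each update refreshes one sample's surrogate and re-minimizes the average of all surrogates, which stays in closed form.

```python
import numpy as np

def miso_style_mm(grad_i, n, dim, L, x0=None, n_epochs=50, seed=0):
    """Incremental MM with quadratic per-sample surrogates (MISO-style sketch).

    grad_i(i, x) returns the gradient of the i-th L-smooth component f_i at x.
    Each component keeps v_i = z_i - grad_i(i, z_i) / L, the minimizer of its
    quadratic surrogate anchored at z_i; the averaged surrogate is minimized
    by the mean of the v_i.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim) if x0 is None else np.asarray(x0, dtype=float)
    v = np.stack([x - grad_i(i, x) / L for i in range(n)])  # surrogates at x0
    x = v.mean(axis=0)
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        v[i] = x - grad_i(i, x) / L   # refresh the surrogate of sample i at x
        x = v.mean(axis=0)            # closed-form minimizer of the averaged surrogate
    return x

# Usage: least squares split into per-row components f_i(x) = 0.5 * (a_i @ x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
L = (np.linalg.norm(A, axis=1) ** 2).max()      # per-component smoothness bound
x_hat = miso_style_mm(lambda i, x: (A[i] @ x - b[i]) * A[i], 100, 5, L)
```

(A practical implementation would maintain the running mean incrementally instead of recomputing it.)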

5. Practical Applications and Empirical Results

MM algorithms underpin a wide range of applications, including, but not limited to:

  • Gaussian Mixture Regression, Multinomial Logistic Regression, and SVM: MM algorithms can build quadratic or EM-style surrogates yielding closed-form or fast iterative updates (Nguyen, 2016; Nguyen et al., 2017)
  • Nonnegative binary matrix factorization: MM with Jensen-type surrogates yields closed-form update rules competitive with logistic-PCA and interpretable factorization models (Magron et al., 2022)
  • Bilevel hyperparameter optimization: MM on duality-reformulated single-level problems enables efficient conic-program subproblem solutions for otherwise intractable bilevel programs (Chen et al., 1 Mar 2024)
  • Signal processing min-max problems: MM4MM leverages dual representations and majorized min-max alternation to enable hyperparameter-free, provably monotonic algorithms in phase retrieval, beamforming, sensor placement, etc. (Saini et al., 12 Nov 2024)
  • High-dimensional regression with nonsmooth/nonconvex penalties: MM with iterated soft-thresholding or semismooth Newton subproblems achieves fast, scalable regression with theoretical guarantees for support recovery and convergence (Schifano et al., 2010, Tang et al., 2019); a soft-thresholding sketch follows this list
  • Deep neural network optimization: Universal MM methods can be applied layerwise to ensure safe, monotonic descent even under extreme overparameterization (Streeter, 2023)
  • Dirichlet maximum-likelihood: Variable Bregman MM accelerates parameter estimation compared to Newton-type and fixed-metric methods (Martin et al., 13 Jan 2025)
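As a concrete instance of the iterated soft-thresholding approach referenced above, the smooth part of the lasso objective $\frac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1$ can be majorized by the quadratic bound of Section 2.1, and the surrogate minimizer is a soft-thresholded gradient step (the classical ISTA iteration). This is a generic sketch, not the specific algorithms of (Schifano et al., 2010) or (Tang et al., 2019).

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_mm(A, b, lam, n_iter=500):
    """MM for 0.5 * ||A x - b||^2 + lam * ||x||_1 via quadratic majorization.

    The smooth term is majorized by its quadratic upper bound with
    L = ||A||_2^2; minimizing the resulting surrogate is a soft-thresholded
    gradient step, so each iteration decreases the lasso objective.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```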

6. Limitations, Challenges, and Current Directions

Despite their generality, MM algorithms face important operational and theoretical considerations:

  • Surrogate construction is nontrivial for nonsmooth, high-dimensional, or non-Euclidean objectives. Recent work on automatic differentiable majorizer construction and variable-metric methods has broadened the range of tractable problems (Streeter, 2023; Martin et al., 13 Jan 2025)
  • The efficiency of MM steps depends critically on the tractability of the surrogate subproblems. For some models, these may require bespoke solvers or reformulations (e.g., conic programming for bilevel optimization (Chen et al., 1 Mar 2024), semismooth Newton for nonconvex regression (Tang et al., 2019)).
  • MM convergence may be slow for ill-conditioned problems or when the surrogate bound is overly conservative; acceleration schemes such as quasi-Newton extrapolation (e.g., SQUAREM), adaptive-metric methods, variance reduction, and higher-order MM address this, but often require problem-specific tuning (arXiv:1001.4776, arXiv:2305.06848); see the SQUAREM-style sketch after this list.
  • Classical MM's requirement that surrogates touch the objective at the current iterate can be unnecessarily restrictive in nonconvex and latent-variable settings; the Generalized MM (G-MM) framework relaxes this via a "progress" requirement, enabling more robust optimization (Parizi et al., 2015).
  • In stochastic and large-scale regimes, memory and communication constraints have led to the development of incremental, online, and variance-reduced MM variants; optimal choices of weights and batch sizes remain an active area of research (arXiv:1306.4650, arXiv:1402.4419, Phan et al., 2023).
  • Extensions to non-Euclidean geometries, non-Lipschitz problems, or those with more complex constraint sets are ongoing areas of method development (Martin et al., 13 Jan 2025; Saini et al., 12 Nov 2024).
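To illustrate the quasi-Newton acceleration mentioned in the list above, the following is a hedged sketch of a SQUAREM-style squared-extrapolation step wrapped around a generic MM fixed-point map `mm_step` (an assumed callable); safeguards such as step-length capping and monotonicity fallbacks used in production implementations are omitted.

```python
import numpy as np

def squarem_accelerate(mm_step, x, n_outer=50):
    """SQUAREM-style acceleration of a fixed-point MM map x -> mm_step(x).

    Each outer iteration takes two MM steps, forms a squared-extrapolation
    update from the first and second differences, then applies one more MM
    step to stabilize the iterate.
    """
    x = np.asarray(x, dtype=float)
    for _ in range(n_outer):
        x1 = mm_step(x)
        x2 = mm_step(x1)
        r = x1 - x                    # first difference
        v = (x2 - x1) - r             # second difference
        if np.linalg.norm(v) == 0.0:
            return x2                 # numerically at a fixed point
        alpha = -np.linalg.norm(r) / np.linalg.norm(v)   # steplength
        x_extrap = x - 2.0 * alpha * r + alpha ** 2 * v  # squared extrapolation
        x = mm_step(x_extrap)         # stabilizing MM step
    return x
```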

7. Summary Table: Key MM Algorithm Variants

| MM Variant / Context | Core Innovation | Notable Features |
| --- | --- | --- |
| Universal MM (Streeter, 2023) | Automatic Taylor-remainder surrogates | Hyperparameter-free, black-box |
| Higher-Order MM (Necoara et al., 2020; Lupu et al., 2021) | $p$th-order surrogates, fast local rates | Superlinear convergence, adaptive |
| Variance-Reduced MM (Phan et al., 2023) | SAGA/SVRG/SARAH in MM subproblems | Optimal sample complexity |
| Bregman MM (Martin et al., 13 Jan 2025) | Variable, adaptive Bregman majorizers | Accelerated convergence |
| MISO/Incremental MM (Mairal, 2014) | Per-sample surrogates, updated incrementally | Linear rates for convex problems |
| Generalized MM (G-MM) (Parizi et al., 2015) | "Progress" in place of the touching constraint | Exploratory, less initialization-sensitive |
| MM for bilevel programs (Chen et al., 1 Mar 2024) | Surrogates for dual-based reformulations | Efficient conic subproblems |
| MM4MM min-max (Saini et al., 12 Nov 2024) | Surrogates on min-max reformulations | Monotonic, hyperparameter-free |
| Nonneg. binary NMF (Magron et al., 2022) | Closed-form Jensen surrogates | Interpretable factors, simple updates |


The MM paradigm provides a versatile and well-founded approach for tackling the full complexity of contemporary machine learning, optimization, and statistical estimation problems—both theoretically and at scale.
