BSUM: Block Successive Upper Bound Minimization

Updated 21 March 2026

BSUM is a framework for structured nonconvex and nonsmooth optimization that uses surrogate functions to majorize the objective function at each iteration.
It combines block decomposition with tailored surrogates, which yields convergence guarantees and scalable performance, with rates up to $O(1/k^2)$ in specialized cases.
BSUM is applied across domains such as signal processing, wireless communications, tensor completion, and machine learning, offering robust, efficient solutions for large-scale problems.

Block Successive Upper Bound Minimization (BSUM) is a powerful framework for solving structured nonconvex and nonsmooth optimization problems via block decomposition and sequential surrogate minimization. Over the past decade, BSUM has been foundational in convex and nonconvex optimization for large-scale, block-structured problems, especially in signal processing, control, wireless communications, tensor completion, and machine learning. The distinctive feature of BSUM is that at each step, only one (or a subset) of variable blocks is updated by minimizing a carefully constructed upper-bound (surrogate) of the original objective, which is tight at the current iterate. This section details the theoretical principles, algorithmic construction, convergence properties, notable special cases, and representative applications of the BSUM methodology.

1. Fundamental BSUM Principle and Surrogate Construction

Let $f : X_1 \times \cdots \times X_n \to \mathbb{R}$ be a possibly nonconvex, nonsmooth objective with block-partitioned variable $x = (x_1, \ldots, x_n) \in X_1 \times \cdots \times X_n$, where each $X_i$ is closed and convex. At iteration $k$, the BSUM principle requires an upper-bound surrogate $u_i(x_i; x^k)$ for each block $i$ with the following properties (Hong et al., 2015, Razaviyayn et al., 2012):

Exactness at base point: $u_i(x_i^k; x^k) = f(x^k)$.
Majorization: $u_i(x_i; x^k) \geq f(x_1^k, \ldots, x_{i-1}^k, x_i, x_{i+1}^k, \ldots, x_n^k)$ for all $x_i \in X_i$.
First-order agreement: The directional derivatives of $u_i(\cdot; x^k)$ and of $f$ with respect to block $i$ coincide at $x_i = x_i^k$ for every feasible direction; for differentiable functions, this means the partial gradients match.
Continuity: $u_i(x_i; x^k)$ is continuous in $(x_i, x^k)$.

A commonly used surrogate is the proximal quadratic majorizer:

$u_i(x_i; x^k) = f(x_1^k, \ldots, x_{i-1}^k, x_i, x_{i+1}^k, \ldots, x_n^k) + \tfrac{\gamma}{2} \|x_i - x_i^k\|^2,$

where $\gamma > 0$ makes the surrogate strongly convex in $x_i$ whenever $f$ is convex in that block. For nonconvex or nonsmooth $f$, first-order (tangent-plane) linear or quadratic surrogates are also used, chosen so that the majorization property is preserved (Hong et al., 2015, Wu et al., 2024, Hong et al., 2013).
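As a rough illustration of a first-order quadratic surrogate, the sketch below builds one for a single block and checks exactness and majorization numerically. The test function $\log(1 + x^2)$, the curvature bound `L = 2`, and all function names are assumptions chosen for this example and do not come from the cited works.

```python
import math

def f(x):
    """Smooth nonconvex test function; |f''(x)| <= 2 everywhere."""
    return math.log(1.0 + x * x)

def grad_f(x):
    return 2.0 * x / (1.0 + x * x)

L = 2.0  # assumed curvature bound: sup |f''| = 2, attained at x = 0

def surrogate(x, xk):
    """Quadratic upper bound of f built at the base point xk."""
    return f(xk) + grad_f(xk) * (x - xk) + 0.5 * L * (x - xk) ** 2

def surrogate_minimizer(xk):
    """Closed-form argmin of surrogate(., xk): a gradient step of size 1/L."""
    return xk - grad_f(xk) / L

xk = 1.5
grid = [i / 100.0 for i in range(-500, 501)]

# Exactness at the base point and majorization on a grid of test points.
is_tight = abs(surrogate(xk, xk) - f(xk)) < 1e-12
is_upper_bound = all(surrogate(x, xk) >= f(x) - 1e-12 for x in grid)

# Minimizing the surrogate cannot increase the objective.
x_next = surrogate_minimizer(xk)
print(is_tight, is_upper_bound, f(x_next) <= f(xk))
```

The grid check is only a sanity test, not a proof; the bound itself follows from the second derivative of the test function being at most 2 in absolute value.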

2. General Algorithmic Framework

The generic BSUM algorithm operates as follows (Hong et al., 2015, Razaviyayn et al., 2012):

Initialize: Choose $x^0 \in X = X_1 \times \cdots \times X_n$.
Iterative Block Update:
- At iteration $k$, select block $i$ (cyclic, randomized, maximum-improvement, or Gauss–Southwell rules).
- Solve the surrogate subproblem:
$x_i^{k+1} = \arg\min_{x_i \in X_i} u_i(x_i; x^k).$
- Keep $x_j^{k+1} = x_j^k$ for $j \neq i$.

Repeat until convergence.
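The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the two-variable quadratic objective, the proximal weight `GAMMA`, and the helper names `update_block` and `bsum` are all hypothetical, and each scalar variable is treated as one block with the cyclic selection rule.

```python
def f(x):
    """Example objective (an assumption): convex quadratic with minimizer (1, 1)."""
    x1, x2 = x
    return x1 * x1 + x2 * x2 + x1 * x2 - 3.0 * x1 - 3.0 * x2

GAMMA = 1.0  # proximal weight of the surrogate

def update_block(i, x):
    """Closed-form argmin over x_i of f(.., x_i, ..) + GAMMA/2 * (x_i - x_i^k)^2."""
    other = x[1 - i]
    # Stationarity condition: 2*x_i + other - 3 + GAMMA*(x_i - x[i]) = 0
    return (3.0 - other + GAMMA * x[i]) / (2.0 + GAMMA)

def bsum(x0, n_iters=200):
    """Cyclic BSUM: one block per iteration, all other blocks kept fixed."""
    x = list(x0)
    history = [f(x)]
    for k in range(n_iters):
        i = k % len(x)              # cyclic block selection
        x[i] = update_block(i, x)   # minimize the surrogate for block i
        history.append(f(x))
    return x, history

x_final, history = bsum([5.0, -4.0])
monotone = all(b <= a + 1e-12 for a, b in zip(history, history[1:]))
print([round(v, 6) for v in x_final], monotone)
```

Swapping the line that computes `i` for a random draw or a greedy choice gives the other selection rules without touching the rest of the loop.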

If $u_i$ is separable or if parallel hardware is available, multiple blocks can be updated concurrently, with theoretical justification established under mild conditions (Hong et al., 2015). In highly constrained or coupled scenarios, e.g., when handling linear matrix inequalities or augmented Lagrangian penalties, BSUM serves as the inner loop within outer (e.g., ALM or ADMM) methods (Wu et al., 2024, Hong et al., 2014).

3. Convergence Properties and Complexity

Under fairly broad assumptions (continuity of $f$, compact level sets or coercivity of $f$, surrogates satisfying the properties of Section 1, and unique subproblem minimizers for all but at most one block), every limit point of the BSUM sequence $\{x^k\}$ is a coordinatewise minimum of the original problem, and a stationary point when $f$ is regular there (Razaviyayn et al., 2012, Hong et al., 2015). For convex $f$, a global $O(1/k)$ sublinear rate is established, and an accelerated two-block variant attains $O(1/k^2)$ (Hong et al., 2013). When the objective is additionally strongly convex, the rate improves to linear; strong convexity of the surrogates alone does not change the sublinear rate. In block-convex problems without strong convexity, sublinear rates still hold via a Nesterov-type analysis. For nonconvex or nonsmooth $f$, only stationarity (not global optimality) is guaranteed, since a stationary point of a nonconvex function need not be a global minimizer.

Table: Summary of Convergence Results

Setting	Rate / Guarantee	Reference
Strongly convex objective	Linear, $O(\log(1/\varepsilon))$ iterations	(Hong et al., 2013)
Convex objective, not strongly convex	Sublinear, $O(1/k)$	(Hong et al., 2013)
Two-block, accelerated (convex)	$O(1/k^2)$	(Hong et al., 2013)
Nonconvex/nonsmooth	Every limit point is stationary	(Razaviyayn et al., 2012)
Inexact block solves, diminishing error	Asymptotic stationarity	(Wu et al., 2024)
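
The monotone descent behind these guarantees follows from the surrogate properties of Section 1 in one line. If block $i$ is updated at iteration $k$ and the subproblem is solved exactly, then

$f(x^{k+1}) \leq u_i(x_i^{k+1}; x^k) \leq u_i(x_i^k; x^k) = f(x^k),$

where the first inequality is majorization, the second holds because $x_i^{k+1}$ minimizes $u_i(\cdot; x^k)$ over $X_i$, and the equality is exactness at the base point. The objective values are therefore nonincreasing, and they converge whenever $f$ is bounded below. This chain alone does not give a rate; the rates in the table rely on the additional convexity assumptions.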

4. Specializations and Connections to Other Methods

BSUM generalizes and unifies many classical methods:

Block Coordinate Descent (BCD): Exact blockwise minimization, in which the surrogate is the objective itself, $u_i(x_i; x^k) = f(x_1^k, \ldots, x_{i-1}^k, x_i, x_{i+1}^k, \ldots, x_n^k)$ (Hong et al., 2015).
Convex-Concave Procedure (CCCP): For $f(x) = g_1(x) - g_2(x)$ with $g_1$ and $g_2$ convex, the surrogate keeps $g_1$ and replaces the concave part $-g_2$ by its linearization at $x^k$ (Hong et al., 2015).
Expectation Maximization (EM): With $\ell(\theta)$ denoting the log-likelihood, surrogates of $-\ell(\theta)$ are obtained via Jensen's inequality or local majorization (Hong et al., 2015, Razaviyayn et al., 2012).
Proximal/Iterative Reweighted Methods: IRLS-type surrogates fit into the same framework (Hong et al., 2015).
WMMSE in Wireless Networks: Block-wise quadratic surrogates give rise to WMMSE iterations as a special BSUM case (Hong et al., 2015, Li et al., 2014).
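
To make one of these special cases concrete, the following sketch runs the convex-concave procedure on a one-dimensional difference-of-convex function. The objective $x^4 - 2x^2$, the starting point, and the function names are assumptions chosen so that the surrogate has a closed-form minimizer; none of them come from the cited works.

```python
import math

def f(x):
    """DC objective g1(x) - g2(x) with g1(x) = x**4 and g2(x) = 2*x**2 (both convex)."""
    return x ** 4 - 2.0 * x ** 2

def cccp_surrogate(x, xk):
    """Keep g1 and replace g2 by its tangent at xk; upper-bounds f because g2 is convex."""
    return x ** 4 - (2.0 * xk ** 2 + 4.0 * xk * (x - xk))

def cccp_step(xk):
    """Argmin of the surrogate: 4*x**3 = 4*xk, so x is the real cube root of xk."""
    return math.copysign(abs(xk) ** (1.0 / 3.0), xk)

x = 0.5
values = [f(x)]
for _ in range(60):
    x = cccp_step(x)
    values.append(f(x))

monotone = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(round(x, 6), round(values[-1], 6), monotone)
```

Starting from a positive point, the iterates approach the stationary point $x = 1$; starting exactly at $x = 0$, the iteration stays at that stationary point, which illustrates that only stationarity is guaranteed in the nonconvex case.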

Augmented Lagrangian and penalty methods exploit BSUM as an inner loop (cf. "inexact ALM"), especially for nonconvex, block-structured, constrained design problems such as constant-envelope massive MIMO waveform optimization (Wu et al., 2024).

5. BSUM in Structured Optimization: Notable Application Domains

BSUM has become the algorithmic backbone in a variety of high-impact optimization problems:

Dual-functional Radar-Communications: Block-wise upper-bound minimization for constrained, nonconvex waveform design under QCE constraints enables efficient, closed-form updates of each block and leads to ALM-BSUM hybrid methods with strong theoretical guarantees (Wu et al., 2024).
Physical-Layer Security and IRS/Fluid Antenna Beamforming: Complex fractional and manifold-constrained designs are solved by iteratively linearizing/majorizing objective ratios, decomposing variables into blocks (e.g., beamformer, phase shifts), and using blockwise surrogates with analytic updates (Xiong et al., 19 Nov 2025, Li et al., 26 Feb 2026, Mao et al., 2024).
Resource Allocation in Edge Computing and Federated Learning: BSUM enables the decomposition of mixed-integer nonconvex joint designs into subproblems that are (after relaxation, if necessary) each tractable via QP or MIQP solvers, maintaining theoretical convergence and facilitating distributed realization (Ei et al., 2020, Khan et al., 2021).
Tensor and Matrix Factorization: Multimodal core tensor factorization and completion benefit from BSUM by proximal block alternation with singular value thresholding or similar low-rank surrogates, demonstrating convergence even with nonconvex log-penalty relaxations (Zeng, 2020).
Sparse LQG Control: Nonconvex, nonsmooth sparse feedback design is addressed via BSUM by alternately minimizing smooth folded-concave surrogates and solving structured LMI-constrained subproblems for each block (Feng et al., 2024).

6. Algorithmic Instantiations: Surrogates, Block Partitioning, and Implementation

The effectiveness and efficiency of BSUM strongly depend on the block partitioning strategy and the local surrogate selection:

Block Partitioning: Natural separability in variable structure or constraints (e.g., beamformers per user, antenna positions, or power/resource allocation per node) guides the choice of blocks (Hong et al., 2015, Wu et al., 2024, Khan et al., 2021).
Surrogate Design: Quadratic, linear, and proximal upper-bounds are tailored via first/second-order Taylor expansion, Lipschitz constants, and curvature estimates to ensure strong convexity and efficient solvability of block subproblems (Hong et al., 2014, Feng et al., 2024).
Closed-form Updates: Many BSUM applications exploit analytic minimizers of the surrogate subproblems, ranging from Cardano's formula for cubic equations (Wu et al., 2024) to singular value shrinkage (Zeng, 2020) and elementwise projections onto discrete feasible sets (Raei et al., 2021).
Hybrid/Hierarchical Loops: BSUM is frequently embedded in outer penalty or dual-update schemes (e.g., ALM, penalty decomposition), where the inner BSUM loop solves penalized subproblems to $\varepsilon$-approximate stationarity (Wu et al., 2024).
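
As a small illustration of how block partitioning, a curvature-based quadratic surrogate, and a closed-form update fit together, the sketch below applies cyclic BSUM to an $\ell_1$-regularized least-squares problem with one scalar per block. The problem data `A`, `b`, `LAM` and all helper names are assumptions made for this example; soft-thresholding stands in for the more elaborate closed-form updates cited above.

```python
def soft_threshold(v, t):
    """Closed-form minimizer of 0.5*(x - v)**2 + t*abs(x)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

# Small illustrative problem (all data are assumptions):
# minimize 0.5*||A x - b||^2 + LAM*||x||_1
A = [[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]]
b = [2.0, 0.1, 1.0]
LAM = 0.5

def residual(x):
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def objective(x):
    r = residual(x)
    return 0.5 * sum(ri * ri for ri in r) + LAM * sum(abs(xj) for xj in x)

def partial_grad(x, i):
    """Partial derivative of the smooth part with respect to x_i."""
    return sum(row[i] * ri for row, ri in zip(A, residual(x)))

# Per-block curvature L_i = squared norm of column i of A.
L = [sum(row[i] ** 2 for row in A) for i in range(2)]

x = [0.0, 0.0]
values = [objective(x)]
for k in range(100):
    i = k % 2  # cyclic block selection
    g = partial_grad(x, i)
    # Minimize g*(x_i - x_i^k) + L_i/2*(x_i - x_i^k)^2 + LAM*|x_i| in closed form.
    x[i] = soft_threshold(x[i] - g / L[i], LAM / L[i])
    values.append(objective(x))

monotone = all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
print([round(v, 4) for v in x], monotone)
```

Because the smooth part is quadratic, the surrogate with curvature $L_i$ coincides with the objective along block $i$, so this particular instance reduces to exact coordinate descent. A larger $L_i$ would still majorize the objective but would take more conservative steps.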

7. Empirical Performance, Scalability, and Impact

The cited works report the following empirical properties of BSUM in large-scale applications:

Computational Efficiency: In distributed beamforming problems, BSUM-based algorithms (e.g., in coordinated outage-constrained CoBF) achieve several orders of magnitude speedup over classical SCA and polyblock-outer-approximation competitors (Li et al., 2014).
High Scalability: BSUM decomposes high-dimensional, tightly coupled design spaces into blockwise problems whose per-iteration cost scales with local block size, not the global dimension. This is critical in massive MIMO, IRS, and tensor completion (Wu et al., 2024, Xiong et al., 19 Nov 2025, Zeng, 2020).
Convergence Robustness: Monotonic descent, guaranteed stationarity under simple surrogates, and resistance to cycling in nonconvex or mixed-integer contexts are consistently demonstrated (Razaviyayn et al., 2012, Ei et al., 2020, Feng et al., 2024).
Flexibility and Generality: BSUM encompasses and extends a wide class of classical and modern block-coordinate, alternating, and expectation-maximization methods and can interface with penalty-based, manifold, and stochastic machine learning algorithms (Hong et al., 2015, Xiong et al., 19 Nov 2025, Zhang et al., 2 Feb 2026).

In summary, Block Successive Upper Bound Minimization is a versatile and theoretically principled framework for decomposing, majorizing, and efficiently solving large-scale block-structured optimization problems encountered in contemporary signal processing, wireless, control, and data science domains (Hong et al., 2015, Razaviyayn et al., 2012, Wu et al., 2024).