Bi-Level Minimization Approach
- The bi-level minimization approach is a hierarchical method in which the upper-level problem is constrained by the optimal solutions of the lower-level problem.
- It leverages specialized algorithms such as dynamic proximal gradient, dual-based bisection, and projection-free methods for robust convergence.
- The approach employs dynamic regularization and surrogate relaxations to handle non-unique lower-level solutions and improve computational scalability.
A bi-level minimization approach addresses hierarchical optimization problems involving two nested levels of optimization: an upper (outer) level and a lower (inner) level. The solution to the upper-level (leader) problem is constrained by the optimal solution set of the lower-level (follower) problem. Bi-level programs capture essential structures in machine learning (e.g., hyperparameter optimization, meta-learning), operations research, and engineering design, but present substantial algorithmic and theoretical challenges due to the implicit and often non-unique dependence of the upper-level’s feasible set on lower-level optima. Recent research has developed specialized algorithms, complexity analyses, and reformulations for broad classes of bi-level minimization, particularly focusing on convex and composite settings, non-differentiability, and computational scalability.
1. Foundational Bi-Level Optimization Problem Classes
Bi-level minimization generally takes the form

$$\min_{x} \ \varphi(x) \quad \text{s.t.} \quad x \in X^* := \operatorname*{argmin}_{y} \ \psi(y),$$

where $\varphi$ denotes the upper-level objective and $\psi$ denotes the lower-level objective. Standard classes include:
- Simple Bi-Level Problem: $\min_x \varphi(x)$ s.t. $x \in \operatorname*{argmin}_y \psi(y)$. Here, both levels are programs in the same single variable block, frequently convex and composite (Jiang et al., 13 Sep 2024, Sabach et al., 2017).
- Composite Convex Bi-Level Optimization: Both $\varphi$ and $\psi$ decompose into smooth (gradient-Lipschitz) and potentially nonsmooth (proper convex) components, with efficient proximal mappings available (Merchav et al., 30 Jul 2024, Sabach et al., 2017, Giang-Tran et al., 2023).
- General Bi-Level with Constraints: The lower-level is a parametric convex optimization with additional constraints and possibly nonsmooth regularizers, often requiring complex dual and penalty-based reformulations (Lu et al., 10 Nov 2025, Sow et al., 2022).
Problem data can result in solution sets that are non-singleton, leading to challenges in “minima selection” for pessimistic/robust variants (Masiha et al., 9 May 2025).
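For concreteness, a minimal toy instance of the simple composite class (a sketch with hypothetical data; the specific objectives are illustrative and not taken from any cited work) pairs a rank-deficient least-squares lower level, whose minimizer set is an affine subspace, with a squared-norm upper level that selects the minimum-norm minimizer:

```python
import numpy as np

# Lower level: psi(x) = 0.5 * ||A x - b||^2 (wide A => non-unique minimizers).
# Upper level: phi(x) = 0.5 * ||x||^2 selects the minimum-norm lower-level solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))   # rank-deficient: lower-level argmin is an affine set
b = rng.standard_normal(5)

def psi(x):                        # lower-level (inner) objective
    r = A @ x - b
    return 0.5 * r @ r

def grad_psi(x):
    return A.T @ (A @ x - b)

def phi(x):                        # upper-level (outer) objective
    return 0.5 * x @ x

def grad_phi(x):
    return x

# The bi-level solution of this toy instance is the minimum-norm least-squares
# solution, available in closed form via the pseudoinverse for reference.
x_star = np.linalg.pinv(A) @ b
print(phi(x_star), psi(x_star))
```

Because the lower-level minimizer set is non-unique here, this is exactly the situation that motivates the minima-selection and regularization-based schemes discussed below.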
2. Algorithmic Foundations and Modern Schemes
Recent advances have produced a wide range of first-order and projection-free algorithms for bi-level minimization, addressing key computational bottlenecks.
Dynamic Proximal Gradient and FISTA-Type Methods
FBi-PG (Merchav et al., 30 Jul 2024) applies accelerated proximal gradient (FISTA) to a sequence of Tikhonov-regularized objectives

$$F_k(x) = \psi(x) + \lambda_k\, \varphi(x), \qquad \lambda_k \downarrow 0,$$

where $\psi$ (inner) and $\varphi$ (outer) are composite convex. The method achieves accelerated (FISTA-type) convergence for the inner level (provided the regularization parameters $\lambda_k$ decay at an appropriate rate) and sublinear rates for the outer level, under minimal assumptions.
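A minimal sketch of this dynamic-regularization idea on the toy instance above (this is not the exact FBi-PG scheme of Merchav et al.; the $\lambda_k = 1/k$ schedule, the step-size rule, and the absence of nonsmooth terms are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
b = rng.standard_normal(5)
L_psi = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad psi

def grad_psi(x): return A.T @ (A @ x - b)      # lower-level gradient
def grad_phi(x): return x                      # upper-level gradient of 0.5*||x||^2

x = np.zeros(10); y = x.copy(); t = 1.0
for k in range(1, 5001):
    lam = 1.0 / k                              # dynamically decaying Tikhonov weight
    step = 1.0 / (L_psi + lam)                 # 1/L_k for the regularized objective
    # (Proximal) gradient step; the prox is the identity since the nonsmooth
    # parts of both levels are zero in this toy instance.
    x_new = y - step * (grad_psi(y) + lam * grad_phi(y))
    t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))   # FISTA momentum parameter
    y = x_new + ((t - 1.0) / t_new) * (x_new - x)
    x, t = x_new, t_new

# Distance to the bi-level (minimum-norm least-squares) solution.
print(np.linalg.norm(x - np.linalg.pinv(A) @ b))
```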
Root-Finding and Dual-Based Bisection
For convex “simple” settings, root-finding reformulations reduce the bi-level program to searching for the left-most root of a value function using bisection. Each level-set constrained subproblem is then solved via dual accelerated proximal gradient, yielding a near-optimal complexity of order $\tilde{O}(1/\sqrt{\epsilon})$ up to logarithmic factors from the bisection (Jiang et al., 13 Sep 2024).
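Schematically (a simplified sketch, not the cited algorithm itself: the optimal lower-level value $\psi^*$ is assumed known here, and a plain projected-gradient subsolver stands in for the dual accelerated proximal gradient), the bisection searches for the smallest level $c$ at which the level-set-constrained lower-level value drops to $\psi^*$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
b = rng.standard_normal(5)
psi = lambda x: 0.5 * np.sum((A @ x - b) ** 2)     # lower-level objective
grad_psi = lambda x: A.T @ (A @ x - b)
x_mn = np.linalg.pinv(A) @ b                       # min-norm lower-level solution
psi_star = psi(x_mn)                               # optimal lower-level value (assumed known)
step = 1.0 / np.linalg.norm(A, 2) ** 2

def value_gap(c, iters=3000):
    """g(c) = min{psi(x) : phi(x) = 0.5*||x||^2 <= c} - psi*, via projected gradient."""
    x, r = np.zeros(10), np.sqrt(2.0 * c)
    for _ in range(iters):
        x = x - step * grad_psi(x)
        n = np.linalg.norm(x)
        if n > r:
            x *= r / n                             # projection onto the ball {phi <= c}
    return psi(x) - psi_star

lo, hi = 0.0, 0.5 * np.linalg.norm(x_mn) ** 2 + 1.0   # bracket for the left-most root
for _ in range(40):                                # bisection on the value function
    mid = 0.5 * (lo + hi)
    if value_gap(mid) > 1e-6:                      # level set still excludes all lower-level minimizers
        lo = mid
    else:
        hi = mid
print("estimated optimal upper-level value:", hi)  # approaches 0.5 * ||x_mn||^2
```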
Proximal Alternating Minimization for Nonconvex/Composite
In feature selection and clustering, bi-level problems are solved via Proximal Alternating Minimization (PAM), iterating closed-form or efficiently solvable updates for projections, graph assignment, and projection matrix optimization, and converging to critical points under standard Kurdyka–Łojasiewicz (KL) assumptions (Liu et al., 26 May 2025).
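The PAM template itself is generic; a compact sketch on a toy nonconvex problem (rank-one matrix factorization, a stand-in for the BLUFS block structure, which the cited paper instead applies to projection, graph, and cluster variables) alternates exact prox-regularized block minimizations:

```python
import numpy as np

# Toy nonconvex objective H(u, v) = 0.5 * ||M - u v^T||_F^2, minimized by PAM:
# each block update is the exact minimizer of H plus a proximal term, which is
# available in closed form here. (Illustrative only, not the BLUFS updates.)
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(20), rng.standard_normal(15)) \
    + 0.01 * rng.standard_normal((20, 15))

u = rng.standard_normal(20)
v = rng.standard_normal(15)
c = 1.0                                   # proximal weight
for _ in range(200):
    u = (M @ v + c * u) / (v @ v + c)     # prox-regularized exact minimization in u
    v = (M.T @ u + c * v) / (u @ u + c)   # prox-regularized exact minimization in v

print("relative residual:", np.linalg.norm(M - np.outer(u, v)) / np.linalg.norm(M))
```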
Projection-Free Frank–Wolfe Variants
Projection-free approaches update with only a linear minimization oracle over the base feasible set, replacing projections with Frank–Wolfe steps on a dynamically regularized objective (e.g., $\psi(x) + \sigma_k\,\varphi(x)$ with $\sigma_k \to 0$), and achieve sublinear convergence rates that improve under quadratic growth or strong convexity (Giang-Tran et al., 2023).
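A minimal sketch of an iteratively regularized conditional-gradient loop (not the exact IR-CG method of Giang-Tran et al.; the $\ell_1$-ball feasible set, the $\sigma_k = 1/\sqrt{k}$ schedule, and the step-size rule are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
b = rng.standard_normal(5)
tau = 5.0                                       # radius of the l1-ball feasible set
psi = lambda x: 0.5 * np.sum((A @ x - b) ** 2)  # lower-level objective
phi = lambda x: 0.5 * x @ x                     # upper-level objective
grad_psi = lambda x: A.T @ (A @ x - b)
grad_phi = lambda x: x

def lmo_l1(g, tau):
    """Linear minimization oracle over the l1 ball: argmin_{||s||_1 <= tau} <g, s>."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g)
    s[i] = -tau * np.sign(g[i])
    return s

x = np.zeros(10)
for k in range(1, 3001):
    sigma = 1.0 / np.sqrt(k)                    # decaying regularization weight
    g = grad_psi(x) + sigma * grad_phi(x)       # gradient of the regularized objective
    s = lmo_l1(g, tau)                          # Frank-Wolfe direction, no projection needed
    gamma = 2.0 / (k + 2)                       # standard conditional-gradient step size
    x = (1.0 - gamma) * x + gamma * s

print("psi(x), phi(x):", psi(x), phi(x))
```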
Algorithm Comparison
| Algorithm Class | Main Scheme | Complexity (rates) | Key Structural Assumptions |
|---|---|---|---|
| FBi-PG (Merchav et al., 30 Jul 2024) | Accelerated proximal gradient | Accelerated (inner), sublinear (outer); joint rates | Composite convex, Lipschitz gradient, prox-friendly |
| BiVFA (Jiang et al., 13 Sep 2024) | Bisection + dual APG | Near-optimal, $\tilde{O}(1/\sqrt{\epsilon})$ | Composite convex, prox-friendly |
| IR-CG (Giang-Tran et al., 2023) | Linear oracle, Frank–Wolfe | Sublinear; accelerated under quadratic growth | Compact, convex, smooth |
| BLUFS (Liu et al., 26 May 2025) | Proximal alternating min. | Critical-point convergence | Nonconvex, manifold-structured constraints |
| SBP-LFS (Dempe et al., 16 Apr 2025) | Projected gradient + backtracking | Cluster-point convergence | Convex, smooth (possibly non-Lipschitz gradient) |
3. Regularization, Relaxations, and Handling Non-Uniqueness
Dynamic Tikhonov regularization addresses unknown problem-dependent penalization weights by allowing the regularization parameters to decay, thus ensuring convergence to actual solutions of the original bi-level program rather than to those of a fixed penalized surrogate (Merchav et al., 30 Jul 2024).
When the lower-level admits multiple minimizers, as in over-parameterized deep learning or non-strongly convex objectives, bi-level minima-selection must be relaxed. The “Superquantile-Gibbs relaxation” (Masiha et al., 9 May 2025) constructs a smooth surrogate for the non-smooth upper-level (hyper-objective) using a combination of Gibbs sampling and CVaR-type (superquantile) approximation, achieving pointwise $\epsilon$-accuracy with complexity polynomial in the intrinsic dimension of the lower-level manifold of minimizers.
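Schematically, the superquantile (CVaR) ingredient alone is a tail average of sampled upper-level values; the sketch below uses placeholder samples and omits the Gibbs sampling and smoothing that the cited construction couples with it:

```python
import numpy as np

def superquantile(values, alpha):
    """CVaR_alpha: average of the worst (largest) (1 - alpha) fraction of the values."""
    v = np.sort(np.asarray(values))
    k = int(np.ceil(alpha * len(v)))
    return v[k:].mean() if k < len(v) else v[-1]

# Hypothetical samples of upper-level values phi(y_i) at points y_i drawn from a
# Gibbs-type distribution over near-minimizers of the lower level (placeholder data).
rng = np.random.default_rng(0)
phi_samples = rng.normal(loc=1.0, scale=0.3, size=1000)
print(superquantile(phi_samples, alpha=0.9))   # pessimistic surrogate of the hyper-objective
```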
4. Theoretical Guarantees and Complexity
Recent results provide matching lower and upper bounds for the oracle complexity of bi-level minimization in standard regimes:
- In composite convex settings, matching upper and lower bounds of order $\tilde{\Theta}(1/\sqrt{\epsilon})$ are known, where the $1/\sqrt{\epsilon}$ scaling matches first-order optimality for unconstrained smooth convex objectives (Jiang et al., 13 Sep 2024).
- For merely Lipschitz continuous objectives, lower bounds have been established and are matched by functionally constrained, bisection-based methods (Zhang et al., 10 Sep 2024).
- Strong convexity or quadratic growth of the lower level further accelerates convergence, yielding improved rates (Giang-Tran et al., 2023, Sabach et al., 2017).
- With non-unique lower-level minima, complexity scales polynomially in the intrinsic dimension of the lower-level solution manifold (Masiha et al., 9 May 2025).
- Sequential minimax approaches attain improved oracle complexities for convex lower levels, and still better rates when the lower level is strongly convex, improving over previous penalty-based methods (Lu et al., 10 Nov 2025).
These analyses hold under weak regularity assumptions, such as mere convexity and prox-friendliness of both levels, plus Lipschitz gradients of the smooth parts where available. Many bi-level methods are Hessian-free and fully first-order.
5. Application Domains and Empirical Results
Bi-level minimization underpins diverse large-scale applications:
- Unsupervised Feature Selection: BLUFS achieves state-of-the-art clustering and classification via joint spectral clustering and sparsity-constrained projection learning (Liu et al., 26 May 2025).
- LLM Unlearning: BLUR reformulates the unlearning problem as a bi-level hierarchy (forget/retain tasks) and empirically dominates weighted-sum baselines on the forget/retain utility tradeoff (Reisizadeh et al., 9 Jun 2025).
- Energy Networks: Bi-level MILP (via KKT-MPEC linearization) yields globally optimal energy storage sharing among network agents, with numerically verified cost and peak-load reduction (Chen et al., 2017); the complementarity linearization is sketched after this list.
- Meta-Learning, Hyperparameter Optimization, and Robust Learning: Bi-level methods are empirically demonstrated to converge faster and to better solutions than classical methods (Merchav et al., 30 Jul 2024, Sow et al., 2022, Liu et al., 2021).
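For the KKT-MPEC route mentioned above, the key step is linearizing each complementarity condition arising from the lower-level KKT system; a standard big-M (Fortuny-Amat) sketch, assuming a valid upper bound $M$ on the multipliers $\lambda_i$ and slacks $s_i$, is

$$0 \le \lambda_i \perp s_i \ge 0 \;\Longleftrightarrow\; \lambda_i \ge 0,\ \ s_i \ge 0,\ \ \lambda_i \le M z_i,\ \ s_i \le M(1 - z_i),\ \ z_i \in \{0,1\},$$

which turns the nonconvex complementarity constraints into mixed-integer linear ones, at the price of introducing one binary variable per constraint and requiring a valid choice of $M$.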
6. Extensions: Nonconvex, Nonsmooth, and Constraint-Rich Regimes
Works on Bregman-proximal wrappers (Ochs et al., 2016), sequential minimax reformulations (Lu et al., 10 Nov 2025), and value-function-based approximations (Liu et al., 2021) extend bi-level minimization to settings with nonconvex, nonsmooth, or functional constraints, often retaining convergence guarantees. These approaches remove the need for Hessian inverses or full backpropagation through the lower-level solver, increasing computational scalability for high-dimensional problems.
7. Open Challenges and Methodological Innovations
Despite sharp complexity characterizations and diverse practical successes, major methodological challenges persist:
- Construction of bi-level surrogates that capture the true manifold of lower-level minimizers without structural conditions (e.g., global PL, strict complementarity).
- Fast and provably convergent algorithms in the presence of additional upper- or lower-level constraints, with real-world data and noise.
- Robustness to nonconvexity, non-smoothness, and inexact oracles, especially as bi-level formulations proliferate in large-scale multi-agent and unsupervised learning contexts.
Recent advances—including dynamic parameter tuning, projection-free oracles, and superquantile-based minima selection—signal a robust and rapidly evolving literature. These methodological developments have already demonstrated practical impact in machine learning, signal processing, energy systems, and beyond (Merchav et al., 30 Jul 2024, Liu et al., 26 May 2025, Liu et al., 2021, Reisizadeh et al., 9 Jun 2025, Lu et al., 10 Nov 2025, Chen et al., 2017).