Adaptive FISTA: Advanced Optimization Variants

Updated 5 June 2026

Adaptive FISTA is a framework of optimization algorithms that dynamically adjusts factors like step-size, momentum, and restart schedules based on local curvature and statistical properties.
It employs adaptive momentum, parameter-free backtracking, and restart mechanisms to address oscillations, ill-conditioning, and nonconvex challenges in various applications.
These methods improve practical convergence and efficiency in large-scale, non-Euclidean, and data-driven problems while closely approximating FISTA’s optimal worst‐case rates.

Adaptive FISTA encompasses a family of Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) variants in which one or more algorithmic parameters—step-size, momentum, regularization, prox metric, or restart schedule—are dynamically adjusted based on local information or statistical properties, rather than fixed a priori. These adaptations aim to improve empirical convergence, robustness to ill-conditioning, applicability in non-convex or non-Euclidean settings, and sample efficiency, while often retaining (or closely approximating) FISTA’s optimal worst-case theoretical rate.

1. Foundations: FISTA and the Need for Adaptivity

FISTA solves composite minimization problems of the form

$\min_{x \in \mathbb{R}^n} \quad f(x) := \Psi(x) + h(x)$

where $\Psi$ is convex (possibly nonsmooth) and $h$ is convex and differentiable with an $L$-Lipschitz gradient $\nabla h$. The classic form uses fixed parameters: step-size $1/L$, Nesterov-style momentum with the canonical $t$-recursion, and a prescribed number of iterations or tolerance.
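As a point of reference for the adaptive variants, the classic scheme can be sketched as follows. This is a minimal illustration, assuming a LASSO instance with $h(x) = \tfrac12\|Ax-b\|^2$ and $\Psi(x) = \lambda\|x\|_1$; the data `A`, `b`, `lam` and the helper names `fista`, `soft_threshold`, and `objective` are hypothetical and chosen only for this example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(grad_h, prox_psi, L, x0, n_iter=200):
    """Classic FISTA: fixed step 1/L and the canonical t-recursion."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        # Forward-backward (proximal gradient) step at the extrapolated point y
        x = prox_psi(y - grad_h(y) / L, 1.0 / L)
        # Canonical Nesterov recursion for t
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum extrapolation
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

# Hypothetical LASSO data, for illustration only
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of grad h (squared spectral norm)

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x_hat = fista(lambda x: A.T @ (A @ x - b),
              lambda v, step: soft_threshold(v, lam * step),
              L, np.zeros(100))
```

Every quantity that the adaptive variants tune appears here as a constant: the step `1.0 / L`, the recursion for `t`, and the iteration budget.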

Despite its $O(1/k^2)$ optimal convergence for convex problems, classic FISTA exhibits several practical limitations:

Sensitivity to local curvature and the global Lipschitz constant $L$
Oscillatory trajectories and potential lack of sequence convergence
Inability to exploit strong convexity or adapt to varying local smoothness
Lack of mechanisms for efficient handling of nonconvexity, adaptive discretizations, or data-driven structure

This motivates adaptive FISTA schemes that dynamically tune parameters using observed progress, local surrogate models, or problem-specific side information (Liang et al., 2018, Alamo et al., 2019, Ochs et al., 2017).

2. Adaptive Acceleration and Restart Mechanisms

Adaptive FISTA variants deploy several strategies for parameter and schedule adaptation:

Adaptive Momentum and “Lazy Start”:

FISTA-Mod introduces $(p,q,r)$ as free parameters into the Nesterov $t$-update. Smaller values of $p$ and $q$ slow the approach of the momentum coefficient $a_k$ to 1, suppressing oscillations and improving convergence in practice. This lazy-start approach can accelerate convergence by an order of magnitude over classical settings for “pathological” problems (Liang et al., 2018).
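The effect of the free parameters on the momentum can be checked numerically. The update itself is not spelled out in the text, so the sketch below assumes the form $t_k = \big(p + \sqrt{q + r\,t_{k-1}^2}\big)/2$ with momentum coefficient $a_k = (t_{k-1}-1)/t_k$, which reduces to the canonical recursion at $(p,q,r) = (1,1,4)$. The function name `momentum_sequence` and the "lazy" values $p = 1/20$, $q = 1/2$ are illustrative assumptions.

```python
import math

def momentum_sequence(p, q, r, n_iter, t0=1.0):
    """Momentum coefficients a_k = (t_{k-1} - 1) / t_k under the assumed
    generalized update t_k = (p + sqrt(q + r * t_{k-1}^2)) / 2."""
    t_prev = t0
    coeffs = []
    for _ in range(n_iter):
        t = (p + math.sqrt(q + r * t_prev ** 2)) / 2.0
        coeffs.append((t_prev - 1.0) / t)
        t_prev = t
    return coeffs

# (p, q, r) = (1, 1, 4) recovers the canonical FISTA recursion
classic = momentum_sequence(1.0, 1.0, 4.0, 100)
# Illustrative "lazy" choice: small p and q, r kept at 4
lazy = momentum_sequence(1.0 / 20.0, 1.0 / 2.0, 4.0, 100)

# Compare how quickly each momentum coefficient climbs toward 1
for k in (1, 9, 99):
    print(f"k={k + 1:3d}  classic a_k={classic[k]:.3f}  lazy a_k={lazy[k]:.3f}")
```

Under these assumptions the lazy sequence stays below the classical one after the first iteration, so extrapolation is applied less aggressively in the early phase.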

Adaptive Restart via Function/Momentum Criteria:

Restarts can suppress detrimental oscillations. The LCR-FISTA (“Linearly Convergent Restart FISTA”) variant introduces a globally linearly convergent restart rule for composite convex problems with quadratic functional growth (QFG): FISTA is run in repeated inner loops, each terminated when a local functional decrease criterion is met, with the loop length adaptively doubled if geometric decrease is not observed. No knowledge of the optimal value $f^*$ or the QFG parameter $\mu$ is required; the resulting convergence is linear in the outer-loop count (Alamo et al., 2019).
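To make the idea of restarting concrete, the sketch below uses a simpler, generic function-value restart heuristic: momentum is discarded whenever the objective increases. It is swapped in for illustration and is not the LCR-FISTA rule, whose termination criterion and loop-doubling logic are not reproduced here. The LASSO data and all names (`fista_function_restart`, `f`, `x_rs`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_function_restart(f, grad_h, prox_psi, L, x0, n_iter=500):
    """FISTA with a generic function-value restart (not the LCR-FISTA rule):
    if a candidate iterate increases the objective, it is rejected and the
    momentum is reset before continuing from the last accepted point."""
    x_prev, y, t = x0.copy(), x0.copy(), 1.0
    f_prev, restarts = f(x_prev), 0
    for _ in range(n_iter):
        x = prox_psi(y - grad_h(y) / L, 1.0 / L)
        f_x = f(x)
        if f_x > f_prev:
            # Restart: drop the momentum and retry from the last accepted point
            y, t = x_prev.copy(), 1.0
            restarts += 1
            continue
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, f_prev, t = x, f_x, t_next
    return x_prev, restarts

# Hypothetical overdetermined LASSO instance (smooth part is strongly convex)
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 40))
b = rng.standard_normal(100)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2

def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x_rs, restarts = fista_function_restart(
    f, lambda x: A.T @ (A @ x - b),
    lambda v, step: soft_threshold(v, lam * step), L, np.zeros(40))
```

Because rejected candidates are never accepted, the objective along the accepted iterates is non-increasing by construction, at the cost of one extra objective evaluation per iteration.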

Parameter-Free Backtracking and Online Conditioning Estimation:

Free-FISTA couples adaptive backtracking for the Lipschitz constant $L$, non-monotone step-size increases and decreases, and a restart schedule that computes online estimates of the growth parameter $\mu$ from functional decreases. When QFG holds, the method achieves an accelerated linear rate of the form $O(e^{-K\sqrt{\mu/L}\,k})$ in function value, for some constant $K > 0$, without a priori knowledge of any problem constant (Aujol et al., 2023).

Gradient and Subspace Adaptivity:

FISTA variants—such as those with spatially adaptive discretizations for Banach-space LASSO or wavelet/tomographic recovery—adaptively refine the computational basis in which the proximal mapping (or the whole iterate) is computed, ensuring that the number of degrees of freedom increases only as necessary to achieve a prescribed accuracy (Chambolle et al., 2021).

Adaptive Feature	Example Method	Key Reference
Adaptive momentum	FISTA-Mod/Lazy-Start	(Liang et al., 2018)
Adaptive restart	LCR-FISTA, Free-FISTA	(Alamo et al., 2019, Aujol et al., 2023)
Adaptive step-size	Backtracking FISTA	(Aujol et al., 2023)
Average curvature	AC-FISTA	(Liang et al., 2021)
Adaptive discretization	Banach-space FISTA	(Chambolle et al., 2021)
Regional/learned adaptivity	RDFNet (“regional FISTA”)	(Zhou et al., 2023)

3. Step-Size and Curvature Adaptation

Adaptive FISTA methods often avoid fixed global step-sizes:

Backtracking: Step sizes are shrunk or enlarged at each iteration according to how well the local quadratic model fits the smooth term. Non-monotone backtracking FISTA increases the step when the local quadratic model is accurate and decreases it otherwise (Aujol et al., 2023, Nguyen et al., 2024, Rebegoldi et al., 2021, Calatroni et al., 2021).
Average Curvature Tracking: AC-FISTA dispenses with explicit line search, maintaining a moving average of “model” local upper curvatures (computed from observed nonlinearity), which is used for both step-size and model-building (Liang et al., 2021). This yields step-size adaptation without backtracking overhead and rates commensurate with the best known for smooth composite acceleration.
Strong Convexity Adaptation: When strong convexity (or QFG) is detected or assumed, the momentum parameter is adjusted adaptively; LCR-FISTA and Free-FISTA both adapt the inner loop or momentum by measuring realized functional contractions, without explicit knowledge of the growth parameter $\mu$ (Alamo et al., 2019, Aujol et al., 2023).
Nonconvex Generalization: VAR-FISTA and aFISTA adaptively regularize subproblems based on local negative curvature, switching between (accelerated) FISTA and more robust nonconvex variants on-the-fly, with rates that interpolate between the accelerated $O(1/k^2)$ rate (convex) and slower sublinear rates (general nonconvex) (Sim, 2020, Ochs et al., 2017).
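The quadratic-model test behind backtracking can be sketched as follows. For simplicity this uses the classical monotone rule, in which the step only shrinks and the canonical $t$-recursion is kept; it is not one of the non-monotone schemes cited above, which additionally attempt step increases and adjust the recursion accordingly. The function name `fista_backtracking`, the defaults `step0` and `shrink`, and the LASSO data are assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_backtracking(h, grad_h, prox_psi, x0, step0=1.0, shrink=0.5, n_iter=300):
    """FISTA with classical monotone backtracking: no Lipschitz constant is
    supplied; the step is shrunk until the quadratic upper model holds."""
    x_prev, y, t, step = x0.copy(), x0.copy(), 1.0, step0
    for _ in range(n_iter):
        h_y, g = h(y), grad_h(y)
        while True:
            x = prox_psi(y - step * g, step)
            d = x - y
            # Accept the step if h(x) lies below the quadratic model built at y
            # (small absolute slack guards against floating-point noise)
            if h(x) <= h_y + g @ d + (d @ d) / (2.0 * step) + 1e-12:
                break
            step *= shrink
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev, step

# Hypothetical LASSO instance
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.1

def h(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

def objective(x):
    return h(x) + lam * np.sum(np.abs(x))

x_bt, final_step = fista_backtracking(
    h, lambda x: A.T @ (A @ x - b),
    lambda v, step: soft_threshold(v, lam * step), np.zeros(100))
```

The accepted step reflects curvature along the directions actually visited, so it can end up larger than the worst-case value $1/L$.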

4. Adaptivity Beyond the Algorithmic Core: Metrics, Discretization, and Learning

Variable Metric and Inexact Proximal Mapping: SAGE-FISTA and its convex variant S-FISTA generalize FISTA to variable-metric proximal steps (using pre-conditioning or split-gradient adaptation), which is significant for imaging problems and ill-conditioning (Rebegoldi et al., 2021, Calatroni et al., 2021). The metric is chosen dynamically, and both the forward and proximal steps may be computed inexactly, with controlled tolerance decay, while adaptive backtracking governs the step-size.
Adaptive Subspace Selection: For problems where minimizers lie not in the native Hilbert space but in a Banach space (e.g., a space of sparse measures), FISTA can be applied on a sequence of adaptively refined subspaces. Under verifiable energy approximation rates, a sublinear convergence rate is achieved whose order depends on a parameter quantifying solution structure (Chambolle et al., 2021).
Data-Driven and Regionwise Adaptivity: Unfolded deep networks with FISTA-style blocks (FISTA-Net, RDFNet) implement adaptivity via learnable, phase-dependent step-sizes, thresholds, momentum, and even region-dependent transformation domains. For example, RDFNet partitions the feature maps by region and applies learnable regional transforms together with pixelwise adaptive soft-thresholding; it is reported to outperform fixed global FISTA and FISTA-Net on spectral snapshot compressive imaging (Zhou et al., 2023).

5. Theory: Guarantees and Complexity of Adaptive FISTA

Convergence Rates (Convex/Strongly Convex/QFG):
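- Classic FISTA: $O(1/k^2)$ decay of the function-value gap.
- Adaptive FISTA with QFG: linear convergence in the number of outer restarts for LCR-FISTA (Alamo et al., 2019); an accelerated linear rate $O(e^{-K\sqrt{\mu/L}\,k})$, where $\mu$ is the growth parameter and $K > 0$ a constant, for Free-FISTA under quadratic functional growth (Aujol et al., 2023)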