
Martingale-Based Unsmooth Optimization

Updated 27 July 2025
  • The paper introduces a novel martingale-based unsmooth objective framework that integrates non-differentiable regularization with stochastic optimization techniques.
  • It employs advanced smoothing approximations and proximal methods to handle non-smooth terms while guaranteeing convergence under established regularity conditions.
  • The approach demonstrates practical efficiency in diverse applications such as robust regression, econometrics, and image recovery through scalable algorithmic strategies.

A martingale-based unsmooth objective function refers to an optimization objective that leverages martingale structures—typically in the context of stochastic processes or stochastic optimization—where the objective is either inherently nonsmooth (e.g., incorporates non-differentiable or max-type terms) or receives non-differentiable regularization. The combination of martingale concepts with unsmooth objective functions has significant theoretical and algorithmic implications across statistics, machine learning, mathematical finance, harmonic analysis, and stochastic control. This article provides a comprehensive treatment of this emerging theme, detailing the mathematical frameworks, algorithmic methodologies, convergence properties, and canonical applications.

1. Mathematical Formulation of Martingale-Based Unsmooth Objectives

The canonical mathematical framework in this context is composite convex or nonconvex optimization: $\varphi(x) = f(x) + h(x)$, where $f(x)$ is a (possibly expectation-valued) smooth function and $h(x)$ is a (possibly non-differentiable) convex, or even nonconvex, function. When $f(x)$ is represented as an expectation with respect to a random variable (e.g., $f(x) = \mathbb{E}[F(x, \xi)]$), algorithmic approaches naturally give rise to stochastic processes that satisfy the martingale property.

The non-smoothness typically enters through $h(x)$, which may be an $\ell_1$-norm, the indicator function of a constraint set, a structured sparsity regularizer, or, in many probabilistic settings, a negative log-likelihood or a risk measure. When the stochastic gradients are unbiased, the stochastic process defined by the parameter sequence exhibits martingale-like properties, which are explicitly leveraged in modern algorithmic and probabilistic analysis.
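To make the martingale structure concrete, the following minimal sketch (toy data; the names `A`, `b`, `lam` and all constants are illustrative choices, not taken from any cited paper) builds a composite objective with an expectation-valued smooth part and an $\ell_1$ regularizer, then checks that the single-sample gradient errors average to zero, as a martingale difference sequence must:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy composite objective  phi(x) = f(x) + h(x)  with
#   f(x) = E[ 0.5 * (a^T x - b)^2 ]  (smooth, expectation-valued part)
#   h(x) = lam * ||x||_1             (nonsmooth regularizer)
n, d, lam = 2000, 5, 0.1
A = rng.normal(size=(n, d))
x_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
b = A @ x_true + 0.1 * rng.normal(size=n)

def full_grad_f(x):
    # Gradient of the empirical expectation f(x) = (1/n) sum_i F(x, xi_i)
    return A.T @ (A @ x - b) / n

def stoch_grad_f(x, i):
    # Unbiased single-sample gradient estimate
    a = A[i]
    return a * (a @ x - b[i])

# The gradient-noise sequence e_i = stoch_grad - full_grad satisfies
# E[e_i | past] = 0, i.e., it is a martingale difference sequence.
x = np.zeros(d)
noise = np.array([stoch_grad_f(x, i) - full_grad_f(x) for i in range(n)])
print(np.linalg.norm(noise.mean(axis=0)))  # numerically zero
```

The averaged-out noise is exactly what lets martingale concentration tools control the trajectory of stochastic first-order methods on such objectives.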

An alternative formulation, prevalent in time series modeling and hypothesis testing, uses the martingale difference divergence (MDD) to define objectives with non-smooth features (e.g., based on $L^2$ norms of Fourier coefficients or on distance covariance); see (Rolla, 2023; Song et al., 17 Apr 2024).

2. Algorithmic Strategies: Smoothing and Proximal Methods

A principal challenge posed by martingale-based unsmooth objectives is the lack of differentiability of $h(x)$. Several algorithmic approaches address this:

  • Smoothing Approximations: When $h(x)$ has a structure expressible as $h(x) = \max_{v \in Q} v^{T}Ax$ for a compact set $Q$, a Nesterov smoothing technique may be used, replacing $h(x)$ with the differentiable approximation $h_{\mu}(x) = \max_{v \in Q} \{ v^T A x - \mu d(v) \}$, where $d(v)$ is strongly convex. This renders the overall objective smooth, with the gradient and its Lipschitz constant explicitly characterized (1008.5204).
  • Proximal and Forward–Backward Schemes: For $f(x) + h(x)$, forward–backward splitting or inertial algorithms (including memory terms) accommodate nonconvex and nonsmooth cases. Each iteration combines an explicit gradient step on the smooth part with an implicit, proximal step on $h(x)$. Notably, dynamical-system formulations and variable-metric/generalized proximal maps extend these approaches to broader classes of nonconvex objectives (Bot et al., 2014; Bot et al., 2015; Repetti et al., 2019).
  • Gradient Sampling for Subsmooth Functions: For pointwise maxima of smooth families, gradient sampling approximates the Clarke generalized gradient at a (potentially non-differentiable) point by randomly sampling gradients at nearby points (Boskos et al., 20 Mar 2025). This method is constructive, requiring only almost-everywhere differentiability, and does not rely on continuous differentiability on a full-measure open set.
  • Majorize–Minimize (MM) and Quadratic Surrogates: In streaming and online inference (e.g., censored quantile regression), the historically accumulated, non-differentiable loss is replaced by a quadratic surrogate derived from a local second-order expansion, yielding a closed-form, majorized update (Deng et al., 21 Jul 2025). This controls the computational burden while preserving statistical efficiency.
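The first two strategies above can be illustrated on a toy lasso problem (a minimal sketch: the data and the constants `lam` and `mu` are arbitrary assumptions, not from the cited works). For $h(x) = \lVert x \rVert_1 = \max_{\lVert v \rVert_\infty \le 1} v^T x$ with $d(v) = \lVert v \rVert^2/2$, Nesterov smoothing yields the Huber function; alternatively, forward–backward splitting (ISTA) keeps $h$ nonsmooth and handles it with its proximal map, soft-thresholding:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, mu = 200, 10, 0.05, 1e-2
A = rng.normal(size=(n, d)) / np.sqrt(n)
b = rng.normal(size=n)

def h_smooth(x):
    # Nesterov smoothing of ||x||_1 with d(v) = ||v||^2/2: the Huber function.
    return np.sum(np.where(np.abs(x) <= mu, x**2 / (2 * mu), np.abs(x) - mu / 2))

def soft_threshold(z, t):
    # Proximal map of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f, f = 0.5||Ax-b||^2
step = 1.0 / L
x = np.zeros(d)
for _ in range(500):
    # Forward step on the smooth part, backward (proximal) step on h
    x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.sum(np.abs(x))
```

The smoothed term satisfies $\lVert x \rVert_1 - d\mu/2 \le h_\mu(x) \le \lVert x \rVert_1$, so the approximation error is controlled by $\mu$, while the ISTA iterates converge to a point satisfying the fixed-point (stationarity) condition of the proximal map.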

3. Convergence Analysis and Regularity Conditions

The convergence theory for algorithms addressing martingale-based unsmooth objectives rests on carefully constructed regularity assumptions:

  • Kurdyka–Łojasiewicz Inequality: Many analyses rely on the KL property of the objective or a regularized surrogate. This property quantifies the geometry near critical points and provides convergence guarantees even in nonconvex, nonsmooth settings (Bot et al., 2014, Bot et al., 2015, Repetti et al., 2019).
  • Composite Structure and Lipschitz Property: For algorithms based on smoothing, the composite structure allows quantification of gradient Lipschitz constants. For instance, for $h_{\mu}(x)$ constructed as above, the overall gradient’s Lipschitz constant is $L_{\mu} = L + \|A\|^2/(c\mu)$, where $c$ is the strong-convexity parameter of $d(v)$ (1008.5204).
  • Finite Length Property: A recurring technical tool is showing that the total length $\sum_k \|x^{k+1} - x^k\|$ is finite, implying convergence to stationary points (Bot et al., 2014).
  • Stochastic Process Techniques: When stochastic gradients are used, martingale difference sequence properties underpin high-probability convergence and permit the use of concentration inequalities (e.g., the Azuma–Hoeffding inequality in exchangeability testing, (Dai et al., 2018)).
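The concentration argument in the last bullet can be checked on a toy simulation (uniform increments and a fixed seed, an illustrative assumption rather than the construction of the cited papers): a martingale with increments bounded by $c$ satisfies the Azuma–Hoeffding deviation bound at time $N$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, c, delta = 10_000, 1.0, 1e-3

# Bounded martingale difference sequence: xi_k in [-c, c], E[xi_k | past] = 0
xi = rng.uniform(-c, c, size=N)
S = np.cumsum(xi)                      # martingale S_k = sum_{i<=k} xi_i

# Azuma-Hoeffding: P(|S_N| >= t) <= 2 exp(-t^2 / (2 N c^2)), so with
# probability at least 1 - delta,  |S_N| <= sqrt(2 N c^2 log(2/delta)).
bound = np.sqrt(2 * N * c**2 * np.log(2 / delta))
print(abs(S[-1]), bound)
```

The same inequality, applied to the gradient-noise martingale of a stochastic algorithm, is what converts unbiasedness into high-probability convergence statements.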

4. Canonical Applications

Martingale-based unsmooth objectives are central to several applied domains:

| Application Area | Typical Objective Example | Relevant References |
| --- | --- | --- |
| Regularized regression (lasso, group) | $\mathbb{E}[L(y, Xx)] + \lambda \lVert x \rVert_1$ or structured norms | (1008.5204; Repetti et al., 2019) |
| Policy evaluation in RL | Minimize martingale loss (quadratic error on observed values) | (Jia et al., 2021) |
| Testing / econometric moment estimation | Integrals involving MDD ($L^2$ norms over complex exponentials) | (Song et al., 17 Apr 2024; Rolla, 2023) |
| Change-point detection / exchangeability | Additive martingale construction with bounded increments | (Dai et al., 2018) |
| Online survival and quantile regression | Martingale-based estimating equations majorized by local quadratics | (Deng et al., 21 Jul 2025) |
| Image recovery / inverse problems | Nonconvex composite objectives with wavelet/$\ell_0$ penalties | (Bot et al., 2014; Repetti et al., 2019) |

Such objectives routinely arise wherever composite or regularized learning is performed with large, high-dimensional, or streaming datasets, and wherever it is essential to design algorithms resilient to the lack of smoothness in either the penalty or the data fit terms.

5. Numerical Performance and Scalability

Empirical evidence across the literature demonstrates that correctly constructed martingale-based unsmooth optimization algorithms achieve expected theoretical rates under standard stochastic and convexity assumptions:

  • For convex stochastic problems, smoothing-based SGD achieves $O(1/\sqrt{N})$ convergence rates with appropriate parameter tuning (1008.5204).
  • In proximal and forward–backward settings, practical performance in large-scale image deblurring and regression with complex regularizers shows significant improvement—both in iteration numbers and final loss values—when the nonsmooth component is handled via smoothing or majorization (Repetti et al., 2019, Bot et al., 2014).
  • In online and streaming settings, quadratic surrogates for martingale-based estimating equations enable dramatically reduced storage and computational time while maintaining the statistical efficiency of oracle estimators (Deng et al., 21 Jul 2025).
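The $O(1/\sqrt{N})$ regime can be probed on a one-dimensional toy problem (an assumption-laden sketch: the objective, step sizes, and averaging scheme below are generic textbook choices, not the algorithms of the cited papers). Here stochastic subgradient descent with Polyak–Ruppert averaging is run on the nonsmooth risk $f(x) = \mathbb{E}|x - \xi|$, whose minimizer is the median of $\xi$:

```python
import numpy as np

rng = np.random.default_rng(3)

# f(x) = E|x - xi| with xi ~ Uniform(-1, 1); the minimizer is the median x* = 0.
def run(N, x0=5.0):
    x, avg = x0, 0.0
    for k in range(1, N + 1):
        xi = rng.uniform(-1.0, 1.0)
        x -= (1.0 / np.sqrt(k)) * np.sign(x - xi)  # unbiased subgradient step
        avg += (x - avg) / k                       # Polyak-Ruppert running average
    return avg

estimate = run(10_000)
print(abs(estimate))  # averaged iterate approaches the minimizer
```

With the $1/\sqrt{k}$ step schedule, the bounded subgradient noise forms a martingale difference sequence, and the averaged iterate's error contracts at the rate the theory predicts for convex nonsmooth problems.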

6. Comparison with Alternative and Classical Approaches

Martingale-based handling of unsmooth objective functions is distinct from classic deterministic or smooth optimization in several key ways:

  • Regularization via Martingale Structure: Martingales are used not only to control stochastic fluctuations but also to create “benchmarked” or orthogonalized objectives (e.g., in dynamic portfolio optimization, (1209.4449)).
  • Probabilistic and Function Space Techniques: Techniques such as Bellman function construction, martingale orthogonality, and additive martingale methods provide non-variational, function-space-based approaches that are well adapted to the endpoint and fractal features of the objectives (Reznikov et al., 2013; Dai et al., 2018).
  • Gradient Sampling vs. Martingale Stochastic Approximation: Both leverage randomness, but gradient sampling focuses on approximating the Clarke generalized gradient by deterministic means, while martingale methods rely on constructing unbiased error sequences and using probabilistic convergence (see (Boskos et al., 20 Mar 2025) for a discussion and comparison).

7. Implications, Generalizations, and Open Directions

The use of martingale-based unsmooth objective functions in large-scale stochastic optimization, machine learning, and mathematical finance yields both algorithmic flexibility and robust convergence guarantees. The unification of smoothing, proximal, majorize–minimize, and gradient sampling approaches under the martingale probabilistic framework allows the simultaneous management of model uncertainty, non-differentiability, and data-driven stochasticity.

Applications in robust pricing and hedging, adaptive hypothesis testing, streaming regression, and high-dimensional reinforcement learning illustrate the practical importance of these methodologies. Further work continues in extending convergence guarantees under weaker conditions, developing more sophisticated stochastic process techniques for online or nonstationary data, and explicitly characterizing the tradeoffs between computational complexity and statistical efficiency in the context of non-smooth martingale-based optimization.