
Stopping-Time Reward Analysis

Updated 13 December 2025
  • Stopping-Time Reward is a fundamental concept in optimal stopping theory, defining the payoff from choosing a random stopping time in a stochastic process.
  • It employs methods like Dynkin’s characterization and threshold strategies to optimize expected rewards over various stochastic models.
  • The framework is applied in finance, sequential testing, and dynamic programming to solve complex decision-making problems.

A stopping-time reward is a fundamental concept in the theory of optimal stopping, representing the payoff accrued by implementing a random decision time (stopping time) in a stochastic process. The reward is typically defined as a function of the process evaluated at the chosen stopping time, possibly including both terminal and accumulated (running) components. The central problem is to determine the stopping time that maximizes (or minimizes) the expected value of this reward over all admissible stopping strategies, i.e., stopping times adapted to the filtration generated by the process.

1. Classical Formulation and Reward Structures

Let $X = \{X_t\}_{t \ge 0}$ be a strong Markov process with state space $E$, defined on a filtered probability space $(\Omega, \mathcal F, \mathbb F, \mathbb P)$. Given a Borel-measurable reward function $g: E \to \mathbb{R}$ and a discount rate $r \ge 0$, the canonical stopping-time reward functional is

$$V(x) = \sup_{\tau} \mathbb{E}_x \left[ e^{-r\tau} g(X_\tau) \right],$$

where the supremum is taken over all $\mathbb F$-stopping times $\tau$.

The reward can include additional features:

  • A running reward:

$$J(x, \tau) = \mathbb{E}_x \left[ \int_0^{\tau} e^{-r s} f(X_s)\, ds + e^{-r\tau} g(X_\tau) \right],$$

for a running profit rate $f$.

  • Nonlinear or risk-sensitive reward transformations, e.g., maximizing the certainty equivalent under a utility function $U$:

$$\sup_{\tau} U^{-1}\left( \mathbb{E}_x[U(g(X_\tau))] \right).$$

This formalism encompasses discrete-time random walks, continuous-time diffusions, Lévy processes, Markov chains, and Markov decision processes, as well as partially observed and nonlinear expectation frameworks (Crocce, 2014, Kobylanski et al., 2010, Bäuerle et al., 2017).
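
As a concrete illustration of the functional $J(x, \tau)$ above, the following sketch estimates the discounted running-plus-terminal reward of a fixed threshold rule $\tau_u = \inf\{t : X_t \ge u\}$ for a drifted Brownian motion by Monte Carlo. The dynamics, payoff, running profit rate, and numerical parameters are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def mc_threshold_reward(x0, u, r=0.05, mu=0.1, sigma=0.3,
                        f=lambda x: 0.01 * x,            # illustrative running profit rate
                        g=lambda x: max(x - 1.0, 0.0),   # illustrative terminal payoff
                        dt=1e-2, horizon=20.0, n_paths=2000, seed=0):
    """Monte Carlo estimate of J(x0, tau_u) for tau_u = inf{t : X_t >= u},
    where X is a Brownian motion with drift mu and volatility sigma.
    Paths that never reach the threshold are truncated at the horizon."""
    rng = np.random.default_rng(seed)
    n_steps = int(horizon / dt)
    total = 0.0
    for _ in range(n_paths):
        x, t, running = x0, 0.0, 0.0
        for _ in range(n_steps):
            if x >= u:                                   # stop and collect the terminal reward
                break
            running += np.exp(-r * t) * f(x) * dt        # accumulate the discounted running reward
            x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            t += dt
        total += running + np.exp(-r * t) * g(x)
    return total / n_paths

print(mc_threshold_reward(x0=0.8, u=1.2))
```

Sweeping the threshold $u$ and keeping the largest estimate gives a crude approximation of the restricted value $\sup_u J(x, \tau_u)$ over threshold rules.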

2. Dynkin's Characterization and Riesz Representation

Dynkin's theorem gives a variational description: the value function $V$ is the minimal $r$-excessive majorant of the reward $g$. A function $u$ is $r$-excessive if

$$e^{-rt}\, \mathbb{E}_x[u(X_t)] \le u(x), \quad \forall\, t \ge 0,$$

and $\lim_{t \downarrow 0} \mathbb{E}_x[e^{-rt} u(X_t)] = u(x)$.

If the process admits a Green kernel $G_r(x, dy)$, any $r$-excessive function has a unique Riesz representation

$$u(x) = \int_E G_r(x, dy)\,\sigma(dy) + h(x),$$

where $\sigma$ is a nonnegative measure supported on the stopping region and $h$ is $r$-harmonic. In many cases of optimal stopping, $h \equiv 0$, and the representing measure $\sigma$ encodes the distribution on the stopping set (Crocce, 2014).
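
For standard Brownian motion on $\mathbb{R}$, for instance, the resolvent (Green) kernel is explicit,

$$G_r(x, dy) = \frac{1}{\sqrt{2r}}\, e^{-\sqrt{2r}\,|x - y|}\, dy,$$

so the Riesz representation reduces to a convolution of this exponential kernel against the representing measure $\sigma$.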

In one-dimensional diffusions, the explicit kernel choices and scale/speed measure structure allow analytic or algorithmic construction of $V$ and the stopping region (Crocce et al., 2019).
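
In discrete time, the minimal excessive majorant can be computed directly by iterating $V \mapsto \max(g, e^{-r} P V)$ to its fixed point; the stopping region is then $\{V = g\}$. The sketch below does this for an illustrative finite-state chain and payoff (both assumptions made for the example, not taken from the cited papers).

```python
import numpy as np

def minimal_excessive_majorant(P, g, r=0.05, tol=1e-10, max_iter=100_000):
    """Fixed-point iteration V <- max(g, e^{-r} P V) for a finite-state Markov
    chain with transition matrix P and reward vector g. For r > 0 the map is a
    contraction, and the limit is the smallest discounted-excessive majorant of g."""
    V = g.astype(float).copy()
    disc = np.exp(-r)
    for _ in range(max_iter):
        V_new = np.maximum(g, disc * P @ V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Illustrative example: symmetric random walk on {0, ..., 20}, reflected at the
# endpoints, with a hump-shaped payoff g.
n = 21
P = np.zeros((n, n))
for i in range(n):
    P[i, max(i - 1, 0)] += 0.5
    P[i, min(i + 1, n - 1)] += 0.5
states = np.arange(n)
g = np.maximum(10.0 - np.abs(states - 12), 0.0)

V = minimal_excessive_majorant(P, g)
print("stopping region:", states[np.isclose(V, g)])
```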

3. Principle of Threshold Strategies and One-sidedness

For a wide class of reward functions, especially those that are continuous, nondecreasing, log-concave, and right-continuous, the optimal stopping region is a threshold set ("one-sided solution") (Lin et al., 2017). The value function admits the explicit form

$$V(x) = \begin{cases} \mathbb{E}_x\!\left[e^{-q \tau_{u}} g(X_{\tau_{u}})\right], & x < u, \\ g(x), & x \ge u, \end{cases}$$

where $\tau_u = \inf\{ t \ge 0 : X_t > u \}$ and the optimal threshold $u$ is characterized as the solution of $R(u) := \mathbb{E}\!\left[e^{-q \tau_{u}} g(X_{\tau_{u}})\right] / g(u) = 1$.

This reduction to threshold rules extends to broad families of Lévy processes and random walks, provided mild integrability and regularity conditions (Lin et al., 2017). For complex or multi-modal payoffs, the continuation region can fragment into unions of intervals, each characterized via flux and balance conditions associated with the Green kernel (Crocce, 2014).
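
For a self-contained one-sided example (not drawn from the cited papers; all parameter values are arbitrary), take $X$ a Brownian motion with drift $\mu$ and volatility $\sigma$ and the call-type payoff $g(x) = (x - K)^+$. Since Brownian motion has continuous paths, $X_{\tau_u} = u$ and $\mathbb{E}_x[e^{-q\tau_u}] = e^{-(u - x)\Phi(q)}$ with $\Phi(q) = (\sqrt{\mu^2 + 2q\sigma^2} - \mu)/\sigma^2$, so every threshold rule has a closed-form value and the optimal threshold follows from a one-dimensional search, which here agrees with the analytic answer $u^* = K + 1/\Phi(q)$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def threshold_value(u, x, K, q=0.05, mu=0.02, sigma=0.25):
    """Value of the rule tau_u = inf{t : X_t >= u} started at x < u, for Brownian
    motion with drift mu and volatility sigma and payoff (x - K)^+.
    Uses E_x[exp(-q tau_u)] = exp(-(u - x) * Phi(q)); paths are continuous, so X_{tau_u} = u."""
    phi = (np.sqrt(mu**2 + 2.0 * q * sigma**2) - mu) / sigma**2
    return max(u - K, 0.0) * np.exp(-(u - x) * phi)

x, K = 1.0, 1.0
res = minimize_scalar(lambda u: -threshold_value(u, x, K), bounds=(x, 10.0), method="bounded")
phi = (np.sqrt(0.02**2 + 2 * 0.05 * 0.25**2) - 0.02) / 0.25**2
print("numerical threshold:        ", res.x)
print("closed form K + 1/Phi(q):   ", K + 1.0 / phi)
```

Note that the maximizer does not depend on the starting point $x$, consistent with the threshold (one-sided) structure of the stopping region.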

4. Smooth Fit Principle and Regularity Issues

The principle of smooth fit states that, under sufficient regularity, the value function $V$ matches the reward $g$ not only in value but also in first derivative at the optimal boundary. Explicitly, for sufficiently regular diffusions and reward functions,

$$V(u^-) = g(u), \quad V'(u^-) = g'(u).$$

This identity may fail if the reward lacks differentiability at the boundary or if the process cannot creep through it. For Lévy processes, the validity of smooth fit is tied to the regularity of the boundary point and the nature of the jumps; when the process cannot creep across the boundary, the fit is generally lost (Lin et al., 2017).

In cases with non-smooth payoffs, or for Markov processes with atoms or other discrete features in the speed measure, verification theorems and extended inversion formulas are used to construct or justify $V$ even when classical smooth fit fails (Crocce, 2014, Crocce et al., 2019).
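
As a standard textbook illustration of smooth fit (not an example from the cited papers), consider the perpetual American put on a risk-neutral geometric Brownian motion $dX_t = r X_t\,dt + \sigma X_t\,dW_t$, discounted at the same rate $r$, with payoff $g(x) = (K - x)^+$. The candidate value $V(x) = (K - b)(x/b)^{-\gamma}$ with $\gamma = 2r/\sigma^2$ satisfies value matching at any boundary $b$; imposing the smooth-fit condition $V'(b) = g'(b) = -1$ pins down $b^* = \gamma K/(1 + \gamma)$. The sketch below checks the identity numerically.

```python
import numpy as np

K, r, sigma = 1.0, 0.05, 0.3
gamma = 2.0 * r / sigma**2              # exponent of the decreasing solution x^{-gamma}
b_star = gamma * K / (1.0 + gamma)      # exercise boundary obtained from smooth fit

def V(x):
    """Candidate value of the perpetual American put: exercise at or below b_star."""
    return np.where(x <= b_star, K - x, (K - b_star) * (x / b_star) ** (-gamma))

# Check the smooth-fit identity V'(b*+) = g'(b*) = -1 with a one-sided finite difference.
h = 1e-6
right_derivative = (V(b_star + h) - V(b_star)) / h
print("b* =", b_star, "  V'(b*+) ≈", right_derivative)   # ≈ -1
```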

5. Algorithmic Characterization and Multi-Interval Structures

For one-dimensional diffusions, explicit algorithms identify stopping regions as unions of intervals via iterative enlargement and merging procedures. Sets on which the generator applied to the payoff is negative serve as initial supports, which are then expanded until integrals involving the fundamental solutions vanish or prescribed boundary conditions are satisfied. This approach systematically delivers the continuation and stopping regions, even in the presence of complex reward functions such as piecewise polynomials or processes with atoms in their speed measure (Crocce et al., 2019, Crocce, 2014).
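
The first step of such a procedure, locating where the generator acts negatively on the payoff, is easy to illustrate numerically. The sketch below evaluates $(\mathcal{L} - r)g$ on a grid by finite differences for an illustrative one-dimensional diffusion and two-hump payoff (assumptions made for the example, not the algorithm of the cited papers) and groups the resulting points into candidate intervals.

```python
import numpy as np

def negative_generator_set(b, sigma2, g, r, grid):
    """Evaluate (L - r)g = b g' + (1/2) sigma^2 g'' - r g on a grid via central
    finite differences and return the grid points where it is strictly negative."""
    h = grid[1] - grid[0]
    gv = g(grid)
    g1 = np.gradient(gv, h)     # g'
    g2 = np.gradient(g1, h)     # g''
    Lg_minus_rg = b(grid) * g1 + 0.5 * sigma2(grid) * g2 - r * gv
    return grid[Lg_minus_rg < 0]

# Illustrative diffusion and payoff: mean-reverting drift, constant volatility, two humps.
grid = np.linspace(-3.0, 3.0, 1201)
pts = negative_generator_set(
    b=lambda x: -0.5 * x,
    sigma2=lambda x: np.full_like(x, 0.4),
    g=lambda x: np.maximum(1 - (x - 1.5) ** 2, 0) + 0.5 * np.maximum(1 - (x + 1.5) ** 2, 0),
    r=0.05,
    grid=grid,
)

# Group contiguous points into intervals: these are the seeds the enlargement and
# merging procedure described above would subsequently grow.
h = grid[1] - grid[0]
breaks = np.where(np.diff(pts) > 1.5 * h)[0]
print([(seg.min(), seg.max()) for seg in np.split(pts, breaks + 1)])
```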

For Markov processes in higher dimensions or with jumps, analogous verification methods rely on constructing candidate excessive functions and checking required inequalities and boundary conditions.

6. Extensions: Nonlinear, Multiple, and Time-Inconsistent Rewards

  • Nonlinear and risk-sensitive rewards: With utility-based or certainty-equivalent transformations (e.g., maximizing $U^{-1}\!\left(\mathbb{E}[U(R)]\right)$ for a reward $R$), dynamic programming recursions generalize to Bellman equations with nonlinear value updates, and optimality may necessitate stopping at times that are not jump epochs (Bäuerle et al., 2017, Bäuerle et al., 2016); see the sketch after this list.
  • Multiple stopping: This generalizes the reward to a function of several stopping times, and reduction to a single stopping problem is achieved via construction of an aggregated or "new" reward function, whose Snell envelope governs optimality (Kobylanski et al., 2009, Li, 2019).
  • Time-inconsistent reward structures: When the stopping reward depends on the initial state or includes mean-field/state-averaged functionals, equilibrium—rather than optimal—strategies must be invoked. Iterative methods and dynamic programming equations for the auxiliary value functions characterize the corresponding equilibrium stopping times (Christensen et al., 2017, Djehiche et al., 2022).
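
A minimal discrete-time sketch of the risk-sensitive recursion mentioned in the first bullet: with exponential utility $U(y) = -e^{-ay}$ and a $\pm 1$ random walk (both illustrative assumptions, not the models of the cited papers), the continuation value at each stage is the certainty equivalent $U^{-1}(\mathbb{E}[U(V_{n+1})])$ rather than a plain conditional expectation.

```python
import numpy as np

def risk_sensitive_stopping(g, n_steps=50, p=0.5, risk_aversion=1.0):
    """Backward recursion V_n(x) = max( g(x), U^{-1}( p U(V_{n+1}(x+1)) + (1-p) U(V_{n+1}(x-1)) ) )
    for a +/-1 random walk on the integers, with exponential utility U(y) = -exp(-a*y).
    Returns the states and the time-0 value function."""
    a = risk_aversion
    U = lambda y: -np.exp(-a * y)
    U_inv = lambda z: -np.log(-z) / a
    states = np.arange(-(n_steps + 1), n_steps + 2)   # reachable states plus one layer of padding
    V = g(states).astype(float)                       # at the horizon one must stop
    for _ in range(n_steps):
        cont = V.copy()
        # Certainty-equivalent continuation value on interior states; the crude treatment of the
        # two edge states never influences the value at the origin within the horizon.
        cont[1:-1] = U_inv(p * U(V[2:]) + (1 - p) * U(V[:-2]))
        V = np.maximum(g(states), cont)
    return states, V

states, V = risk_sensitive_stopping(g=lambda x: np.maximum(x, 0.0))
print("risk-sensitive value at the origin:", V[states == 0][0])
```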

7. Examples and Applications

Illustrative cases include:

  • Skew and sticky Brownian motion, demonstrating points of failure or modification of classical smooth fit.
  • Power and call-type payoffs in Lévy and diffusion models, where explicit closed-form solutions for the value and threshold can be derived (Lin et al., 2017).
  • Lévy-driven Ornstein-Uhlenbeck processes, offering the first analytic solutions to optimal stopping for non-Lévy jump models by explicit inversion with the Green function (Crocce, 2014).

These methodologies underpin a range of applied problems, from American and exotic option pricing in incomplete or Lévy markets to sequential hypothesis testing and dynamic order selection in adaptive search problems (Crocce, 2014, Lin et al., 2017, Agrawal et al., 2019).


For in-depth results, explicit algorithms, and rigorous verification theorems, see (Crocce, 2014, Crocce et al., 2019, Lin et al., 2017).
