Recursive Perturbed Utility (RPU)
- Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that integrates entropy rewards into dynamic portfolio choice to model randomized investment strategies.
- It generalizes the classical Merton problem by deriving optimal Gaussian portfolio policies with explicit mean and variance expressions that capture both myopic and hedging components.
- RPU embeds entropy into the discount rate, dynamically reducing the marginal benefit of randomization to ensure the well-posedness of the optimization and mirror observed noisy investment behavior.
Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that models investor preferences for randomization in dynamic portfolio choice, incorporating entropy-based rewards into a recursive aggregator. Developed to address the ill-posedness of additive perturbed utility models in dynamic, continuous-time settings, RPU offers a mathematically tractable and economically interpretable approach to balancing utility from portfolio randomization with traditional objectives such as bequest. RPU generalizes the classical Merton problem by allowing optimal portfolio policies to be inherently stochastic—Gaussian with explicitly characterized mean and variance—thereby unifying rational dynamic investment with empirically observed randomized behavior (Dai et al., 14 Feb 2026).
1. Market Model and Randomized Controls
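The RPU framework operates in a standard continuous-time Markovian incomplete market comprising a risk-free asset (rate $r$) and a single risky asset. The risky asset price $S_t$ follows a diffusion whose coefficients depend on a Markovian state factor $X_t$, which is itself a diffusion. The two processes are driven by independent Brownian motions $B$ and $\tilde{B}$, with $\rho$ the correlation between shocks to the price and shocks to the state factor.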
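A common specification consistent with this description is sketched below. The coefficient names $\mu$, $\sigma$, $m$, $\nu$ and the Brownian labels $B$, $\tilde{B}$ are assumptions introduced for illustration, not notation confirmed by the source.

```latex
\begin{aligned}
\frac{dS_t}{S_t} &= \mu(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \\
dX_t &= m(t, X_t)\,dt + \nu(t, X_t)\left[\rho\,dB_t + \sqrt{1-\rho^2}\,d\tilde{B}_t\right].
\end{aligned}
```

Under this form, $\rho$ is the instantaneous correlation between returns and factor shocks, and the market is incomplete whenever $|\rho| < 1$ because the risk carried by $\tilde{B}$ cannot be hedged with the single risky asset.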
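At each time $t$, the investor's portfolio allocation is not deterministic but is specified by a probability density $\pi_t$ over $\mathbb{R}$. The mean and variance of this allocation are denoted
$$\mathrm{Mean}(\pi_t) = \int_{\mathbb{R}} a\,\pi_t(a)\,da, \qquad \mathrm{Var}(\pi_t) = \int_{\mathbb{R}} a^2\,\pi_t(a)\,da - \mathrm{Mean}(\pi_t)^2.$$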
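The resulting wealth process $W_t$ under this "exploratory" control depends on the density only through its mean and variance. It is driven by the market noise and by an additional Brownian motion $\bar{B}$, independent of the market noise, that carries the risk generated by randomization.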
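As a sketch of how such a process arises, suppose the fraction of wealth held in the risky asset is drawn from $\pi_t$ independently at each instant, and the asset has drift $\mu$ and volatility $\sigma$ (assumed symbols). Matching the first two moments of the wealth increment then gives the dynamics below, where $\bar{B}$ is an assumed label for the extra Brownian motion.

```latex
\frac{dW_t}{W_t} = \left[r + (\mu - r)\,\mathrm{Mean}(\pi_t)\right]dt
  + \sigma\left[\mathrm{Mean}(\pi_t)\,dB_t + \sqrt{\mathrm{Var}(\pi_t)}\,d\bar{B}_t\right].
```

In this form the instantaneous return variance is $\sigma^2\left[\mathrm{Mean}(\pi_t)^2 + \mathrm{Var}(\pi_t)\right]$, so randomization adds risk without adding expected return.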
2. Recursive Aggregator and Entropy Reward
Crucial to RPU is the entropy functional of the portfolio density,
$$\mathcal{H}(\pi) = -\int_{\mathbb{R}} \pi(a)\,\ln \pi(a)\,da.$$
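For a temperature function $\lambda$, RPU defines the recursive utility process $J_t$ as the solution of a backward stochastic differential equation (BSDE) whose terminal condition is $U(W_T)$, where $U$ is CRRA utility with relative risk aversion $\gamma$.
Because the entropy term enters the aggregator through the discount rate, the recursion can also be expressed explicitly as a conditional expectation of entropy-discounted utility.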
The entropy reward is thereby discounted endogenously: the marginal utility of additional randomization decreases as cumulative past entropy increases (for $\gamma > 1$), preventing runaway entropy.
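To illustrate the discounting mechanism, the sketch below assumes an aggregator that is linear in $J$, namely $\lambda\,[(1-\gamma)J + 1]\,\mathcal{H}(\pi)$ with constant $\lambda$, where $\mathcal{H}$ is the entropy of the density and $\gamma$ the relative risk aversion. This particular form is an assumption chosen to match the description, not an equation confirmed by the source. A linear BSDE of this kind has an explicit solution:

```latex
\begin{aligned}
J_t &= \mathbb{E}_t\!\left[\int_t^T \lambda\,\big[(1-\gamma)J_s + 1\big]\,\mathcal{H}(\pi_s)\,ds + U(W_T)\right], \\
J_t &= \mathbb{E}_t\!\left[ e^{-\int_t^T \delta_u\,du}\,U(W_T)
      + \int_t^T e^{-\int_t^s \delta_u\,du}\,\lambda\,\mathcal{H}(\pi_s)\,ds \right],
\qquad \delta_u := \lambda(\gamma-1)\,\mathcal{H}(\pi_u).
\end{aligned}
```

Under this assumption, when $\gamma > 1$ and the entropy is positive, $\delta_u > 0$ acts as a discount rate, so entropy earned at time $s$ is weighted down by the entropy already accumulated on $[t, s]$.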
3. Dynamic Programming and Characterization via HJB Equation
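Under a feedback policy $\pi$, the value function is the recursive utility $J$ viewed as a function of the current time $t$, wealth $w$, and factor level $x$. The optimal value function $V(t, w, x)$ is its supremum over admissible policies.
The associated dynamic programming principle yields a Hamilton–Jacobi–Bellman (HJB) PDE for $V$ with terminal condition $V(T, w, x) = U(w)$.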
4. Structure of the RPU-Optimal Portfolio Policy
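Maximizing entropy over densities with given mean and variance yields a Gaussian optimizer, so the optimal density at each state is normal.
Under a value function ansatz that separates the dependence on wealth from the dependence on time and the state factor, both moments of the optimal control are available in closed form.
The mean comprises a myopic Merton term and an intertemporal hedging correction. The latter is captured by an auxiliary function of time and the state factor, which solves a nonlinear PDE with a terminal condition at $T$.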
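The Gaussian structure can be illustrated with a minimal sketch for the constant-coefficient special case, where there is no state factor and the hedging correction therefore vanishes. The function names, the parameter values, and in particular the variance formula `lam / (gamma * sigma**2)` are assumptions made for illustration. The formula is consistent with the stated comparative statics but is not quoted from the source.

```python
import math
import random


def gaussian_policy(mu, r, sigma, gamma, lam):
    """Return (mean, variance) of an illustrative Gaussian portfolio policy.

    Assumptions (not taken from the source): constant coefficients, so the
    hedging correction is zero and the mean equals the Merton ratio; the
    variance is taken to be lam / (gamma * sigma**2).
    """
    mean = (mu - r) / (gamma * sigma ** 2)
    variance = lam / (gamma * sigma ** 2)
    return mean, variance


def gaussian_entropy(variance):
    """Differential entropy of a normal density with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def sample_allocations(mean, variance, n, seed=0):
    """Draw n risky-asset fractions from N(mean, variance)."""
    rng = random.Random(seed)
    std = math.sqrt(variance)
    return [rng.gauss(mean, std) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    mean, var = gaussian_policy(mu=0.08, r=0.02, sigma=0.2, gamma=3.0, lam=0.01)
    draws = sample_allocations(mean, var, n=5)
    print(f"mean={mean:.4f} variance={var:.4f} entropy={gaussian_entropy(var):.4f}")
    print(draws)
```

The entropy of a Gaussian depends only on its variance, which is why the mean and the variance of the policy can be chosen separately in this sketch. Setting `lam` to zero is not supported here because the density would degenerate to a point mass.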
5. Asymptotic Expansion and Deviation from Classical Merton Policy
For small entropy weight $\lambda$, the solution admits an asymptotic expansion in powers of $\lambda$ whose leading term solves the classical Merton PDE. The mean policy expands correspondingly around the classical Merton policy.
The forgone expected utility relative to the non-randomized ($\lambda = 0$) policy vanishes as $\lambda \to 0$. The equivalent relative wealth loss
quantifies the small but nonzero financial cost of enjoying randomization.
6. Economic Interpretation and Well-Posedness
Additive perturbed utility formulations, in which the entropy reward is simply added to expected utility, may be ill-posed for some parameter ranges or for bounded-below utilities, allowing entropy to diverge. RPU overcomes this by embedding entropy into the discount rate, so that the entropy reward depreciates endogenously as cumulative randomization accrues. This construction prevents explosion of entropy and ensures the well-posedness of the optimization.
The RPU model parallels Uzawa-style habit formation: the flow "reward" of entropy, tied to local randomization, is dynamically interdependent with the terminal wealth objective. The optimal portfolio’s variance decreases with risk aversion $\gamma$ and stock volatility $\sigma$, while the mean is distorted from the Merton ratio by an order-$\lambda$ entropic hedging adjustment.
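The comparative statics of the variance can be traced in a short worked sketch. It assumes an aggregator of the form $\lambda\,[(1-\gamma)V + 1]\,\mathcal{H}(\pi)$, a Gaussian policy with variance $v$ (entropy $\tfrac{1}{2}\ln(2\pi e\,v)$), and a homothetic ansatz with a positive function $g(t,x)$. All three are assumptions made for illustration rather than equations confirmed by the source. Collecting the terms of the Hamiltonian that depend on $v$ and taking the first-order condition gives:

```latex
\begin{aligned}
&\max_{v > 0}\; \tfrac{1}{2}\sigma^2 w^2 V_{ww}\,v
   + \tfrac{\lambda}{2}\big[(1-\gamma)V + 1\big]\ln(2\pi e\,v)
 \;\Longrightarrow\;
 v^* = -\frac{\lambda\,\big[(1-\gamma)V + 1\big]}{\sigma^2 w^2 V_{ww}}, \\
&(1-\gamma)V + 1 = w^{1-\gamma} g(t,x), \quad
 V_{ww} = -\gamma\,w^{-\gamma-1} g(t,x)
 \;\Longrightarrow\;
 v^* = \frac{\lambda}{\gamma\,\sigma^2}.
\end{aligned}
```

Under these assumptions the optimal variance is independent of wealth and of the state factor, and it falls as either $\gamma$ or $\sigma$ rises.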
A plausible implication is that RPU offers a micro-founded and analytically tractable justification for stochasticity in policy selection, with explicit quantification of the cost of randomization, supporting empirical observations of noisy investment behavior.
7. Connections to Broader Literature and Applications
Recursive Perturbed Utility extends the additive perturbed utility theory of Fudenberg et al. (2015) for static decisions by establishing a well-posed dynamic counterpart. RPU admits explicit portfolio policy characterization in general Markovian incomplete markets with CRRA preferences and allows for closed-form expressions for both mean and variance of optimal policies. It provides a theoretical basis for observed stochastic choice in dynamic portfolio allocation, integrating entropy-regularized decision-making with foundational stochastic control.
Potential applications encompass asset management, behavioral finance, and stochastic control where preference for diversification or exploration is inherent. The RPU formalism is compatible with extensions involving state-dependent randomness preferences and more general stochastic environments (Dai et al., 14 Feb 2026).