Recursive Perturbed Utility (RPU)

Updated 18 March 2026

Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that integrates entropy rewards into dynamic portfolio choice to model randomized investment strategies.
It generalizes the classical Merton problem by deriving optimal Gaussian portfolio policies with explicit mean and variance expressions that capture both myopic and hedging components.
RPU embeds entropy into the discount rate, dynamically reducing the marginal benefit of randomization to ensure the well-posedness of the optimization and mirror observed noisy investment behavior.

Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that models investor preferences for randomization in dynamic portfolio choice, incorporating entropy-based rewards into a recursive aggregator. Developed to address the ill-posedness of additive perturbed utility models in dynamic, continuous-time settings, RPU offers a mathematically tractable and economically interpretable approach to balancing utility from portfolio randomization with traditional objectives such as bequest. RPU generalizes the classical Merton problem by allowing optimal portfolio policies to be inherently stochastic—Gaussian with explicitly characterized mean and variance—thereby unifying rational dynamic investment with empirically observed randomized behavior (Dai et al., 14 Feb 2026).

1. Market Model and Randomized Controls

The RPU framework operates in a standard continuous-time Markovian incomplete market comprising a risk-free asset (rate $r$) and a single risky asset. The risky asset price $S_t$ evolves as

$\frac{dS_t}{S_t} = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dB_t$

where $X_t$ is a Markovian state factor following

$dX_t = m(t,X_t)\,dt + \nu(t,X_t)\bigl[\rho\,dB_t + \sqrt{1-\rho^2}\,d\widetilde{B}_t\bigr]$

with independent Brownian motions $B_t$ and $\widetilde{B}_t$, so that $\rho$ is the instantaneous correlation between the shocks to the state factor and to the risky asset.

At each time $t$, the investor's portfolio allocation is not deterministic but is specified by a probability density $\pi_t(\cdot)$ over $\mathbb{R}$. The mean and variance of this allocation are denoted

$\bar a_t=\mathbb{E}_\pi[a],\quad v_t=\operatorname{Var}_\pi[a]$

The resulting wealth process under this "exploratory" control evolves as

$\frac{dW_t}{W_t} = \left[ r + (\mu(t,X_t)-r)\,\bar a_t \right]\,dt + \sigma(t,X_t) \left[ \bar a_t\, dB_t + \sqrt{v_t}\, d\bar{B}_t \right]$

where $\bar B_t$ is a Brownian motion independent of $(B_t,\widetilde{B}_t)$.
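As an illustration of the exploratory wealth dynamics, the following is a minimal simulation sketch. It assumes constant coefficients $\mu$, $\sigma$, $r$ and a constant policy $(\bar a, v)$, which is a simplification of the general state-dependent model; the function name and all parameter values are hypothetical. Under these assumptions log-wealth has drift $r + (\mu-r)\bar a - \tfrac12\sigma^2(\bar a^2+v)$ and instantaneous variance $\sigma^2(\bar a^2+v)$.

```python
import numpy as np

def simulate_terminal_wealth(w0, r, mu, sigma, a_bar, v, T=1.0,
                             n_steps=100, n_paths=20_000, seed=0):
    """Simulate W_T for constant coefficients and a constant policy (a_bar, v).

    Illustrative assumption: mu, sigma, a_bar and v do not depend on (t, X_t).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    drift = r + (mu - r) * a_bar                 # drift of dW/W
    inst_var = sigma**2 * (a_bar**2 + v)         # variance rate of dW/W
    # Independent increments of B (market noise) and B-bar (randomization noise)
    dB = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    dB_bar = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    # Ito's formula for log W under the exploratory dynamics
    d_log_w = (drift - 0.5 * inst_var) * dt \
        + sigma * (a_bar * dB + np.sqrt(v) * dB_bar)
    return w0 * np.exp(d_log_w.sum(axis=1))

w_T = simulate_terminal_wealth(w0=1.0, r=0.02, mu=0.08, sigma=0.2,
                               a_bar=0.5, v=0.05)
print("sample mean of W_T:", w_T.mean())
print("sample variance of log W_T:", np.log(w_T).var())
```

The randomization variance $v$ leaves the expected return unchanged but adds $\sigma^2 v$ to the variance rate of returns, which is the channel through which randomization is costly to a risk-averse investor.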

2. Recursive Aggregator and Entropy Reward

Crucial to RPU is the entropy functional of the portfolio density,

$\mathcal{H}(\pi) = -\int_{-\infty}^\infty \pi(a) \ln \pi(a)\, da$

For a temperature function $\lambda(t,x)>0$, RPU defines a recursive utility process $J_t$ as the solution of a backward stochastic differential equation (BSDE):

$dJ_t = -\lambda(t,X_t)\,\mathcal{H}(\pi_t)\,[(1-\gamma)J_t + 1]\,dt + Z_t\cdot d(B_t,\bar B_t,\widetilde B_t),\quad J_T = U(W_T)$

where $U(w)=\frac{w^{1-\gamma}-1}{1-\gamma}$ is CRRA utility.

The recursion can be written explicitly, with the shorthand $\lambda_\tau = \lambda(\tau,X_\tau)$, as

$J_t = \mathbb{E}\Biggl[ \int_t^T e^{-\int_t^s (\gamma-1)\,\lambda_\tau\, \mathcal{H}(\pi_\tau)\,d\tau}\, \lambda_s\, \mathcal{H}(\pi_s)\, ds + e^{-\int_t^T (\gamma-1)\,\lambda_\tau\, \mathcal{H}(\pi_\tau)\,d\tau}\, U(W_T) \,\Bigg|\, \mathcal{F}_t \Biggr]$

The entropy reward is thereby discounted endogenously: the marginal utility of additional randomization decreases as cumulative past entropy increases (for $\gamma>1$), preventing runaway entropy.
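A short derivation shows how the explicit representation follows from the BSDE. It is a sketch that assumes enough integrability for the stochastic integral below to be a true martingale. Define the factor (the symbol $D_{t,s}$ is introduced here only for illustration)

$D_{t,s} = \exp\Bigl(\int_t^s (1-\gamma)\,\lambda(\tau,X_\tau)\,\mathcal{H}(\pi_\tau)\,d\tau\Bigr)$

Applying the product rule to $D_{t,s}J_s$ in $s$ and substituting the BSDE gives

$d\bigl(D_{t,s}J_s\bigr) = D_{t,s}\bigl[dJ_s + (1-\gamma)\,\lambda(s,X_s)\,\mathcal{H}(\pi_s)\,J_s\,ds\bigr] = -D_{t,s}\,\lambda(s,X_s)\,\mathcal{H}(\pi_s)\,ds + D_{t,s}\,Z_s\cdot d(B_s,\bar B_s,\widetilde B_s)$

Integrating over $[t,T]$, using $D_{t,t}=1$ and $J_T=U(W_T)$, and taking conditional expectations yields

$J_t = \mathbb{E}\Bigl[\int_t^T D_{t,s}\,\lambda(s,X_s)\,\mathcal{H}(\pi_s)\,ds + D_{t,T}\,U(W_T)\,\Big|\,\mathcal{F}_t\Bigr]$

Since $D_{t,s}=\exp\bigl(-\int_t^s(\gamma-1)\lambda\,\mathcal{H}\,d\tau\bigr)$, the effective discount rate is $(\gamma-1)\lambda\,\mathcal{H}(\pi)$, which is positive when $\gamma>1$ and $\mathcal{H}(\pi)>0$.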

3. Dynamic Programming and Characterization via HJB Equation

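The value function under a feedback policy $\pi(t,w,x)$, with $\lambda_\tau=\lambda(\tau,X_\tau)$ and $\pi_\tau=\pi(\tau,W_\tau,X_\tau)$, is

$V^{\pi}(t,w,x) = \mathbb{E} \Bigg[ \int_t^T e^{-\int_t^s (\gamma-1)\,\lambda_\tau\,\mathcal{H}(\pi_\tau)\,d\tau}\, \lambda_s\,\mathcal{H}(\pi_s)\,ds + e^{-\int_t^T (\gamma-1)\,\lambda_\tau\,\mathcal{H}(\pi_\tau)\,d\tau}\, U(W_T) \,\Bigg|\, W_t=w,\,X_t=x \Bigg]$

and the optimal value function is $V = \sup_\pi V^\pi$.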

The associated dynamic programming principle yields a Hamilton–Jacobi–Bellman (HJB) PDE:

$V_t + \sup_{\pi\in\mathcal{P}(\mathbb{R})}\Big\{ [r + (\mu - r)\bar a]\, w V_w + \tfrac{1}{2}\sigma^2(\bar a^2 + v)\,w^2 V_{ww} + m V_x + \tfrac{1}{2}\nu^2 V_{xx} + \rho\nu\sigma\,\bar a\,w V_{xw} + \lambda\, \mathcal{H}(\pi)\,[ (1-\gamma) V+1 ] \Big\} = 0$

with terminal condition $V(T,w,x) = U(w)$.

4. Structure of the RPU-Optimal Portfolio Policy

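Among densities with a given mean $\bar a$ and variance $v$, entropy is maximized by the Gaussian, whose entropy is

$\mathcal{H}(\pi) = \tfrac{1}{2} \ln(2\pi e v)$

Here the $\pi$ inside the logarithm is the numerical constant, not the density. Because the other terms in the HJB supremum depend on the density only through $(\bar a, v)$, the optimization over densities reduces to an optimization over these two numbers.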
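The Gaussian entropy formula and the maximum-entropy property can be checked numerically. The sketch below compares a grid-based evaluation of $-\int \pi\ln\pi$ for a normal density with the closed form, and with the textbook closed-form entropies of a uniform and a Laplace density of the same variance. The function names and the variance value are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_entropy(v):
    """Closed-form differential entropy of a normal density with variance v."""
    return 0.5 * np.log(2.0 * np.pi * np.e * v)

def numeric_entropy(density, lo, hi, n=200_001):
    """Riemann-sum approximation of -integral p ln p on a uniform grid."""
    a = np.linspace(lo, hi, n)
    p = density(a)
    da = a[1] - a[0]
    return float(np.sum(-p * np.log(p)) * da)

v = 0.25                      # illustrative variance
s = np.sqrt(v)

def normal_pdf(a):
    return np.exp(-a**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

h_numeric = numeric_entropy(normal_pdf, -10.0 * s, 10.0 * s)
h_gauss = gaussian_entropy(v)
# Same-variance competitors, using their closed-form entropies:
h_uniform = np.log(2.0 * np.sqrt(3.0 * v))        # uniform on [-h, h], h^2/3 = v
h_laplace = 1.0 + np.log(2.0 * np.sqrt(v / 2.0))  # Laplace scale b, 2 b^2 = v

print(h_numeric, h_gauss, h_laplace, h_uniform)
```

The numerical and closed-form Gaussian values agree, and both alternative densities have strictly lower entropy at the same variance.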

Assuming the value function ansatz $V(t,w,x) = \frac{w^{1-\gamma} e^{u(t,x)} - 1}{1-\gamma}$, the optimal control is

$\pi^*(t,x) = \text{Normal}\left(\text{mean}=a^*(t,x),\, \text{variance}=\sigma^2_{\text{expl}}(t,x)\right)$

with

$\sigma^2_{\text{expl}}(t,x) = \frac{\lambda(t,x)}{\gamma\,\sigma^2(t,x)}, \qquad a^*(t,x) = \frac{\mu(t,x)-r}{\gamma\,\sigma^2(t,x)} + \frac{\rho\,\nu(t,x)}{\gamma\,\sigma(t,x)}\,u_x(t,x)$
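These expressions can be recovered by a first-order-condition argument, sketched here under the stated ansatz and the Gaussian entropy formula. With $V = \frac{w^{1-\gamma}e^{u}-1}{1-\gamma}$ one has $wV_w = w^{1-\gamma}e^{u}$, $w^2V_{ww} = -\gamma\,w^{1-\gamma}e^{u}$, $wV_{xw} = u_x\,w^{1-\gamma}e^{u}$ and $(1-\gamma)V+1 = w^{1-\gamma}e^{u}$. The terms of the HJB supremum that depend on the mean $\bar a$ and variance $v$ therefore equal the positive factor $w^{1-\gamma}e^{u}$ times

$(\mu-r)\,\bar a - \tfrac{1}{2}\gamma\sigma^2(\bar a^2+v) + \rho\nu\sigma\,u_x\,\bar a + \tfrac{\lambda}{2}\ln(2\pi e v)$

This expression is strictly concave in $(\bar a, v)$ for $\gamma>0$, so the maximizer is given by the first-order conditions:

$-\tfrac{1}{2}\gamma\sigma^2 + \frac{\lambda}{2v} = 0 \;\Longrightarrow\; v = \frac{\lambda}{\gamma\sigma^2}, \qquad (\mu-r) - \gamma\sigma^2\,\bar a + \rho\nu\sigma\,u_x = 0 \;\Longrightarrow\; \bar a = \frac{\mu-r}{\gamma\sigma^2} + \frac{\rho\nu}{\gamma\sigma}\,u_x$

which are the variance and mean of the optimal Gaussian policy stated above.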

The mean comprises a myopic Merton term and an intertemporal hedging correction. The correction is driven by $u_x(t,x)$, where $u$ solves the nonlinear PDE

$u_t + (1-\gamma)r + m u_x + \frac{1}{2}\nu^2(u_{xx} + u_x^2) + \frac{(1-\gamma)\lambda}{2} \ln\frac{2\pi\lambda}{\gamma\sigma^2} + \frac{1-\gamma}{2\gamma}\Big[\frac{(\mu-r)^2}{\sigma^2} + 2\rho\frac{\mu-r}{\sigma}\nu u_x + \rho^2\nu^2 u_x^2\Big] = 0$

with terminal condition $u(T,x)\equiv 0$.
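In the special case of constant $\mu$, $\sigma$ and $\lambda$ (an assumption made here for illustration, not the general setting), nothing in the PDE depends on $x$, so $u_x = 0$, the hedging term vanishes, and the PDE reduces to the ODE $u_t + c = 0$ with $u(T)=0$, where $c$ collects the constant terms. The sketch below evaluates the resulting policy and $u(t) = c\,(T-t)$; the function names and parameter values are hypothetical.

```python
import numpy as np

def rpu_policy_constant(mu, r, sigma, gamma, lam):
    """Mean and variance of the Gaussian policy for constant coefficients.

    Illustrative assumption: mu, sigma, lam are constants, hence u_x = 0.
    """
    mean = (mu - r) / (gamma * sigma**2)   # myopic Merton ratio
    var = lam / (gamma * sigma**2)         # randomization variance
    return mean, var

def u_constant(t, T, mu, r, sigma, gamma, lam):
    """Solution u(t) = c (T - t) of u_t + c = 0 with u(T) = 0."""
    c = ((1.0 - gamma) * r
         + 0.5 * (1.0 - gamma) * lam
         * np.log(2.0 * np.pi * lam / (gamma * sigma**2))
         + (1.0 - gamma) / (2.0 * gamma) * (mu - r)**2 / sigma**2)
    return c * (T - t)

params = dict(mu=0.08, r=0.02, sigma=0.2, gamma=3.0, lam=0.01)
mean, var = rpu_policy_constant(**params)
u0 = u_constant(t=0.0, T=1.0, **params)

# Draw a few randomized allocations from the Gaussian policy
rng = np.random.default_rng(0)
allocations = rng.normal(loc=mean, scale=np.sqrt(var), size=5)
print("mean:", mean, "variance:", var, "u(0):", u0)
print("sampled allocations:", allocations)
```

With these inputs the mean coincides with the classical Merton ratio, and all randomness in the allocation comes from the variance $\lambda/(\gamma\sigma^2)$.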

5. Asymptotic Expansion and Deviation from Classical Merton Policy

For small entropy weight $\lambda$, the solution admits the asymptotic expansion

$u(t,x) = u^{(0)}(t,x) + \frac{1-\gamma}{2}(T-t)\lambda\ln\lambda + \lambda u^{(1)}(t,x) + \lambda^2 u^{(2)}(t,x) + O(\lambda^3)$

where $u^{(0)}$ solves the classical Merton PDE. The mean policy expands as

$a^*(t,x) = a^{(0)}(t,x) + \lambda\frac{\rho\nu(t,x)}{\gamma\sigma(t,x)}u^{(1)}_x(t,x) + O(\lambda^2)$
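The origin of the $\lambda\ln\lambda$ term in the expansion of $u$, and its absence from the mean policy, can be seen from a one-line split of the entropy term in the PDE for $u$. This sketch assumes that $\lambda$ is a constant, whereas the general framework allows $\lambda(t,x)$:

$\frac{(1-\gamma)\lambda}{2}\ln\frac{2\pi\lambda}{\gamma\sigma^2} = \frac{1-\gamma}{2}\,\lambda\ln\lambda + \lambda\,\frac{1-\gamma}{2}\ln\frac{2\pi}{\gamma\sigma^2}$

The first piece is a constant source term. Since the PDE involves $u$ only through its derivatives, a constant source shifts the solution by that constant times $(T-t)$, given the terminal condition $u(T,x)\equiv 0$; this produces $\frac{1-\gamma}{2}(T-t)\lambda\ln\lambda$. The second piece is of order $\lambda$ and enters $u^{(1)}$. Because the $\lambda\ln\lambda$ contribution does not depend on $x$, it drops out of $u_x$, so the hedging term $\frac{\rho\nu}{\gamma\sigma}u_x$ in the mean receives its first correction at order $\lambda$.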

The forgone expected utility relative to the non-randomized ($\lambda=0$) policy is of order $O(\lambda^2)$. The equivalent relative wealth loss,

$\Delta(t,x) = -\frac{\lambda^2\phi^{(2)}(t,x)}{1-\gamma} + O(\lambda^3)$

quantifies the small but nonzero financial cost of enjoying randomization.

6. Economic Interpretation and Well-Posedness

Additive perturbed utility formulations (e.g., $\int \lambda \mathcal{H}(\pi)\, dt + U(W_T)$) may be ill-posed for $\gamma<1$ or bounded-below utilities, allowing entropy to diverge. RPU overcomes this by embedding entropy into the discount rate, producing endogenous depreciation of the entropy reward as cumulative randomization accrues. This construction prevents explosion of entropy and ensures the well-posedness of the optimization.

The RPU model parallels Uzawa-style endogenous discounting: the flow "reward" of entropy, tied to local randomization, is dynamically interdependent with the terminal wealth objective. The optimal portfolio's variance $\lambda/(\gamma\sigma^2)$ decreases with risk aversion $\gamma$ and stock volatility $\sigma$, while the mean is distorted from the Merton ratio by an order-$\lambda$ entropic hedging adjustment.

A plausible implication is that RPU offers a micro-founded and analytically tractable justification for stochasticity in policy selection, with explicit quantification of the cost of randomization, supporting empirical observations of noisy investment behavior.

7. Connections to Broader Literature and Applications

Recursive Perturbed Utility extends the additive perturbed utility theory of Fudenberg et al. (2015) for static decisions by establishing a well-posed dynamic counterpart. RPU admits an explicit portfolio policy characterization in general Markovian incomplete markets with CRRA preferences: the optimal variance is available in closed form, and the optimal mean is explicit up to the solution $u$ of the nonlinear PDE in Section 4. It provides a theoretical basis for observed stochastic choice in dynamic portfolio allocation, integrating entropy-regularized decision-making with foundational stochastic control.

Potential applications encompass asset management, behavioral finance, and stochastic control where preference for diversification or exploration is inherent. The RPU formalism is compatible with extensions involving state-dependent randomness preferences and more general stochastic environments (Dai et al., 14 Feb 2026).


References

Dai et al. (14 Feb 2026). Merton's Problem with Recursive Perturbed Utility.
