
Recursive Perturbed Utility (RPU)

Updated 18 March 2026
  • Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that integrates entropy rewards into dynamic portfolio choice to model randomized investment strategies.
  • It generalizes the classical Merton problem by deriving optimal Gaussian portfolio policies with explicit mean and variance expressions that capture both myopic and hedging components.
  • RPU embeds entropy into the discount rate, dynamically reducing the marginal benefit of randomization to ensure the well-posedness of the optimization and mirror observed noisy investment behavior.

Recursive Perturbed Utility (RPU) is a stochastic differential utility framework that models investor preferences for randomization in dynamic portfolio choice, incorporating entropy-based rewards into a recursive aggregator. Developed to address the ill-posedness of additive perturbed utility models in dynamic, continuous-time settings, RPU offers a mathematically tractable and economically interpretable approach to balancing utility from portfolio randomization with traditional objectives such as bequest. RPU generalizes the classical Merton problem by allowing optimal portfolio policies to be inherently stochastic—Gaussian with explicitly characterized mean and variance—thereby unifying rational dynamic investment with empirically observed randomized behavior (Dai et al., 14 Feb 2026).

1. Market Model and Randomized Controls

The RPU framework operates in a standard continuous-time Markovian incomplete market comprising a risk-free asset with rate $r$ and a single risky asset. The risky asset price $S_t$ evolves as

$$\frac{dS_t}{S_t} = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dB_t$$

where $X_t$ is a Markovian state factor following

$$dX_t = m(t,X_t)\,dt + \nu(t,X_t)\bigl[\rho\,dB_t + \sqrt{1-\rho^2}\,d\widetilde{B}_t\bigr]$$

with independent Brownian motions $B_t$, $\widetilde{B}_t$ and correlation $\rho$.

At each time $t$, the investor's portfolio allocation is not deterministic but is specified by a probability density $\pi_t(\cdot)$ over $\mathbb{R}$. The mean and variance of this allocation are denoted

$$\bar a_t=\mathbb{E}_\pi[a],\quad v_t=\operatorname{Var}_\pi[a]$$

The resulting wealth process under this "exploratory" control evolves as

$$\frac{dW_t}{W_t} = \left[ r + (\mu(t,X_t)-r)\,\bar a_t \right]\,dt + \sigma(t,X_t) \left[ \bar a_t\, dB_t + \sqrt{v_t}\, d\bar{B}_t \right]$$

where $\bar B_t$ is independent of $(B_t,\widetilde{B}_t)$.
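The exploratory wealth dynamics can be simulated directly. The sketch below is illustrative only and assumes constant coefficients $\mu$, $\sigma$, $\bar a$ and $v$ (so the state factor $X_t$ plays no role), which is a simplification not made in the source. The function name `simulate_wealth` and all parameter values are hypothetical. It steps log-wealth, using the Itô form $d\ln W_t = [r + (\mu-r)\bar a - \tfrac12\sigma^2(\bar a^2+v)]\,dt + \sigma[\bar a\,dB_t + \sqrt{v}\,d\bar B_t]$.

```python
import math
import random


def simulate_wealth(w0, r, mu, sigma, a_bar, v, T, n_steps, rng):
    """Simulate terminal wealth W_T under the exploratory dynamics.

    Assumes constant coefficients (a simplification for illustration).
    a_bar and v are the mean and variance of the randomized allocation.
    """
    dt = T / n_steps
    # Drift of log-wealth: the extra variance v lowers it by 0.5 * sigma^2 * v.
    log_drift = r + (mu - r) * a_bar - 0.5 * sigma**2 * (a_bar**2 + v)
    log_w = math.log(w0)
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))      # market shock
        dB_bar = rng.gauss(0.0, math.sqrt(dt))  # independent randomization shock
        log_w += log_drift * dt + sigma * (a_bar * dB + math.sqrt(v) * dB_bar)
    return math.exp(log_w)


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [
        simulate_wealth(1.0, 0.02, 0.08, 0.2, 0.5, 0.08, 1.0, 50, rng)
        for _ in range(5)
    ]
    print(samples)
```

Working in log-wealth keeps simulated wealth strictly positive, and the second Brownian increment makes explicit that randomizing the allocation adds risk without adding expected return.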

2. Recursive Aggregator and Entropy Reward

Crucial to RPU is the entropy functional of the portfolio density,

$$\mathcal{H}(\pi) = -\int_{-\infty}^\infty \pi(a) \ln \pi(a)\, da$$

For a temperature function $\lambda(t,x)>0$, RPU defines a recursive utility process $J_t$ as the solution of a backward stochastic differential equation (BSDE):

$$dJ_t = -\lambda(t,X_t)\,\mathcal{H}(\pi_t)\,[(1-\gamma)J_t + 1]\,dt + Z_t\cdot d(B_t,\bar B_t,\widetilde B_t),\quad J_T = U(W_T)$$

where $U(w)=\frac{w^{1-\gamma}-1}{1-\gamma}$ is CRRA utility.

Because the BSDE is linear in $J_t$, the recursion can be expressed explicitly as

$$J_t = \mathbb{E}\Biggl[ \int_t^T e^{-\int_t^s \bigl(-\lambda_\tau(1-\gamma)\, \mathcal{H}(\pi_\tau)\bigr)\,d\tau}\, \lambda(s,X_s)\, \mathcal{H}(\pi_s)\, ds + e^{-\int_t^T \bigl(-\lambda_\tau(1-\gamma)\, \mathcal{H}(\pi_\tau)\bigr)\,d\tau}\, U(W_T) \,\bigg|\, \mathcal{F}_t \Biggr]$$

where $\lambda_\tau$ abbreviates $\lambda(\tau,X_\tau)$. The effective discount rate is $-\lambda_\tau(1-\gamma)\,\mathcal{H}(\pi_\tau)$, which is positive when $\gamma>1$ and $\mathcal{H}(\pi_\tau)>0$.

The entropy reward is thereby discounted endogenously: the marginal utility of additional randomization decreases as cumulative past entropy increases (for $\gamma>1$), preventing runaway entropy.
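A worked special case may make the endogenous discounting concrete. The derivation below assumes that $\lambda_\tau\,\mathcal{H}(\pi_\tau)$ is a constant, written $\lambda\bar H$, and that $\gamma\neq 1$. Both the assumption and the symbols $\bar H$ and $\kappa$ are introduced here for illustration and are not part of the source.

```latex
% Assumption (illustrative): \lambda_\tau \mathcal{H}(\pi_\tau) \equiv \lambda \bar{H} is constant, \gamma \neq 1.
\begin{align*}
\kappa &:= (1-\gamma)\,\lambda \bar{H}, \\
\int_t^T e^{\kappa (s-t)}\,\lambda \bar{H}\,ds
  &= \frac{e^{\kappa (T-t)} - 1}{1-\gamma}, \\
J_t &= \frac{e^{\kappa (T-t)} - 1}{1-\gamma}
  + e^{\kappa (T-t)}\,\mathbb{E}\bigl[U(W_T)\mid\mathcal{F}_t\bigr], \\
(1-\gamma)\,J_t + 1
  &= e^{(1-\gamma)\lambda \bar{H}\,(T-t)}\,
     \mathbb{E}\bigl[W_T^{1-\gamma}\mid\mathcal{F}_t\bigr].
\end{align*}
```

Under this assumption entropy enters only as a multiplicative factor on $\mathbb{E}[W_T^{1-\gamma}\mid\mathcal{F}_t]$. For $\gamma>1$ that factor decays exponentially in $\bar H$, so $J_t$ is increasing and concave in $\bar H$ and bounded above by $1/(\gamma-1)$: each extra unit of entropy adds less utility than the previous one.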

3. Dynamic Programming and Characterization via HJB Equation

The value function under a feedback policy $\pi(t,w,x)$ is

$$V^{\pi}(t,w,x) = \mathbb{E} \Bigg[ \int_t^T e^{-\int_t^s \bigl(-\lambda_\tau(1-\gamma)\,\mathcal{H}(\pi_\tau)\bigr)\,d\tau}\, \lambda_s\,\mathcal{H}(\pi_s)\,ds + e^{-\int_t^T \bigl(-\lambda_\tau(1-\gamma)\,\mathcal{H}(\pi_\tau)\bigr)\,d\tau}\, U(W_T) \,\bigg|\, W_t=w,\,X_t=x \Bigg]$$

and the optimal value function is $V = \sup_\pi V^\pi$. The associated dynamic programming principle yields a Hamilton–Jacobi–Bellman (HJB) PDE:

$$V_t + \sup_{\pi\in\mathcal{P}(\mathbb{R})}\Big\{ [r + (\mu - r)\bar a]\, w V_w + \tfrac{1}{2}\sigma^2(\bar a^2 + v)\,w^2 V_{ww} + m V_x + \tfrac{1}{2}\nu^2 V_{xx} + \rho\nu\sigma\,\bar a\,w V_{xw} + \lambda\, \mathcal{H}(\pi)\,[ (1-\gamma) V+1 ] \Big\} = 0$$

with terminal condition $V(T,w,x) = U(w)$, where $\bar a$ and $v$ denote the mean and variance of $\pi$.

4. Structure of the RPU-Optimal Portfolio Policy

Among densities with a given mean and variance $v$, entropy is maximized by the Gaussian, for which (with $\pi$ inside the logarithm denoting the numerical constant rather than the policy)

$$\mathcal{H}(\pi) = \tfrac{1}{2} \ln(2\pi e\, v)$$
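The closed form above is easy to check numerically. The following sketch compares it with a midpoint-rule evaluation of $-\int \pi \ln \pi\,da$ for a Gaussian density, and with the entropy of a uniform density of the same variance. The function names and the choice of a uniform comparison density are illustrative assumptions, not taken from the source.

```python
import math


def gaussian_entropy(v):
    """Closed-form differential entropy of a Gaussian with variance v."""
    return 0.5 * math.log(2.0 * math.pi * math.e * v)


def uniform_entropy(v):
    """Entropy of a uniform density with variance v (width sqrt(12 v))."""
    return 0.5 * math.log(12.0 * v)


def normal_pdf(mean, v):
    """Return the Gaussian density with the given mean and variance."""
    norm = math.sqrt(2.0 * math.pi * v)
    return lambda a: math.exp(-((a - mean) ** 2) / (2.0 * v)) / norm


def numerical_entropy(pdf, lo, hi, n=20000):
    """Midpoint-rule approximation of -integral of p ln p over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * h)
        if p > 0.0:  # p ln p -> 0 as p -> 0
            total -= p * math.log(p) * h
    return total


if __name__ == "__main__":
    v = 0.25
    print(gaussian_entropy(v))
    print(numerical_entropy(normal_pdf(0.0, v), -6.0, 6.0))
    print(uniform_entropy(v))  # smaller than the Gaussian value
```

Note that differential entropy is negative for small variances (below $1/(2\pi e)$), so the entropy term can act as a penalty as well as a reward.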

Assuming the value function ansatz $V(t,w,x) = \frac{w^{1-\gamma} e^{u(t,x)} - 1}{1-\gamma}$, the optimal control is

$$\pi^*(t,x) = \text{Normal}\left(\text{mean}=a^*(t,x),\, \text{variance}=\sigma^2_{\text{expl}}(t,x)\right)$$

with

$$\sigma^2_{\text{expl}}(t,x) = \operatorname{Var}(\pi^*(t,x)) = \frac{\lambda(t,x)}{\gamma\,\sigma^2(t,x)}, \qquad a^*(t,x) = \mathbb{E}[\pi^*(t,x)] = \frac{\mu(t,x)-r}{\gamma\,\sigma^2(t,x)} + \frac{\rho\,\nu(t,x)}{\gamma\,\sigma(t,x)}\,u_x(t,x)$$

The mean comprises a myopic Merton term and an intertemporal hedging correction, the latter captured by $u_x(t,x)$, where $u$ solves the nonlinear PDE

$$u_t + (1-\gamma)r + m u_x + \frac{1}{2}\nu^2(u_{xx} + u_x^2) + \frac{(1-\gamma)\lambda}{2} \ln\frac{2\pi\lambda}{\gamma\sigma^2} + \frac{1-\gamma}{2\gamma}\Big[\frac{(\mu-r)^2}{\sigma^2} + 2\rho\frac{\mu-r}{\sigma}\nu u_x + \rho^2\nu^2 u_x^2\Big] = 0$$

with $u(T,x)\equiv 0$.
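The policy formulas translate into a few lines of code. The sketch below evaluates the optimal mean and variance at a single point $(t,x)$, taking $u_x$ as a supplied input because solving the PDE for $u$ is outside its scope. The names `GaussianPolicy`, `rpu_policy` and `variance_objective` are hypothetical. The objective in `variance_objective` is what the variance-dependent part of the HJB bracket reduces to after substituting the ansatz and dividing by $w^{1-\gamma}e^{u}$, a step worked out here rather than quoted from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianPolicy:
    mean: float      # a*(t, x)
    variance: float  # sigma_expl^2(t, x)


def rpu_policy(mu, r, sigma, gamma, lam, rho=0.0, nu=0.0, u_x=0.0):
    """Optimal Gaussian policy at one point (t, x).

    u_x must come from a separate solution of the PDE for u; the default
    u_x = 0 corresponds to no hedging demand (an illustrative assumption).
    """
    myopic = (mu - r) / (gamma * sigma**2)
    hedging = rho * nu * u_x / (gamma * sigma)
    return GaussianPolicy(mean=myopic + hedging,
                          variance=lam / (gamma * sigma**2))


def variance_objective(v, sigma, gamma, lam):
    """Variance-dependent part of the HJB bracket, scaled by w^(1-gamma) e^u."""
    return -0.5 * gamma * sigma**2 * v + 0.5 * lam * math.log(2 * math.pi * math.e * v)


if __name__ == "__main__":
    policy = rpu_policy(mu=0.08, r=0.02, sigma=0.2, gamma=3.0, lam=0.01)
    print(policy)
    # The closed-form variance should beat nearby alternatives.
    for scale in (0.9, 1.0, 1.1):
        print(scale, variance_objective(scale * policy.variance, 0.2, 3.0, 0.01))
```

With these illustrative inputs the mean equals the Merton ratio $(\mu-r)/(\gamma\sigma^2)$, since the hedging term vanishes when $u_x=0$, and the variance depends only on $\lambda$, $\gamma$ and $\sigma$.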

5. Asymptotic Expansion and Deviation from Classical Merton Policy

For small entropy weight $\lambda$, the solution admits an asymptotic expansion:

$$u(t,x) = u^{(0)}(t,x) + \frac{1-\gamma}{2}(T-t)\,\lambda\ln\lambda + \lambda u^{(1)}(t,x) + \lambda^2 u^{(2)}(t,x) + O(\lambda^3)$$

where $u^{(0)}$ solves the classical Merton PDE. The $\lambda\ln\lambda$ term does not depend on $x$, so it drops out of $u_x$ and the first correction to the mean is of order $\lambda$. The mean policy expands as

$$a^*(t,x) = a^{(0)}(t,x) + \lambda\,\frac{\rho\,\nu(t,x)}{\gamma\,\sigma(t,x)}\,u^{(1)}_x(t,x) + O(\lambda^2)$$
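The expansion of $u$ can be checked in the one case where the PDE for $u$ solves in closed form: constant $\mu$, $\sigma$, $r$ and $\lambda$, with no dependence on $x$. This is an assumption made here for illustration, not a case treated in the source. Then $u_x = 0$, the PDE reduces to an ODE in $t$, and $u(t) = (T-t)\bigl[(1-\gamma)r + \tfrac{(1-\gamma)(\mu-r)^2}{2\gamma\sigma^2} + \tfrac{(1-\gamma)\lambda}{2}\ln\tfrac{2\pi\lambda}{\gamma\sigma^2}\bigr]$. The function names below are hypothetical.

```python
import math


def u_constant(t, T, mu, r, sigma, gamma, lam):
    """Closed-form u(t) when all coefficients are constant (illustrative case)."""
    rate = ((1 - gamma) * r
            + (1 - gamma) * (mu - r) ** 2 / (2 * gamma * sigma**2)
            + 0.5 * (1 - gamma) * lam * math.log(2 * math.pi * lam / (gamma * sigma**2)))
    return (T - t) * rate


def u_expansion_terms(t, T, mu, r, sigma, gamma, lam):
    """Split u into the terms of the small-lambda expansion.

    Returns (u0, lam*ln(lam) term, lam*u1). In this constant-coefficient
    case the higher-order terms vanish, so the three terms sum to u exactly.
    """
    u0 = (T - t) * (1 - gamma) * (r + (mu - r) ** 2 / (2 * gamma * sigma**2))
    log_term = 0.5 * (1 - gamma) * (T - t) * lam * math.log(lam)
    u1 = 0.5 * (1 - gamma) * (T - t) * math.log(2 * math.pi / (gamma * sigma**2))
    return u0, log_term, lam * u1


if __name__ == "__main__":
    args = dict(t=0.0, T=1.0, mu=0.08, r=0.02, sigma=0.2, gamma=3.0, lam=0.01)
    print(u_constant(**args))
    print(sum(u_expansion_terms(**args)))
```

In this special case the mean policy coincides with the Merton ratio at every order in $\lambda$, because $u$ carries no $x$-dependence; the order-$\lambda$ correction to the mean appears only when the state factor matters.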

The forgone expected utility relative to the non-randomized ($\lambda=0$) policy is of order $O(\lambda^2)$. The equivalent relative wealth loss,

$$\Delta(t,x) = -\frac{\lambda^2\phi^{(2)}(t,x)}{1-\gamma} + O(\lambda^3)$$

quantifies the small but nonzero financial cost of enjoying randomization.

6. Economic Interpretation and Well-Posedness

Additive perturbed utility formulations (e.g., $\int \lambda\, \mathcal{H}(\pi)\, dt + U(W_T)$) may be ill-posed for $\gamma<1$ or bounded-below utilities, allowing entropy to diverge. RPU overcomes this by embedding entropy into the discount rate, producing endogenous depreciation of the entropy reward as cumulative randomization accrues. This construction prevents explosion of entropy and ensures the well-posedness of the optimization.

The RPU model parallels Uzawa-style habit formation: the flow "reward" of entropy, tied to local randomization, is dynamically interdependent with the terminal wealth objective. The optimal portfolio's variance $\lambda/(\gamma\sigma^2)$ decreases with risk aversion $\gamma$ and stock volatility $\sigma$, while the mean is distorted from the Merton ratio by an order-$\lambda$ entropic hedging adjustment.

A plausible implication is that RPU offers a micro-founded and analytically tractable justification for stochasticity in policy selection, with explicit quantification of the cost of randomization, supporting empirical observations of noisy investment behavior.

7. Connections to Broader Literature and Applications

Recursive Perturbed Utility extends the additive perturbed utility theory of Fudenberg et al. (2015) for static decisions by establishing a well-posed dynamic counterpart. RPU admits explicit portfolio policy characterization in general Markovian incomplete markets with CRRA preferences and allows for closed-form expressions for both mean and variance of optimal policies. It provides a theoretical basis for observed stochastic choice in dynamic portfolio allocation, integrating entropy-regularized decision-making with foundational stochastic control.

Potential applications encompass asset management, behavioral finance, and stochastic control where preference for diversification or exploration is inherent. The RPU formalism is compatible with extensions involving state-dependent randomness preferences and more general stochastic environments (Dai et al., 14 Feb 2026).
