- The paper introduces the recursive perturbed utility (RPU) model, which integrates entropy-based randomization into dynamic portfolio optimization and resolves the ill-posedness of static perturbed-utility models in dynamic settings.
- It mathematically characterizes the optimal randomized policy as a Gaussian distribution with bias conditions dictated by risk aversion and market incompleteness.
- An asymptotic expansion quantifies the first-order deviation from classical strategies and measures the modest wealth loss incurred by investors favoring randomization.
Merton's Problem with Recursive Perturbed Utility: Summary and Analysis
Introduction
The paper "Merton's Problem with Recursive Perturbed Utility" (2602.13544) rigorously addresses the question of why a rational investor might prefer stochastic, rather than deterministic, portfolio choices in dynamic settings. Traditional Merton-type portfolio optimization prescribes deterministic, state-dependent rules, but experimental evidence reveals systematic human inclination toward randomization. Stochastic choice models, such as Fudenberg's additive perturbed utility (APU), explain this in static contexts, yet their extension to dynamic setups can be ill-posed or computationally intractable. The authors introduce the recursive perturbed utility (RPU) framework as an entropy-regularized, recursive dynamic utility model—well-posed for a wide class of preferences and mathematically tractable for continuous-time portfolio optimization in Markovian incomplete markets. RPU incorporates endogenous, state- and history-dependent trade-offs between monetary utility and utility from randomization, thereby overcoming theoretical limitations of static perturbations in dynamic environments.
Mathematical Framework
Market Model and Dynamics
The financial market consists of a risk-free asset and a risky asset whose returns and volatility depend on an observable stochastic factor. The dynamics are Markovian and potentially incomplete, with instantaneous return $\mu(t, X_t)$ and volatility $\sigma(t, X_t)$ driven by an exogenous factor process $X_t$ (see equations (1) and (2) in the paper).
Investor actions are modeled as scalar-valued processes $a_t$ (the fraction of wealth allocated to the risky asset), leading to the self-financing wealth dynamics of the classical Merton problem. In the exploratory formulation, actions are generalized to randomized controls $\pi_t$, probability distributions from which investment decisions are drawn, resulting in altered wealth dynamics that capture the additional volatility induced by extrinsic randomization.
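To make the altered dynamics concrete, here is a minimal Euler–Maruyama sketch. It assumes the exploratory dynamics take the standard form from the entropy-regularized literature the paper builds on: the drift depends on the mean of $\pi_t$ and the diffusion on its second moment, so randomization inflates volatility. The function name and all parameter values are illustrative, not the paper's.

```python
import numpy as np

def simulate_exploratory_wealth(w0=1.0, r=0.02, mu=0.08, sigma=0.2,
                                mean_pi=0.5, var_pi=0.1,
                                T=1.0, n_steps=252, seed=0):
    """Euler-Maruyama simulation of wealth under a Gaussian randomized control.

    Assumed exploratory form: drift uses the mean of pi, diffusion uses its
    second moment, so randomization adds sigma^2 * var_pi to the variance.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    w = np.empty(n_steps + 1)
    w[0] = w0
    second_moment = mean_pi**2 + var_pi  # E[a^2] under pi
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))
        drift = (r + (mu - r) * mean_pi) * w[k] * dt
        diffusion = sigma * np.sqrt(second_moment) * w[k] * dB
        w[k + 1] = w[k] + drift + diffusion
    return w

path = simulate_exploratory_wealth()
print(f"terminal wealth: {path[-1]:.4f}")
```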
Entropy-Based Randomization Preference
The RPU framework explicitly rewards the investor for engaging in randomization, measured by the differential entropy $H(\pi)$ of the randomized control. Unlike static APU, which simply adds entropy-based utility to expected terminal wealth, RPU recursively weighs the randomization flow via a dynamic, endogenous discount term that depends on past and present entropy accumulation. This recursive aggregation mirrors Uzawa-type endogenous time preference and ensures that excessive randomization is discouraged, avoiding ill-posedness even for low risk aversion.
Formally, the value process $J^\pi_t$ satisfies a backward stochastic differential equation (BSDE) that incorporates both bequest utility and recursive entropy utility, with the discount rate (the aggregator) decreasing as randomization accumulates.
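The following toy discrete-time sketch illustrates the two ingredients: the differential entropy of a Gaussian control, and a Uzawa-style backward aggregation in which each step's entropy affects how the continuation value is discounted. The aggregator form, the sign convention, and the constants `lam` and `beta` are placeholders for illustration, not the paper's BSDE specification.

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy of N(m, var): H = 0.5 * ln(2 * pi * e * var)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

def recursive_entropy_value(entropies, lam=0.1, beta=0.5, dt=1.0 / 252):
    """Toy Uzawa-style backward aggregation of an entropy flow.

    Each step earns the flow lam * H * dt and discounts the continuation
    value at a rate growing with that step's entropy (beta * H), so heavy
    randomization down-weights future randomization utility. This is one
    possible convention, NOT the paper's aggregator; lam and beta are
    illustrative constants.
    """
    value = 0.0
    for h in reversed(entropies):
        value = lam * h * dt + np.exp(-beta * max(h, 0.0) * dt) * value
    return value

ents = [gaussian_entropy(v) for v in np.linspace(0.05, 0.20, 252)]
print(f"aggregated entropy utility: {recursive_entropy_value(ents):.6f}")
```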
Theoretical Results
Optimal Policy Characterization
Under CRRA preferences, the authors prove that the RPU-optimal portfolio follows a Gaussian distribution, independent of wealth (Theorem 1). The variance is given in closed form, $\mathrm{Var}(\pi^*) = \lambda/(\gamma\sigma^2)$, and is strictly decreasing in the risk aversion $\gamma$ and the squared volatility $\sigma^2$. The optimal mean is the sum of (i) a myopic term, proportional to the instantaneous Sharpe ratio and inversely proportional to risk aversion, and (ii) an intertemporal hedging term against market incompleteness, both defined by the solution of a PDE (see eq. (7)–(9)). The hedging term is recursively entangled with randomization: unlike in classical Merton solutions or under additive entropy regularization, policy randomization affects optimal hedging.
In complete markets or for log-utility ($\gamma = 1$), the hedging term vanishes and the mean reduces to the classical, unbiased Merton rule; the recursive and additive entropy models coincide. However, when risk aversion is non-unitary and the market is incomplete, recursive regularization induces a bias in the optimal policy mean.
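A minimal sketch of the resulting policy follows. The variance $\lambda/(\gamma\sigma^2)$ is taken from Theorem 1 as quoted above; the hedging component of the mean solves a PDE in the paper, so here it is treated as an exogenous numeric input. The function name and parameter values are illustrative.

```python
import numpy as np

def rpu_gaussian_policy(mu, r, sigma, gamma, lam, hedging=0.0):
    """Parameters of the RPU-optimal Gaussian policy pi* = N(mean, var).

    var = lam / (gamma * sigma^2) per Theorem 1 as summarized above; the
    mean is the myopic Merton ratio plus an intertemporal hedging term,
    passed in here as a number since its PDE solution is model-specific.
    """
    var = lam / (gamma * sigma**2)
    myopic = (mu - r) / (gamma * sigma**2)  # classical Merton ratio
    return myopic + hedging, var

# Log-utility (gamma = 1): the hedging term vanishes and the mean is the
# classical Merton rule; only the variance reflects the randomization taste.
mean, var = rpu_gaussian_policy(mu=0.08, r=0.02, sigma=0.2, gamma=1.0, lam=0.05)
sample = np.random.default_rng(0).normal(mean, np.sqrt(var))
print(f"mean={mean:.3f}, var={var:.3f}, sampled allocation={sample:.3f}")
```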
Asymptotic Expansion and Financial Cost of Randomization
The paper provides an asymptotic expansion in the temperature parameter $\lambda$, quantifying how the optimal mean deviates from its classical value and measuring the associated wealth loss. The deviation is first-order in $\lambda$, while the relative wealth loss is of higher order, $O(\lambda^2)$, establishing that the cost of preferential randomization is small and measurable for moderate $\lambda$ (Theorem 2). The equivalent wealth loss that investors are willing to pay for the pleasure of randomization is formalized, with explicit PDEs for the expansion coefficients.
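The orders of magnitude can be illustrated numerically. The coefficients `c1` and `c2` below are placeholders (in the paper the expansion coefficients solve explicit PDEs); the table only demonstrates the $O(\lambda)$ versus $O(\lambda^2)$ scaling of Theorem 2.

```python
# Illustrative scaling for Theorem 2: the policy-mean deviation is O(lambda),
# the certainty-equivalent wealth loss is O(lambda^2). c1 and c2 are
# placeholder coefficients; in the paper they depend on the market model.
c1, c2 = 0.8, 0.3
print(f"{'lambda':>8} {'mean dev ~ c1*lam':>18} {'wealth loss ~ c2*lam^2':>23}")
for lam in (0.01, 0.05, 0.1, 0.2):
    print(f"{lam:8.2f} {c1 * lam:18.4f} {c2 * lam**2:23.5f}")
```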
Relation to Entropy Regularization and RL
While entropy-regularized RL frameworks incentivize exploration due to model uncertainty, the RPU approach here formalizes intrinsic human preference for randomization under full information. Mathematically, both lead to relaxed control models with Gibbs optimal measures, but motivations and utility aggregation differ fundamentally. The paper surveys the literature on continuous-time entropy regularization (e.g., [wang2018exploration], [bender2024continuous], [jia2025accuracy]), noting that existing works employ additive perturbations, whereas RPU generalizes to recursive structures.
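A small sketch of the shared mathematical object, a Gibbs measure on a discretized action grid: with a quadratic payoff, the continuum limit is exactly a Gaussian policy like that of Theorem 1. The grid, the payoff function, and the temperature value are illustrative.

```python
import numpy as np

def gibbs_policy(q_values, lam):
    """Gibbs (softmax) measure pi(a) proportional to exp(q(a) / lam).

    Both entropy-regularized RL and the RPU model yield optimal measures
    of this exponential form; the aggregation of utility differs.
    """
    logits = q_values / lam
    logits -= logits.max()  # numerical stabilization before exponentiating
    weights = np.exp(logits)
    return weights / weights.sum()

actions = np.linspace(-1.0, 2.0, 301)
q = -0.5 * 0.2**2 * (actions - 0.75)**2  # illustrative quadratic payoff
pi = gibbs_policy(q, lam=0.05)
print(f"policy mode near {actions[np.argmax(pi)]:.2f}")
```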
Unbiasedness and Bias Conditions
The bias in the optimal randomized policy is shown to depend entirely on the factor dynamics and their stochastic coupling to the asset price. When market incompleteness is absent (deterministic factors or independent factor-stock dynamics), or log-utility is used, the recursive solution is unbiased and coincides with classical benchmarks. The BSDE perspective further elucidates conditions leading to unbiasedness.
Extension and Limitations
Alternative temperature-weighting schemes (constant, wealth-dependent) are discussed and shown to be theoretically problematic (ill-posed or intractable) compared with recursive weighting. The RPU approach is also extended to CARA preferences, demonstrating the framework's generality.
Implications and Future Directions
The recursive perturbed utility framework establishes a rigorous foundation for modeling dynamic stochastic choice in portfolio optimization. The practical implication is that optimal policy randomization can be formally justified and quantitatively analyzed, moving beyond stylized empiricism to precise measurement of the financial cost of randomization preference. For asset management and behavioral finance, incorporating recursive entropy-based utility may improve descriptive and predictive accuracy, potentially illuminating asset pricing puzzles tied to choice randomization.
From a theoretical standpoint, the recursive aggregation offers flexible trade-offs suitable for other dynamic stochastic control problems (e.g., consumption/investment, gambling, mean-field games) and may lead to new formulations for RL with intrinsic exploration motivations.
Future research could extend the framework by integrating consumption, empirically calibrating randomization preference in real markets, and disentangling intrinsic from extrinsic motivations in RL settings. The impact of alternative entropy functionals (Tsallis, Rényi) and utility forms (beyond CRRA/CARA) remains unexplored.
Conclusion
The paper provides a mathematically rigorous and economically interpretable approach to dynamic portfolio optimization with stochastic choice, grounded in recursive entropy-perturbed utility. The optimal policies are tractable (Gaussian), and the bias induced by randomization preference is precisely characterized. Recursive aggregation resolves the ill-posedness of static additive regularization, quantifies the wealth cost of randomization, and opens new avenues for modeling behavioral preferences in stochastic control and reinforcement learning.