Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy (2407.03888v2)

Published 4 Jul 2024 in math.OC and cs.LG

Abstract: This paper studies the continuous-time reinforcement learning in jump-diffusion models by featuring the q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization. Contrary to the Shannon entropy, the general form of Tsallis entropy renders the optimal policy not necessary a Gibbs measure, where the Lagrange and KKT multipliers naturally arise from some constraints to ensure the learnt policy to be a probability density function. As a consequence, the characterization of the optimal policy using the q-function also involves a Lagrange multiplier. In response, we establish the martingale characterization of the q-function under Tsallis entropy and devise two q-learning algorithms depending on whether the Lagrange multiplier can be derived explicitly or not. In the latter case, we need to consider different parameterizations of the optimal q-function and the optimal policy and update them alternatively in an Actor-Critic manner. We also study two financial applications, namely, an optimal portfolio liquidation problem and a non-LQ control problem. It is interesting to see therein that the optimal policies under the Tsallis entropy regularization can be characterized explicitly, which are distributions concentrated on some compact support. The satisfactory performance of our q-learning algorithms is illustrated in each example.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Lijun Bo (33 papers)
  2. Yijie Huang (14 papers)
  3. Xiang Yu (130 papers)
  4. Tingting Zhang (53 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.