Continuous-time q-learning for Markov regime switching system under Tsallis entropy

Published 27 Jan 2026 in math.OC | (2601.19299v1)

Abstract: This paper studies continuous-time q-learning (the continuous-time counterpart of Q-learning) for Markov regime-switching systems under Tsallis entropy regularization. Under Tsallis entropy regularization, the optimal policy distribution is not necessarily a Gibbs measure, which often complicates algorithm design in traditional RL; we address this difficulty. Furthermore, current continuous-time regime-switching RL algorithms have limited universality, as they are often restricted to the exploratory mean-variance (EMV) framework. This study therefore focuses on continuous-time q-learning for Markov regime-switching systems based on Tsallis entropy, aiming for a more universally applicable continuous-time RL method. We establish the martingale characterization of the q-function under Tsallis entropy for continuous-time Markov regime-switching systems. Based on this characterization, we design two q-learning algorithms, distinguished by whether the Lagrange multiplier can be derived explicitly. We apply these algorithms to the continuous-time EMV portfolio optimization problem in a regime-switching market. Numerical experiments demonstrate the satisfactory performance of our q-learning algorithms.
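
To illustrate why a Tsallis-regularized optimal policy need not be a Gibbs measure, and where a normalizing Lagrange multiplier enters, the following is a minimal sketch in a much simpler setting than the paper's: a finite action set, a single state, and Tsallis index 2. It is not the paper's continuous-time algorithm. The function names, the temperature `gamma`, and the toy values in `q` are all assumptions made for illustration. The sketch maximizes sum_i pi_i q_i + gamma (1 - sum_i pi_i^2) over the probability simplex, whose solution is the known thresholded form pi_i = max(q_i / (2 gamma) - tau, 0), with tau fixed by normalization.

```python
import math


def tsallis2_policy(q, gamma):
    """Maximize sum_i pi_i*q_i + gamma*(1 - sum_i pi_i**2) over the simplex.

    Hypothetical illustration (finite actions, Tsallis index 2), not the
    paper's method. Returns (pi, tau), where tau is the normalization
    threshold; the Lagrange multiplier of the constraint sum(pi) == 1
    is 2 * gamma * tau.
    """
    z = [qi / (2.0 * gamma) for qi in q]
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        # Action j (in sorted order) stays in the support while this holds;
        # the condition is true for a prefix of the sorted list only.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    pi = [max(zi - tau, 0.0) for zi in z]
    return pi, tau


def gibbs_policy(q, gamma):
    """Shannon-entropy counterpart: pi_i proportional to exp(q_i / gamma)."""
    m = max(q)
    w = [math.exp((qi - m) / gamma) for qi in q]
    total = sum(w)
    return [wi / total for wi in w]


q = [1.0, 0.8, -1.0]  # toy action values, assumed for illustration
pi_tsallis, tau = tsallis2_policy(q, gamma=0.5)
pi_gibbs = gibbs_policy(q, gamma=0.5)
print(pi_tsallis, tau)
print(pi_gibbs)
```

In this toy case the Tsallis policy assigns exactly zero probability to the worst action, whereas the Gibbs policy keeps every action strictly positive. Here the threshold can be found in closed form by sorting; in general the multiplier may have to be found numerically, which loosely mirrors the abstract's distinction between the two algorithms.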

Authors (3)
