Research on Short-Video Platform User Decision-Making via Multimodal Temporal Modeling and Reinforcement Learning (2509.12269v1)
Abstract: This paper proposes the MT-DQN model, which integrates a Transformer, a Temporal Graph Neural Network (TGNN), and a Deep Q-Network (DQN) to address the challenges of predicting user behavior and optimizing recommendation strategies in short-video environments. Experiments demonstrate that MT-DQN consistently outperforms traditional concatenation-based multimodal baselines such as Concat-Modal, achieving an average F1-score improvement of 10.97% and an average NDCG@5 improvement of 8.3%. Compared to the classic reinforcement learning model Vanilla-DQN, MT-DQN reduces MSE by 34.8% and MAE by 26.5%. Nonetheless, we also recognize challenges in deploying MT-DQN in real-world scenarios, such as its computational cost and latency sensitivity during online inference, which will be addressed through future architectural optimization.
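To make the described architecture concrete, below is a minimal sketch of how a Transformer encoder, a temporal graph layer, and a DQN head could be composed as the abstract outlines. The paper does not publish code, so every module size, the single-step graph propagation (standing in for a full TGNN), the mean-pooled state, and the action-space size are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MTDQN(nn.Module):
    """Illustrative MT-DQN-style model (a sketch, not the authors' code):
    a Transformer encodes the user's multimodal interaction sequence,
    a simplified graph layer propagates information over a temporal
    interaction graph, and a DQN head maps the fused state to Q-values
    over candidate recommendation actions."""

    def __init__(self, feat_dim=128, n_heads=4, n_actions=50):
        super().__init__()
        # Transformer over the interaction sequence; multimodal features
        # are assumed to be pre-fused into feat_dim-sized vectors.
        enc_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Stand-in for the TGNN: one dense message-passing step; a real
        # temporal GNN would use time-aware edges and updates.
        self.gnn_lin = nn.Linear(feat_dim, feat_dim)
        # DQN head: pooled user state -> Q-value per candidate action.
        self.q_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions))

    def forward(self, seq_feats, adj):
        # seq_feats: (batch, seq_len, feat_dim)
        # adj:       (batch, seq_len, seq_len) temporal adjacency
        h = self.transformer(seq_feats)
        h = torch.relu(adj @ self.gnn_lin(h))  # one graph propagation step
        state = h.mean(dim=1)                  # pooled user state
        return self.q_head(state)              # Q-values over actions

# Usage: the argmax-Q action is the next recommended item.
model = MTDQN()
q = model(torch.randn(2, 10, 128), torch.eye(10).expand(2, 10, 10))
action = q.argmax(dim=-1)
```

In a DQN setup, these Q-values would be trained against a replay buffer with a target network; the sketch only shows the forward pass that online inference would run, which is where the abstract's latency concern arises.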