Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Return-Aligned Decision Transformer (2402.03923v4)

Published 6 Feb 2024 in cs.LG

Abstract: Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize the returns, but align the actual return with a specified target return, giving control over the agent's performance. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and is equipped with a mechanism to control the agent using the target return. However, the action generation is hardly influenced by the target return because DT's self-attention allocates scarce attention scores to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. RADT utilizes features extracted by paying attention solely to the return, enabling the action generation to consistently depend on the target return. Extensive experiments show that RADT reduces the discrepancies between the actual return and the target return of DT-based methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Tsunehiko Tanaka (5 papers)
  2. Kenshi Abe (32 papers)
  3. Kaito Ariu (32 papers)
  4. Tetsuro Morimura (18 papers)
  5. Edgar Simo-Serra (25 papers)

Summary

We haven't generated a summary for this paper yet.