Context-Former: Stitching via Latent Conditioned Sequence Modeling (2401.16452v3)

Published 29 Jan 2024 in cs.LG and cs.AI

Abstract: Offline reinforcement learning (RL) algorithms can learn better policies than the behavior policy by stitching suboptimal trajectories into more optimal ones. Meanwhile, Decision Transformer (DT) casts offline RL as sequence modeling and shows competitive performance on offline RL benchmarks. However, recent studies demonstrate that DT lacks stitching capability, so endowing DT with this capability is vital to further improving its performance. To that end, we abstract trajectory stitching as expert matching and introduce our approach, ContextFormer, which integrates contextual information-based imitation learning (IL) with sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. We validate the approach from two perspectives: 1) extensive experiments on D4RL benchmarks under IL settings, where ContextFormer achieves competitive performance across multiple IL settings; and 2) more importantly, a comparison of ContextFormer with various competitive DT variants trained on identical datasets, where ContextFormer outperforms all other variants.
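
To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of latent-conditioned sequence modeling in the spirit described by the abstract: a causal Transformer policy is conditioned on a learned latent context token (which, in the paper's framing, would be trained to match the embeddings of a few expert trajectories) instead of on return-to-go as in the standard Decision Transformer. All names here (LatentConditionedPolicy, embed_latent, etc.) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class LatentConditionedPolicy(nn.Module):
    """Causal Transformer over (latent, s_1, a_1, ..., s_T, a_T) tokens (illustrative sketch)."""

    def __init__(self, state_dim, act_dim, latent_dim=64, d_model=128,
                 n_heads=4, n_layers=3, max_len=64):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_latent = nn.Linear(latent_dim, d_model)  # context token replacing return-to-go
        self.pos = nn.Parameter(torch.zeros(1, 2 * max_len + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, latent, states, actions):
        # latent: (B, latent_dim); states: (B, T, state_dim); actions: (B, T, act_dim)
        B, T, _ = states.shape
        s_tok = self.embed_state(states)
        a_tok = self.embed_action(actions)
        # Interleave tokens as s_1, a_1, s_2, a_2, ... and prepend the latent context token.
        tokens = torch.stack([s_tok, a_tok], dim=2).reshape(B, 2 * T, -1)
        z_tok = self.embed_latent(latent).unsqueeze(1)
        x = torch.cat([z_tok, tokens], dim=1) + self.pos[:, : 2 * T + 1]
        causal_mask = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1)
        h = self.backbone(x, mask=causal_mask)
        # Hidden states at state-token positions (1, 3, 5, ...) predict the corresponding actions.
        return self.predict_action(h[:, 1::2])

# Toy usage: behavior cloning on sub-optimal data, conditioned on a latent that
# (per the paper's framing) would be optimized to match expert-trajectory embeddings.
policy = LatentConditionedPolicy(state_dim=11, act_dim=3)
z = torch.randn(8, 64)                      # stand-in for the expert-matching latent
states, actions = torch.randn(8, 16, 11), torch.randn(8, 16, 3)
loss = ((policy(z, states, actions) - actions) ** 2).mean()
loss.backward()

The design point the sketch illustrates is that conditioning on a trajectory-level latent, rather than a scalar return, gives the sequence model a target it can imitate across fragments, which is what enables stitching in the expert-matching view.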

Authors (7)
  1. Ziqi Zhang (64 papers)
  2. Jingzehua Xu (15 papers)
  3. Zifeng Zhuang (19 papers)
  4. Jinxin Liu (49 papers)
  5. Donglin Wang (103 papers)
  6. Miao Liu (98 papers)
  7. Shuai Zhang (319 papers)
Citations (1)