Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 88 tok/s

Gemini 2.5 Pro 59 tok/s Pro

GPT-5 Medium 31 tok/s Pro

GPT-5 High 30 tok/s Pro

GPT-4o 110 tok/s Pro

Kimi K2 210 tok/s Pro

GPT OSS 120B 461 tok/s Pro

Claude Sonnet 4.5 38 tok/s Pro

2000 character limit reached

Offline Inverse Reinforcement Learning (2106.05068v1)

Published 9 Jun 2021 in cs.LG

Abstract: The objective of offline RL is to learn optimal policies when a fixed exploratory demonstrations data-set is available and sampling additional observations is impossible (typically if this operation is either costly or rises ethical questions). In order to solve this problem, off the shelf approaches require a properly defined cost function (or its evaluation on the provided data-set), which are seldom available in practice. To circumvent this issue, a reasonable alternative is to query an expert for few optimal demonstrations in addition to the exploratory data-set. The objective is then to learn an optimal policy w.r.t. the expert's latent cost function. Current solutions either solve a behaviour cloning problem (which does not leverage the exploratory data) or a reinforced imitation learning problem (using a fixed cost function that discriminates available exploratory trajectories from expert ones). Inspired by the success of IRL techniques in achieving state of the art imitation performances in online settings, we exploit GAN based data augmentation procedures to construct the first offline IRL algorithm. The obtained policies outperformed the aforementioned solutions on multiple OpenAI gym environments.

Citations (13)

View on Semantic Scholar