On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration (2501.12785v1)
Published 22 Jan 2025 in stat.ML and cs.LG
Abstract: This paper tackles the efficiency and stability issues in learning from observations (LfO). We commence by investigating how reward functions and policies generalize in LfO. Subsequently, the built-in reinforcement learning (RL) approach in generative adversarial imitation from observation (GAIfO) is replaced with distributional soft actor-critic (DSAC). This change results in a novel algorithm called Mimicking Observations through Distributional Update Learning with adequate Exploration (MODULE), which combines soft actor-critic's superior efficiency with distributional RL's robust stability.
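To make the GAIfO-style setup mentioned in the abstract concrete, the following is a minimal sketch of how a discriminator over state transitions can be turned into a reward for the RL learner. It is not the paper's implementation: the linear discriminator, the names `imitation_reward`, `w`, `b`, and the reward form `-log(1 - D(s, s'))` (with `D` read as the probability that a transition is expert-like) are all assumptions chosen for illustration. The point it shows is that the reward depends only on observed states, not on actions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def imitation_reward(w, b, s, s_next, eps=1e-8):
    """Reward for a batch of transitions from a hypothetical linear discriminator.

    Assumed convention: D(s, s') is the probability that the transition
    looks like expert data, and the reward is -log(1 - D(s, s')).
    """
    x = np.concatenate([s, s_next], axis=-1)  # shape (batch, 2 * state_dim)
    d = sigmoid(x @ w + b)                    # shape (batch,)
    return -np.log(1.0 - d + eps)


# Toy usage with random states; no actions are needed anywhere.
rng = np.random.default_rng(0)
s = rng.normal(size=(4, 3))
s_next = rng.normal(size=(4, 3))
w = np.zeros(6)  # untrained discriminator: D = 0.5 for every transition
rewards = imitation_reward(w, 0.0, s, s_next)
```

In a full pipeline, rewards of this kind would be passed to the RL learner in place of an environment reward; according to the abstract, MODULE uses DSAC for that step.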