- The paper introduces Deconfounding Reinforcement Learning (DRL), a novel method that integrates causal inference to mitigate confounding bias when learning policies solely from observational data.
- DRL uses a VAE-based latent-variable model to jointly infer unobserved confounders and their effects on actions and rewards, extending the Actor-Critic algorithm to the observational setting.
- Validated on new confounded benchmarks based on OpenAI Gym and MNIST, DRL demonstrates superior performance over traditional RL algorithms in confounded environments, with implications for healthcare and finance applications.
Deconfounding Reinforcement Learning in Observational Settings
The paper "Deconfounding Reinforcement Learning in Observational Settings" introduces a novel methodology for addressing reinforcement learning (RL) challenges when faced solely with observational data, particularly in situations afflicted by confounders—unobserved factors that simultaneously affect actions and rewards. This approach, termed Deconfounding Reinforcement Learning (DRL), bridges the gap between RL and causal inference to mitigate confounding bias in policy learning.
Methodological Contributions
DRL rests on a latent-variable model that jointly infers the latent confounder and its influence on actions and rewards. The authors assume a common, time-independent confounder, a setting motivated by real-world applications such as healthcare, where a patient's socio-economic status can confound both treatment choices and outcomes, or finance, where similar latent factors can influence both decisions and returns. Building on variational autoencoders (VAEs), DRL extends the Actor-Critic algorithm into a deconfounding variant, enabling policy optimization solely from observational records.
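A minimal, hypothetical sketch of the two components this combination implies is given below; it is not the authors' implementation, and all module and parameter names (e.g. `ConfounderVAE`, `z_dim`) are illustrative. The VAE encodes a logged episode of (state, action, reward) tuples into a per-episode latent confounder z, and the actor and critic condition on the inferred z alongside the state.

```python
import torch
import torch.nn as nn

class ConfounderVAE(nn.Module):
    """Sketch of a VAE that infers a per-episode latent confounder z from logged data."""
    def __init__(self, obs_dim, act_dim, z_dim=4, hidden=64):
        super().__init__()
        # Encoder q(z | episode): embed each (s, a, r) step, then pool over time.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),  # outputs mean and log-variance of z
        )
        # Decoder p(r | s, a, z): the confounder enters the reward model explicitly.
        self.reward_head = nn.Sequential(
            nn.Linear(obs_dim + act_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def infer_z(self, states, actions, rewards):
        # states: (T, obs_dim), actions: (T, act_dim) one-hot or continuous, rewards: (T, 1)
        x = torch.cat([states, actions, rewards], dim=-1)
        stats = self.encoder(x).mean(dim=0)                    # pool over the episode
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterisation trick
        return z, mu, logvar

    def neg_elbo(self, states, actions, rewards):
        z, mu, logvar = self.infer_z(states, actions, rewards)
        z_rep = z.expand(states.shape[0], -1)
        pred_r = self.reward_head(torch.cat([states, actions, z_rep], dim=-1))
        recon = ((pred_r - rewards) ** 2).mean()                # Gaussian reconstruction term
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return recon + kl

class DeconfoundedActorCritic(nn.Module):
    """Actor and critic both condition on the state and the inferred confounder."""
    def __init__(self, obs_dim, z_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, z):
        x = torch.cat([state, z], dim=-1)
        return torch.distributions.Categorical(logits=self.actor(x)), self.critic(x)
```

In this sketch the VAE would be fit on logged episodes by minimizing `neg_elbo`, after which (or jointly with which) the actor-critic is trained with the inferred z standing in for the unobserved confounder.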
To validate DRL, the authors construct new benchmarks by modifying conventional OpenAI Gym environments such as CartPole and Pendulum, along with a confounded variant of the MNIST dataset. These benchmarks provide a testbed on which DRL can be compared against traditional RL techniques in confounded settings, and they expose the limitations of methods that ignore confounding.
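The paper's exact benchmark construction is not reproduced here, but a toy wrapper in the same spirit illustrates the idea, assuming the classic 4-tuple Gym step API; the class and method names (`ConfoundedCartPole`, `behaviour_action`) are hypothetical. A hidden per-episode variable shifts the reward and also biases the behaviour policy that generates the logged data, so actions and rewards share an unobserved cause.

```python
import gym
import numpy as np

class ConfoundedCartPole(gym.Wrapper):
    """Toy confounded environment: a hidden per-episode variable u shifts the
    reward and also biases the behaviour policy that produces the logged data."""
    def __init__(self, env=None):
        super().__init__(env or gym.make("CartPole-v1"))
        self.u = 1.0  # hidden confounder; never exposed in the observation

    def reset(self, **kwargs):
        self.u = np.random.choice([-1.0, 1.0])  # resampled once per episode
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        reward = reward + 0.5 * self.u          # confounder influences the reward
        return obs, reward, done, info

    def behaviour_action(self):
        # The logging policy also depends on u, creating the action-reward confound.
        if self.u > 0:
            return 1                            # push right more often when u = +1
        return self.env.action_space.sample()
```

An agent trained naively on trajectories collected with such a behaviour policy would attribute the reward shift to its own actions; a deconfounding method instead tries to explain it through an inferred latent variable.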
Results and Implications
Experimental results show DRL outperforming traditional RL algorithms in environments afflicted by confounders. By capturing the influence of latent confounders, DRL produces policies whose actions align more closely with optimal decision-making. These results carry significant implications for domains where experimental intervention is constrained and decisions rest largely on observational data, notably healthcare and finance.
The paper's methodological advances point toward more reliable policy learning from observational data, and the new confounded benchmarks give subsequent research a concrete target for further refinement. Researchers are thus encouraged to explore DRL's applicability beyond the presented benchmarks, extending it to other real-world settings dominated by observational data.
Future Prospects
This work on deconfounding RL marks a considerable shift toward integrating causal inference principles into RL frameworks. Future research could build on these findings by collaborating with domain experts, particularly in healthcare, to apply DRL to clinical settings with comprehensive observational datasets. Further refinement of the causal models and proxy variables used within RL may also yield frameworks tailored to diverse applications, advancing both theoretical understanding and practical implementation.
In conclusion, the paper makes substantial progress on RL under confounding bias in observational settings, offering a robust foundation for bridging causal inference and reinforcement learning.