Reward-Relevance-Filtered Linear Offline Reinforcement Learning (2401.12934v1)

Published 23 Jan 2024 in stat.ML, cs.LG, and math.OC

Abstract: This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation, sparsity. The structural restrictions on the data-generating process presume that the transitions factor into a sparse component that affects the reward and may affect additional exogenous dynamics that do not themselves affect the reward. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, depends only on the sparse component: we call this causal/decision-theoretic sparsity. We develop a method that reward-filters the estimation of the state-action value function to the sparse component via a modification of thresholded lasso in least-squares policy evaluation. We provide theoretical guarantees for the resulting reward-filtered linear fitted-Q-iteration, with sample complexity depending only on the size of the sparse component.

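The abstract describes the method only at a high level. The sketch below illustrates one plausible reading of the pipeline: a thresholded-lasso regression of rewards on state features selects the reward-relevant component, and linear fitted-Q-iteration is then run on the selected features only. This is a minimal illustration, not the paper's algorithm; the feature map `phi`, the lasso penalty `alpha`, the threshold, and the per-action least-squares update are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_reward_relevant_features(phi, r, alpha=0.1, threshold=0.05):
    """Thresholded lasso on the reward: fit a lasso of rewards on features and
    keep coordinates whose coefficient exceeds the threshold in absolute value
    (an illustrative stand-in for the paper's reward-filtering step)."""
    lasso = Lasso(alpha=alpha).fit(phi, r)
    return np.flatnonzero(np.abs(lasso.coef_) > threshold)

def reward_filtered_fqi(phi, actions, r, phi_next, gamma=0.99,
                        n_actions=2, n_iters=50, alpha=0.1, threshold=0.05):
    """Linear fitted-Q-iteration restricted to the reward-relevant features.

    phi:      (n, d) state features at time t
    actions:  (n,) integer actions taken
    r:        (n,) observed rewards
    phi_next: (n, d) state features at time t+1
    """
    support = select_reward_relevant_features(phi, r, alpha, threshold)
    phi_s, phi_next_s = phi[:, support], phi_next[:, support]

    # One linear weight vector per action, defined on the filtered features only.
    w = np.zeros((n_actions, len(support)))
    for _ in range(n_iters):
        # Greedy bootstrap target: max over actions of the current linear Q at the next state.
        q_next = phi_next_s @ w.T                      # shape (n, n_actions)
        target = r + gamma * q_next.max(axis=1)
        # Least-squares regression of the bootstrap target, separately per action.
        for a in range(n_actions):
            mask = actions == a
            if mask.any():
                w[a], *_ = np.linalg.lstsq(phi_s[mask], target[mask], rcond=None)
    return w, support
```

Under this reading, the selection step runs once on the reward regression, so the downstream FQI sample complexity scales with the size of the selected support rather than the ambient feature dimension, consistent with the guarantee stated in the abstract.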