Information-Theoretic State Variable Selection for Reinforcement Learning

Published 21 Jan 2024 in cs.LG, cs.AI, cs.IT, and math.IT | arXiv:2401.11512v1

Abstract: Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion that determines whether there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data before generalizing to real-world decision-making tasks. We also introduce a representation of the problem, based on Bayesian networks, that compactly captures the transfer of information from state variables to actions.
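The abstract's central quantity is the entropy transferred from a state variable to the action sequence during training. The full TERC criterion conditions on the remaining state variables, but the basic building block is Schreiber's pairwise transfer entropy. The sketch below is a minimal plug-in (empirical-count) estimator for discrete time series; the function name and the simple counting estimator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target, base=2):
    """Plug-in estimate of transfer entropy T(source -> target) for two
    aligned discrete time series:

        T = sum p(y_{t+1}, y_t, x_t)
            * log[ p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t) ]

    i.e. the extra information x_t provides about y_{t+1} beyond y_t.
    """
    x = np.asarray(source)[:-1]       # x_t
    y_past = np.asarray(target)[:-1]  # y_t
    y_next = np.asarray(target)[1:]   # y_{t+1}
    n = len(y_next)

    joint = Counter(zip(y_next, y_past, x))   # counts of (y_{t+1}, y_t, x_t)
    pair_px = Counter(zip(y_past, x))         # counts of (y_t, x_t)
    pair_yy = Counter(zip(y_next, y_past))    # counts of (y_{t+1}, y_t)
    marg = Counter(y_past)                    # counts of y_t

    te = 0.0
    for (yn, yp, xv), c in joint.items():
        p_joint = c / n
        p_full = c / pair_px[(yp, xv)]            # p(y_{t+1} | y_t, x_t)
        p_part = pair_yy[(yn, yp)] / marg[yp]     # p(y_{t+1} | y_t)
        te += p_joint * np.log(p_full / p_part)
    return te / np.log(base)                      # convert nats to bits
```

In the RL setting of the paper, `target` would be the agent's action sequence and `source` a candidate state variable; a variable whose transfer entropy (conditioned on the rest of the state) is zero is a candidate for exclusion. Note that plug-in estimates are biased upward on finite samples, so in practice a significance test against shuffled surrogates is typical.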
