An Information Theoretic Approach to Interaction-Grounded Learning (2401.05015v2)

Published 10 Jan 2024 in cs.LG

Abstract: Reinforcement learning (RL) problems in which the learner attempts to infer an unobserved reward from feedback variables have been studied in several papers. Interaction-Grounded Learning (IGL) is an example of such feedback-based RL tasks, where the learner optimizes the return by inferring latent binary rewards from its interaction with the environment. In the IGL setting, a standard assumption in the RL literature is that the feedback variable $Y$ is conditionally independent of the context-action pair $(X,A)$ given the latent reward $R$. In this work, we propose Variational Information-based IGL (VI-IGL), an information-theoretic method that enforces this conditional independence assumption in the IGL-based RL problem. The VI-IGL framework learns a reward decoder using an information-based objective built on the conditional mutual information (MI) between $(X,A)$ and $Y$. To estimate and optimize the information-based terms for the continuous random variables in the RL problem, VI-IGL leverages the variational representation of MI, which yields a min-max optimization problem. We also extend the framework to general $f$-information measures, leading to the generalized $f$-VI-IGL framework for IGL-based RL problems. We present numerical results on several reinforcement learning settings indicating improved performance compared to the existing IGL-based RL algorithm.
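The variational machinery the abstract refers to can be illustrated with a minimal sketch of the Donsker-Varadhan (DV) lower bound on mutual information, whose inner maximization over a critic is what turns the information-based objective into a min-max problem. This is not the paper's implementation: the paper optimizes a *conditional* MI between $(X,A)$ and $Y$ given the decoded reward, and uses a neural critic, whereas the toy below estimates plain MI between two correlated Gaussians with a one-parameter critic $T(x,y) = w\,xy$ (all names here are illustrative assumptions).

```python
import numpy as np

# Toy Donsker-Varadhan lower bound: I(X;Y) >= E_joint[T] - log E_prod[exp(T)],
# maximized over the critic T. Here T(x, y) = w * x * y with a scalar w,
# optimized by gradient ascent; a real method would use a neural network
# and the conditional-MI analogue of this bound.

rng = np.random.default_rng(0)
n, rho = 5000, 0.8

# Correlated Gaussian pair: true MI = -0.5 * log(1 - rho^2) ~ 0.51 nats.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
y_shuf = rng.permutation(y)  # shuffled y ~ product of marginals

w = 0.0  # critic parameter (the inner "max" variable of the min-max problem)
for _ in range(300):
    joint = x * y        # critic input on joint samples (up to the factor w)
    prod = x * y_shuf    # critic input on product-of-marginals samples
    ew = np.exp(w * prod)
    # Analytic gradient of the DV objective w.r.t. w.
    grad = joint.mean() - (prod * ew).mean() / ew.mean()
    w += 0.05 * grad

mi_est = w * (x * y).mean() - np.log(np.exp(w * x * y_shuf).mean())
print(f"DV lower bound on I(X;Y): {mi_est:.3f} nats")
```

Because the linear critic cannot represent the full log density ratio, the bound is loose (it recovers only part of the ~0.51 nats), which is precisely why richer critic classes, and hence the min-max formulation, are used in practice.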
