A Kernel Perspective on Behavioural Metrics for Markov Decision Processes (2310.19804v1)
Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
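To make the abstract's central object concrete: the MICo distance (Castro et al., 2021) referenced above is the unique fixed point of the operator U(x, y) = |r(x) − r(y)| + γ E[U(x′, y′)], where the next states x′ and y′ are sampled independently from P(·|x) and P(·|y). The NumPy sketch below iterates this operator to convergence on a hypothetical three-state MDP and then forms the reduced distance by subtracting self-distances, following the MICo paper's construction. It is an illustrative sketch under those assumptions, not the implementation accompanying this paper.

```python
# Illustrative sketch of the MICo fixed-point iteration (Castro et al., 2021).
# The MDP below is hypothetical; this is not the paper's own code.
import numpy as np

gamma = 0.9
r = np.array([0.0, 0.5, 1.0])        # hypothetical per-state rewards
P = np.array([[0.9, 0.1, 0.0],       # hypothetical transition matrix,
              [0.1, 0.8, 0.1],       # P[x, x'] = P(x' | x)
              [0.0, 0.1, 0.9]])

# |r(x) - r(y)| for every state pair.
reward_diff = np.abs(r[:, None] - r[None, :])

# Iterate U <- |r(x) - r(y)| + gamma * E[U(x', y')] with independently
# sampled next states: (P @ U @ P.T)[x, y] = sum_{x', y'} P(x'|x) P(y'|y) U(x', y').
# The operator is a gamma-contraction in the sup norm, so iteration converges.
U = np.zeros_like(reward_diff)
for _ in range(10_000):
    U_new = reward_diff + gamma * P @ U @ P.T
    if np.max(np.abs(U_new - U)) < 1e-10:
        break
    U = U_new

# MICo is a *diffuse* metric: U(x, x) can be strictly positive. Subtracting
# the self-similarity terms, as in the MICo paper, yields a symmetric
# (pseudo)metric with zero diagonal.
d = U_new - 0.5 * (np.diag(U_new)[:, None] + np.diag(U_new)[None, :])
print(d)
```

The matrix form above is specific to the tabular, finite-state case; the paper's contribution is to study objects of this kind through positive definite kernels on states, which is what enables the value-difference bounds and the low-distortion Euclidean embedding claimed in the abstract.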
- Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, and Marc G. Bellemare. Contrastive behavioral similarity embeddings for generalization in reinforcement learning. In Proceedings of the Ninth International Conference on Learning Representations, 2021a.
- Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, and Marc G. Bellemare. Deep reinforcement learning at the edge of the statistical precipice. In Advances in Neural Information Processing Systems, 2021b.
- Wolfgang Arendt, Charles J. K. Batty, Matthias Hieber, and Frank Neubrander. Vector-valued Laplace transforms and Cauchy problems. Springer-Verlag, 2001.
- Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
- Leemon C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the International Conference on Machine Learning, 1995.
- André M. S. Barreto, Doina Precup, and Joelle Pineau. Practical kernel-based reinforcement learning. The Journal of Machine Learning Research, 17(1):2372–2441, 2016.
- Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
- Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taïga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, and Clare Lyle. A geometric perspective on optimal representations for reinforcement learning. In Advances in Neural Information Processing Systems, 2019.
- Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
- Richard Blute, Josée Desharnais, Abbas Edalat, and Prakash Panangaden. Bisimulation for labelled Markov processes. In Proceedings of the IEEE Symposium on Logic in Computer Science, 1997.
- James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: Composable transformations of Python+NumPy programs, 2018.
- Pablo Samuel Castro. Scalable methods for computing state similarity in deterministic Markov decision processes. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
- Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, and Marc G. Bellemare. Dopamine: A research framework for deep reinforcement learning. arXiv, 2018.
- Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, and Mark Rowland. MICo: Learning improved representations via sampling-based state similarity for Markov decision processes. In Advances in Neural Information Processing Systems, 2021.
- Gheorghe Comanici, Prakash Panangaden, and Doina Precup. On-the-fly algorithms for bisimulation metrics. In Proceedings of the International Conference on Quantitative Evaluation of Systems, 2012.
- Will Dabney, Georg Ostrovski, David Silver, and Rémi Munos. Implicit quantile networks for distributional reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 1096–1105. PMLR, 2018a.
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos. Distributional reinforcement learning with quantile regression. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018b.
- Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613–624, 1993.
- Josée Desharnais, Abbas Edalat, and Prakash Panangaden. Bisimulation for labeled Markov processes. Information and Computation, 179(2):163–193, 2002.
- Josée Desharnais, Vineet Gupta, Radha Jagadeesan, and Prakash Panangaden. Metrics for labeled Markov systems. In Proceedings of the International Conference on Concurrency Theory, 1999.
- Michel Deza and Monique Laurent. Geometry of cuts and metrics, volume 2. Springer, 1997.
- Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, and Michal Valko. Kernel-based reinforcement learning: A finite-time analysis. In Proceedings of the International Conference on Machine Learning, 2021.
- Amir-massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor. Regularized policy iteration with nonparametric function spaces. The Journal of Machine Learning Research, 17:1–66, 2016.
- Norm Ferns, Prakash Panangaden, and Doina Precup. Metrics for finite Markov decision processes. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2004.
- Norm Ferns, Pablo Samuel Castro, Doina Precup, and Prakash Panangaden. Methods for computing state similarity in Markov decision processes. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2006.
- Norm Ferns, Prakash Panangaden, and Doina Precup. Bisimulation metrics for continuous Markov decision processes. SIAM Journal on Computing, 40(6):1662–1714, 2011.
- Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, and Pieter Abbeel. Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv, 2015.
- Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, and Marc G. Bellemare. DeepMDP: Learning continuous latent space models for representation learning. In Proceedings of the International Conference on Machine Learning, 2019.
- Corrado Gini. Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche. Studi economico-giuridici pubblicati per cura della facoltà di Giurisprudenza della R. Università di Cagliari. Tipogr. di P. Cuppini, 1912.
- Robert Givan, Thomas Dean, and Matthew Greig. Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147(1–2):163–223, 2003.
- Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012.
- C. Guilbart. Produits scalaires sur l’espace des mesures. Annales de l’I.H.P. Probabilités et statistiques, 15(4):333–354, 1979.
- Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. In Proceedings of the International Conference on Machine Learning, 2019.
- Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, and Sergey Levine. Bisimulation makes analogies in goal-conditioned reinforcement learning. In Proceedings of the International Conference on Machine Learning, 2022.
- Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, et al. Array programming with NumPy. Nature, 585(7825):357–362, 2020.
- Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- John D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
- Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, and Koray Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. In Proceedings of the International Conference on Learning Representations, 2017.
- William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.
- L. V. Kantorovich and G. Sh. Rubinstein. On a space of totally additive functions. Vestnik Leningrad Univ., 13(7):52–59, 1958.
- Mete Kemertas and Tristan Aumentado-Armstrong. Towards robust bisimulation metric learning. In Advances in Neural Information Processing Systems, 2021.
- Mete Kemertas and Allan Jepson. Approximate policy iteration with bisimulation metrics. Transactions on Machine Learning Research, 2022.
- George Konidaris, Sarah Osentoski, and Philip Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the AAAI Conference on Artificial Intelligence, 2011.
- Alec Koppel, Garrett Warnell, Ethan Stump, Peter Stone, and Alejandro Ribeiro. Policy evaluation in continuous MDPs with efficient kernelized gradient temporal difference. IEEE Transactions on Automatic Control, 66(4):1856–1863, 2021.
- Sascha Lange and Martin Riedmiller. Deep auto-encoder neural networks in reinforcement learning. In Proceedings of the International Joint Conference on Neural Networks, 2010.
- Kim G. Larsen and Arne Skou. Bisimulation through probabilistic testing. Information and Computation, 94:1–28, 1991.
- Charline Le Lan, Marc G. Bellemare, and Pablo Samuel Castro. Metrics and continuity in reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
- Guy Lever, John Shawe-Taylor, Ronnie Stafford, and Csaba Szepesvári. Compressed conditional mean embeddings for model-based reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
- Xingyu Lin, Harjatin Baweja, George Kantor, and David Held. Adaptive auxiliary task weighting for reinforcement learning. In Advances in Neural Information Processing Systems, 2019.
- Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8:2169–2231, 2007.
- Jiří Matoušek. Lectures on discrete geometry, volume 212. Springer Science & Business Media, 2013.
- Stephen G. Matthews. Partial metric topology. Annals of the New York Academy of Sciences, 728(1):183–197, 1994.
- Robin Milner. A Calculus for Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer-Verlag, 1980.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- Alfred Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2):429–443, 1997.
- Travis E. Oliphant. Python for scientific computing. Computing in Science & Engineering, 9(3):10–20, 2007.
- Dirk Ormoneit and Śaunak Sen. Kernel-based reinforcement learning. Machine Learning, 49(2–3):161–178, 2002.
- Prakash Panangaden. Labelled Markov processes. Imperial College Press, 2009.
- David Park. Title unknown. Slides for Bad Honnef Workshop on Semantics of Concurrency, 1981.
- Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends® in Machine Learning, 11(5–6):355–607, 2019.
- Svetlozar T. Rachev, Lev B. Klebanov, Stoyan V. Stoyanov, and Frank J. Fabozzi. The method of distances in the theory of probability and statistics. Springer-Verlag, 2013.
- Frigyes Riesz. Sur une espèce de Géométrie analytique des systèmes de fonctions sommables. Gauthier-Villars, 1907.
- Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
- Walter Rudin. Functional Analysis. Tata McGraw-Hill, 1974.
- Isaac J. Schoenberg. Remarks to Maurice Fréchet's article “Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicables sur l'espace de Hilbert”. Annals of Mathematics, pp. 724–732, 1935.
- Bernhard Schölkopf and Alexander J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press, 2018.
- Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263–2291, 2013.
- Evan Shelhamer, Parsa Mahmoudieh, Max Argus, and Trevor Darrell. Loss is its own reward: Self-supervision for reinforcement learning. In Proceedings of the International Conference on Learning Representations (Workshop Track), 2017.
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 2018.
- Yuval Tassa, Yotam Doron, Alistair Muldal, et al. DeepMind control suite. arXiv, 2018.
- Franck van Breugel and James Worrell. Towards quantitative verification of probabilistic systems. In Proceedings of the International Colloquium on Automata, Languages and Programming, July 2001.
- Guido van Rossum. Python reference manual. Centrum voor Wiskunde en Informatica, Amsterdam, 1995.
- Nino Vieillard, Olivier Pietquin, and Matthieu Geist. Munchausen reinforcement learning. In Advances in Neural Information Processing Systems, 2020.
- Cédric Villani. Optimal transport: Old and new, volume 338. Springer Science & Business Media, 2008.
- Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, and Michael I. Jordan. Provably efficient reinforcement learning with kernel and neural function approximations. In Advances in Neural Information Processing Systems, 2020.
- Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, and Rob Fergus. Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
- Shlomo Yitzhaki. Gini’s mean difference: A superior measure of variability for non-normal distributions. Metron - International Journal of Statistics, LXI(2):285–316, 2003.
- Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, and Sergey Levine. Invariant representations for reinforcement learning without reconstruction. In Proceedings of the International Conference on Learning Representations, 2021.
- Szymon Łukaszyk. A new concept of probability metric and its applications in approximation of scattered data sets. Computational Mechanics, 33:299–304, 2004.