Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference (2403.04082v3)
Abstract: Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?" These sorts of probabilistic inference questions are challenging when observations are high-dimensional. In this paper, we show how these questions can have compact, closed-form solutions in terms of learned representations. The key idea is to apply a variant of contrastive learning to time series data. Prior work already shows that the representations learned by contrastive learning encode a probability ratio. By extending prior work to show that the marginal distribution over representations is Gaussian, we can then prove that the joint distribution of representations is also Gaussian. Taken together, these results show that representations learned via temporal contrastive learning follow a Gauss-Markov chain, a graphical model where inference (e.g., prediction, planning) over representations corresponds to inverting a low-dimensional matrix. In one special case, inferring intermediate representations will be equivalent to interpolating between the learned representations. We validate our theory using numerical simulations on tasks of up to 46 dimensions.
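To make the "inference as matrix inversion" claim concrete, here is a minimal sketch of inferring an intermediate state in a generic linear-Gaussian (Gauss-Markov) chain. It assumes transitions of the form z_{t+1} | z_t ~ N(A z_t, Sigma); the names `infer_midpoint`, `A`, and `Sigma` are illustrative and are not taken from the paper, whose exact parametrization may differ. With `A` equal to the identity and isotropic noise, the posterior mean reduces to the midpoint of the two endpoint representations, which is one way to see how inference can become interpolation.

```python
import numpy as np

def infer_midpoint(z_start, z_end, A, Sigma):
    """Posterior over z_1 given z_0 = z_start and z_2 = z_end.

    Assumes a generic chain z_{t+1} | z_t ~ N(A z_t, Sigma).
    This is standard Gaussian conditioning, not the paper's exact formula.
    """
    Sigma_inv = np.linalg.inv(Sigma)
    # Precision of z_1 collects one term from each neighbouring transition.
    precision = Sigma_inv + A.T @ Sigma_inv @ A
    # Linear term: pull from the past (A z_0) and from the future (z_2).
    rhs = Sigma_inv @ A @ z_start + A.T @ Sigma_inv @ z_end
    mean = np.linalg.solve(precision, rhs)
    cov = np.linalg.inv(precision)
    return mean, cov

d = 3
z0 = np.array([0.0, 1.0, 2.0])
z2 = np.array([2.0, 3.0, -2.0])

# Special case: identity dynamics and isotropic noise.
mean, cov = infer_midpoint(z0, z2, np.eye(d), 0.5 * np.eye(d))
print(mean)  # equals (z0 + z2) / 2 in this special case
```

The only matrix being inverted here is d-by-d, where d is the representation dimension, regardless of how high-dimensional the original observations are.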