InfoCon: Concept Discovery with Generative and Discriminative Informativeness (2404.10606v1)

Published 14 Mar 2024 in cs.RO, cs.AI, and cs.LG

Abstract: We focus on the self-supervised discovery of manipulation concepts that can be adapted and reassembled to address various robotic tasks. We propose that the decision to conceptualize a physical procedure should not depend on how we name it (semantics) but rather on the significance of the informativeness in its representation regarding the low-level physical state and state changes. We model manipulation concepts (discrete symbols) as generative and discriminative goals and derive metrics that can autonomously link them to meaningful sub-trajectories from noisy, unlabeled demonstrations. Specifically, we employ a trainable codebook containing encodings (concepts) capable of synthesizing the end-state of a sub-trajectory given the current state (generative informativeness). Moreover, the encoding corresponding to a particular sub-trajectory should differentiate the state within and outside it and confidently predict the subsequent action based on the gradient of its discriminative score (discriminative informativeness). These metrics, which do not rely on human annotation, can be seamlessly integrated into a VQ-VAE framework, enabling the partitioning of demonstrations into semantically consistent sub-trajectories, fulfilling the purpose of discovering manipulation concepts and the corresponding sub-goal (key) states. We evaluate the effectiveness of the learned concepts by training policies that utilize them as guidance, demonstrating superior performance compared to other baselines. Additionally, our discovered manipulation concepts compare favorably to human-annotated ones while saving much manual effort.
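The core mechanism the abstract describes — a trainable codebook whose entries act as discrete manipulation concepts — rests on standard VQ-VAE quantization: a continuous state embedding is snapped to its nearest codebook entry. The sketch below illustrates only that nearest-neighbor lookup step with a toy NumPy codebook; it is not the paper's implementation, and the array shapes and function name are illustrative assumptions.

```python
import numpy as np

def quantize(state_embedding, codebook):
    """Map a continuous state embedding to its nearest codebook entry
    (a discrete 'concept'), as in standard VQ-VAE quantization.
    Illustrative sketch only, not the paper's code."""
    # Euclidean distance from the embedding to every codebook entry
    dists = np.linalg.norm(codebook - state_embedding, axis=1)
    idx = int(np.argmin(dists))  # index of the discrete concept
    return idx, codebook[idx]

# Toy example: 4 concepts embedded in a 3-dimensional space
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 3))
# A state embedding lying very close to concept 2
z = codebook[2] + 0.01 * rng.normal(size=3)
idx, quantized = quantize(z, codebook)
print(idx)  # → 2
```

In a full VQ-VAE the lookup is paired with codebook and commitment losses so the encoder and codebook co-adapt; the paper's contribution is the additional generative and discriminative informativeness metrics that make the resulting concepts align with meaningful sub-trajectories.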


