Learning to Poke by Poking: Experiential Learning of Intuitive Physics (1606.07419v2)

Published 23 Jun 2016 in cs.CV, cs.AI, and cs.RO

Abstract: We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need of predicting pixels. Our experiments show that this joint modeling approach outperforms alternative methods.

Summary

The paper introduces a dual-model approach in which a Baxter robot learns intuitive physics from more than 100,000 pokes, using that experience to predict object dynamics.
It employs joint forward and inverse deep neural networks in a high-level feature space to enhance generalization and simplify training.
Experimental evaluations demonstrate superior performance over baselines in long-distance, multi-step object manipulation tasks.

The paper "Learning to Poke by Poking: Experiential Learning of Intuitive Physics" by Agrawal et al. presents an approach to developing an internal model of intuitive physics through experiential learning. It describes a framework in which a robotic agent acquires an intuitive understanding of physical interactions by performing manipulation tasks itself.

Core Contributions

The primary contribution is a system in which a robot learns to predict object dynamics by poking objects, thereby acquiring a model of intuitive physics. The robot, a Baxter, accumulated over 400 hours of interaction, performing more than 100,000 pokes on 16 distinct objects. The paper proposes a dual-model approach that uses deep neural networks to predict the outcome of an action (forward model) and to infer the action needed to reach a desired outcome (inverse model).

Methodology

The paper details a mechanism where the robot uses visual feedback to develop its understanding:

Data Collection: The robot interacts with objects autonomously, using a Kinect camera to record the visual state before and after each poke. Pokes were chosen at random, with depth data used to steer them toward objects so that most pokes produced contact.
Model Architecture: The forward and inverse dynamics models are trained jointly in a learned feature space rather than in pixel space. This removes the need to predict pixels and improves the model's ability to generalize. The inverse model objective supervises the construction of informative visual features, while the forward model, which predicts those features, regularizes the feature space.
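The joint objective described above can be sketched as a single loss that sums an inverse-model term and a weighted forward-model term over shared features. This is a minimal sketch, not the authors' implementation: the linear layers, the sizes, the discretized action label, the L1 forward loss, and the weight `lam` are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; these are assumptions, not values from the paper.
FEAT_DIM, ACTION_DIM, N_ACTION_BINS = 8, 3, 5


def init_params(image_dim):
    """Random weights for a toy encoder, inverse head, and forward head."""
    return {
        "enc": rng.normal(scale=0.1, size=(image_dim, FEAT_DIM)),
        "inv": rng.normal(scale=0.1, size=(2 * FEAT_DIM, N_ACTION_BINS)),
        "fwd": rng.normal(scale=0.1, size=(FEAT_DIM + ACTION_DIM, FEAT_DIM)),
    }


def encode(image, w_enc):
    """Hypothetical encoder: maps a flattened image to a feature vector."""
    return np.tanh(image @ w_enc)


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one integer label under softmax(logits)."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def joint_loss(img_t, img_t1, action_bin, action_vec, params, lam=0.1):
    """Inverse loss plus lam times forward loss, both on shared features."""
    x_t = encode(img_t, params["enc"])
    x_t1 = encode(img_t1, params["enc"])
    # Inverse model: features before and after -> logits over action bins.
    inv_logits = np.concatenate([x_t, x_t1]) @ params["inv"]
    loss_inv = softmax_cross_entropy(inv_logits, action_bin)
    # Forward model: features and action -> predicted next features.
    # The prediction target is a feature vector, so no pixels are predicted.
    x_t1_pred = np.concatenate([x_t, action_vec]) @ params["fwd"]
    loss_fwd = np.abs(x_t1_pred - x_t1).sum()
    return loss_inv + lam * loss_fwd, loss_inv, loss_fwd


params = init_params(image_dim=16)
img_t, img_t1 = rng.normal(size=16), rng.normal(size=16)
action_vec = np.array([0.1, -0.3, 0.5])
total, l_inv, l_fwd = joint_loss(img_t, img_t1, 2, action_vec, params)
print(total, l_inv, l_fwd)
```

Because both terms depend on the same encoder weights, a gradient step on the total would shape the features with both objectives at once, which is the interplay the summary describes. Setting `lam=0` recovers an inverse-only model of the kind used as a comparison.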

Experimental Evaluation

The learned model is evaluated by tasking the robot with moving objects into specified configurations, including cases that involve previously unseen objects or setups:

Generalization: The model generalized beyond its training data, moving objects over significantly greater distances and through more complex dynamics than it had experienced during training.
Robust Performance: Comparative evaluations against baseline models revealed superior performance of the joint model, especially in multi-step and long-distance tasks.
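Multi-step, long-distance tasks of this kind can be approached by querying an inverse model repeatedly in a closed loop: observe, ask for the next poke toward the goal, execute it, and observe again. The sketch below illustrates that idea in a toy 2D world. The summary does not specify the planner, so the greedy loop is an assumption, and `toy_inverse_model` and `toy_environment` are hypothetical stand-ins for a learned network and a real robot.

```python
import numpy as np


def toy_inverse_model(state, goal, max_step=1.0):
    """Hypothetical stand-in for a learned inverse model.

    Proposes a push toward the goal, capped at max_step in length.
    """
    delta = goal - state
    dist = np.linalg.norm(delta)
    if dist <= max_step:
        return delta
    return delta / dist * max_step


def toy_environment(state, action, rng, noise=0.05):
    """Hypothetical world: the object moves by the action plus small noise."""
    return state + action + rng.normal(scale=noise, size=state.shape)


def greedy_poke_to_goal(start, goal, max_pokes=20, tol=0.2, seed=0):
    """Re-plan after every poke until the object is within tol of the goal."""
    rng = np.random.default_rng(seed)
    state = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    trajectory = [state.copy()]
    for _ in range(max_pokes):
        if np.linalg.norm(goal - state) < tol:
            break
        action = toy_inverse_model(state, goal)
        state = toy_environment(state, action, rng)
        trajectory.append(state.copy())
    return np.array(trajectory)


goal = np.array([5.0, 3.0])
traj = greedy_poke_to_goal(start=[0.0, 0.0], goal=goal)
print(len(traj) - 1, "pokes, final distance", np.linalg.norm(goal - traj[-1]))
```

Re-observing after each poke lets later pokes correct for errors in earlier ones, which is one reason a model trained on single pokes can still be useful over distances longer than any single training example.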

Numerical Results

Quantitative comparisons indicate that the joint model achieves a lower error rate in displacing objects to desired poses than models trained independently. Moreover, simulation studies highlight that the joint model provides beneficial regularization, which is particularly evident with smaller training sets where it outperforms the inverse-only model.

Theoretical and Practical Implications

Theoretically, this work supports the hypothesis that training forward and inverse models together regularizes the learned features and improves predictive performance. Practically, it proposes a scalable approach to autonomous learning in robotics that could extend beyond simple poking to other manipulation tasks, such as more intricate object arrangements, or to non-manipulation tasks such as navigation.

Future Developments

The paper opens avenues for further work in robotic learning and interaction, suggesting improved action planning strategies and the use of continuous control. The framework could also be extended to more complex environments and new task domains.

In conclusion, this work establishes a foundational methodology for experiential learning of intuitive physics, positioning the robot not only as a reactive executor of predetermined tasks but as a proactive learner capable of adapting its strategies to unstructured and dynamic environments. This advances the broader objective of developing intelligent systems that mimic human-like understanding and interaction with the physical world.
