Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning in Robotics (2201.10266v1)

Published 25 Jan 2022 in cs.AI, cs.CV, cs.LG, cs.LO, and cs.RO

Abstract: Algorithms based on deep network models are being used for many pattern recognition and decision-making tasks in robotics and AI. Training these models requires a large labeled dataset and considerable computational resources, which are not readily available in many domains. Also, it is difficult to explore the internal representations and reasoning mechanisms of these models. As a step towards addressing the underlying knowledge representation, reasoning, and learning challenges, the architecture described in this paper draws inspiration from research in cognitive systems. As a motivating example, we consider an assistive robot trying to reduce clutter in any given scene by reasoning about the occlusion of objects and stability of object configurations in an image of the scene. In this context, our architecture incrementally learns and revises a grounding of the spatial relations between objects and uses this grounding to extract spatial information from input images. Non-monotonic logical reasoning with this information and incomplete commonsense domain knowledge is used to make decisions about stability and occlusion. For images that cannot be processed by such reasoning, regions relevant to the tasks at hand are automatically identified and used to train deep network models to make the desired decisions. Image regions used to train the deep networks are also used to incrementally acquire previously unknown state constraints that are merged with the existing knowledge for subsequent reasoning. Experimental evaluation performed using simulated and real-world images indicates that in comparison with baselines based just on deep networks, our architecture improves reliability of decision making and reduces the effort involved in training data-driven deep network models.

Authors (2)

Mohan Sridharan
Tiago Mota



Summary

The paper demonstrates that integrating commonsense reasoning with deep learning improves robotics decision-making in data-sparse environments.
It introduces a hybrid spatial grounding approach, combining qualitative and metric representations refined through human feedback.
The system combines ASP-based reasoning, an attention mechanism, and decision tree induction to incrementally learn axioms and merge them into its knowledge base.


The paper describes an architecture that combines principles from cognitive systems (commonsense reasoning, knowledge acquisition, and inductive learning) with deep learning to improve decision making in robotics. It targets domains where large labeled datasets and substantial computational resources are not readily available, with assistive robotics as the motivating application.

Core Contributions

The architecture exhibits several distinct components:

Incremental Grounding of Spatial Relations: The architecture employs a hybrid grounding approach that combines a Qualitative Spatial Representation (QSR) with a Metric Spatial Representation (MSR). The QSR provides an initial, manually encoded generic grounding that is used to extract spatial relations at the outset; the MSR incrementally refines this grounding using observations and limited human feedback.
Knowledge Representation and Reasoning: The system utilizes the action language $\mathcal{AL}$ to describe dynamic domains, translating these descriptions to Answer Set Prolog (ASP) programs for robust non-monotonic logical reasoning. This formulation enables the representation of incomplete knowledge and supports the automated reasoning required for the estimation tasks.
Attention Mechanism and Deep Learning: When ASP-based reasoning cannot label the objects in an image accurately, the architecture uses an attention mechanism to automatically identify regions of interest (ROIs) relevant to the decision-making task. Convolutional neural networks (CNNs) such as LeNet and AlexNet are then trained on these regions rather than on the entire image, for the specific subtasks of stability and occlusion estimation.
Decision Tree Induction and Axiom Merging: The system incrementally learns previously unknown axioms by applying decision tree induction to data extracted from the image ROIs used to train the CNNs. These axioms are validated and merged with the existing domain knowledge, and a heuristic inspired by human forgetting updates and prunes them over time.
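
The reason-first, learn-as-fallback flow described in this list can be sketched in Python. This is a minimal illustration and not the paper's ASP encoding: the `Scene` fields, the single exception rule (an object resting on an irregular surface is unstable), and the `cnn` callable are all hypothetical stand-ins. It shows a default conclusion being withdrawn when new information arrives, and the hand-off to a learned model when reasoning has nothing to work with.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple


@dataclass
class Scene:
    """Hypothetical symbolic description extracted from one image."""
    on: Set[Tuple[str, str]] = field(default_factory=set)  # (top, bottom) pairs
    irregular: Set[str] = field(default_factory=set)       # objects with irregular surfaces
    grounded: Set[str] = field(default_factory=set)        # objects whose relations were extracted


def reason_about_stability(obj: str, scene: Scene) -> Optional[bool]:
    """Return True/False when the rules decide, None when they cannot."""
    if obj not in scene.grounded:
        return None  # no spatial information: reasoning cannot process this object
    # Assumed exception: anything resting on an irregular surface is unstable.
    for top, bottom in scene.on:
        if top == obj and bottom in scene.irregular:
            return False
    # Default: an object is stable unless an exception applies.
    return True


def decide(obj: str, scene: Scene, cnn: Callable[[str], bool]) -> Tuple[bool, str]:
    """Try reasoning first; fall back to the (stubbed) CNN otherwise."""
    verdict = reason_about_stability(obj, scene)
    if verdict is None:
        return cnn(obj), "cnn"
    return verdict, "reasoning"


def stub_cnn(obj: str) -> bool:
    """Placeholder for a network trained on image regions of interest."""
    return False


scene = Scene(on={("cup", "book")}, grounded={"cup", "book"})
before = decide("cup", scene, stub_cnn)    # default applies
scene.irregular.add("book")                # new knowledge arrives
after = decide("cup", scene, stub_cnn)     # earlier conclusion is withdrawn
fallback = decide("vase", scene, stub_cnn) # nothing known about the vase
```

Adding the fact about the book changes the answer for the cup without editing any rule, which is the non-monotonic behavior the architecture relies on; the vase, for which no relations were extracted, is routed to the stub network instead.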

Experimental Validation

The architecture was empirically evaluated using real-world and simulated datasets:

Incremental Grounding: Using the Table Object Scene Database (TOSD), the hybrid grounding model showed marked improvement in the accuracy of spatial relation labels over models relying solely on QSR or MSR.
Occlusion and Stability Estimation: Experiments comparing baseline deep networks with the proposed architecture demonstrated improved accuracy, especially with smaller training datasets. The attention mechanism significantly reduced the sample complexity needed for effective training.
Axiom Learning and Merging: The architecture learned previously unknown state constraints and integrated them into the reasoning process, which yielded plans that were more often correct and closer to optimal, and reduced planning time.
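
The axiom-learning step can be illustrated with a small decision tree inducer. The sketch below is a toy under stated assumptions: it uses a plain ID3-style split on information gain, which may differ from the inducer the authors used, and the feature names (`surface_below`, `size`), the labels, and the `min_support` threshold are invented for illustration. Each pure leaf with enough supporting examples is read off as a candidate rule.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature whose split leaves the least weighted entropy."""
    def remainder(feature):
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[feature], []).append(label)
        return sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return min(features, key=remainder)


def induce_rules(rows, labels, features, path=(), min_support=2):
    """Return (conditions, label, support) for each pure, well-supported leaf."""
    if len(set(labels)) == 1 or not features:
        majority, count = Counter(labels).most_common(1)[0]
        if count == len(labels) and count >= min_support:
            return [(path, majority, count)]
        return []  # impure or weakly supported leaves yield no candidate rule
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    rules = []
    for value in sorted({row[feature] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        rules += induce_rules([rows[i] for i in idx], [labels[i] for i in idx],
                              remaining, path + ((feature, value),), min_support)
    return rules


def describe(rule):
    """Render a rule as readable text (not ASP syntax)."""
    conditions, label, _ = rule
    return f"{label} if " + " and ".join(f"{f}={v}" for f, v in conditions)


# Hypothetical examples, one per image region of interest.
rows = [
    {"surface_below": "irregular", "size": "small"},
    {"surface_below": "irregular", "size": "large"},
    {"surface_below": "flat", "size": "small"},
    {"surface_below": "flat", "size": "large"},
    {"surface_below": "flat", "size": "small"},
]
labels = ["unstable", "unstable", "stable", "stable", "stable"]
rules = induce_rules(rows, labels, ["surface_below", "size"])
readable = [describe(r) for r in rules]
```

In a full system each candidate would still need validation before being merged into the knowledge base, as the summary notes; the support threshold here is only a crude first filter.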

Implications and Future Directions

The proposed architecture presents several practical and theoretical implications:

Efficient Learning: By combining commonsense reasoning with deep learning, the architecture learns from fewer labeled examples. This matters for applications where large datasets are infeasible or expensive to compile.
Enhanced Robustness: The integration of cognitive principles helps to overcome known limitations of deep networks, such as the need for extensive labeled datasets and the opacity of their decisions. The heuristic-driven axiom merging process further helps keep the knowledge base relevant and accurate over time.
Scalability: The architecture's reliance on relational knowledge and incremental learning supports scalability to more complex domains and tasks, while maintaining efficiency.
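
One way to picture the forgetting-inspired axiom merging mentioned in this list is a strength value per axiom that decays when the axiom is not re-learned and is restored when it is. The sketch below assumes this scheme; the `decay` and `threshold` values, the class and method names, and the axiom strings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Axiom:
    text: str
    strength: float = 1.0


class AxiomStore:
    """Toy knowledge base with decay-and-prune merging (assumed scheme)."""

    def __init__(self, decay: float = 0.5, threshold: float = 0.2) -> None:
        self.decay = decay          # multiplicative decay per merge round
        self.threshold = threshold  # axioms weaker than this are dropped
        self.axioms: Dict[str, Axiom] = {}

    def merge(self, learned: Iterable[str]) -> None:
        """Merge one batch of newly learned axioms into the store."""
        batch = set(learned)
        # Weaken every stored axiom that this batch did not reinforce.
        for text, axiom in self.axioms.items():
            if text not in batch:
                axiom.strength *= self.decay
        # Restore re-learned axioms to full strength and add new ones.
        for text in batch:
            self.axioms[text] = Axiom(text)
        # Prune axioms whose strength has fallen below the threshold.
        self.axioms = {t: a for t, a in self.axioms.items()
                       if a.strength >= self.threshold}


store = AxiomStore(decay=0.5, threshold=0.2)
store.merge(["unstable if surface_below=irregular", "stable if size=small"])
store.merge(["unstable if surface_below=irregular"])
strength_after_one_miss = store.axioms["stable if size=small"].strength
store.merge(["unstable if surface_below=irregular"])
store.merge(["unstable if surface_below=irregular"])
surviving = sorted(store.axioms)
```

An axiom that keeps being re-learned stays at full strength, while one that was learned once and never confirmed fades out after a few rounds, which keeps spurious rules from accumulating.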

Future research directions include handling more complex scenes, using relational structures to better understand the behavior of deep networks, and broadening the types of axioms learned to improve the coverage and robustness of the knowledge base. Real-world applications in human-robot interaction and other assistive technologies also remain a promising avenue for deploying and refining the architecture.

This paper provides a solid foundation for the integration of structured commonsense reasoning and knowledge acquisition methodologies with deep learning, aimed at advancing the field of robotics in both practical performance and theoretical understanding.
