- The paper introduces a framework enabling robots to manipulate unknown objects in unfamiliar environments using perception-driven task and motion planning.
- This approach leverages existing perception models to estimate object geometry and affordances dynamically from RGB-D data without prior object models.
- Experimental results show the system successfully performs diverse tasks, such as arranging and picking/placing objects, in cluttered, previously unseen setups.
Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances
The paper introduces a novel framework for robot manipulation of unknown objects in unknown environments, aiming toward general-purpose task and motion planning (TAMP) integrated with perception modules. The system operates directly on perceptual data streams acquired from RGB-D imaging, enabling manipulation without prior knowledge of object instances, their geometries, or affordances. This is an ambitious goal, as it attempts to bridge the gap between model-free learning techniques and highly engineered robotic systems that rely on precise object models.
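To make the overall architecture concrete, here is a minimal, schematic sketch of such a perceive-plan-act loop. Every function body is a toy stand-in; the segmentation, affordance-estimation, and planning modules are hypothetical placeholders, not the paper's implementation, and only the control flow is the point.

```python
import numpy as np

def segment_rgbd(rgb, depth):
    """Stand-in for an unknown-object instance segmentation model."""
    # A real system would return one point cloud per detected segment;
    # here we fabricate a single random cloud for illustration.
    return [np.random.default_rng(0).uniform(0.0, 1.0, size=(200, 3))]

def estimate_properties(cloud):
    """Stand-in for geometry and affordance estimation on one segment."""
    return {"cloud": cloud, "graspable": True, "centroid": cloud.mean(axis=0)}

def plan_and_execute(objects, goal):
    """Stand-in for the TAMP solve-and-execute step (goal is unused here)."""
    return [("pick", obj["centroid"]) for obj in objects if obj["graspable"]]

# Placeholder camera frames standing in for a live RGB-D stream.
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.zeros((480, 640), dtype=np.float32)

segments = segment_rgbd(rgb, depth)
objects = [estimate_properties(c) for c in segments]
print(plan_and_execute(objects, goal="all-objects-in-bin"))
```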
The primary innovation is a TAMP formulation that uses existing perception models to estimate the geometry and affordances of previously unseen objects on the fly. Unlike traditional planning paradigms, which rely heavily on predefined object models, the approach builds on the PDDLStream framework, in which streams map perceptually derived estimates into facts and constraints the planner can reason over. By composing modules for segmentation, affordance estimation, and grasp synthesis, the system adapts effectively to varying tasks and environments.
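As an illustration of the stream idea, the following self-contained sketch shows a grasp stream: a conditional generator that samples grasp poses for a segmented point cloud, each of which would certify a fact such as (Grasp ?o ?g) for the symbolic planner. The sampling heuristic here is a toy stand-in for a learned grasp-synthesis model, not the paper's actual sampler.

```python
import numpy as np

def gen_grasps(cloud, num_candidates=10, standoff=0.05, seed=0):
    """Yield candidate grasp poses for a segmented object point cloud.

    In PDDLStream terms, each yielded grasp would certify a fact such as
    (Grasp ?o ?g), which the task planner uses when grounding pick actions.
    """
    centroid = cloud.mean(axis=0)
    rng = np.random.default_rng(seed)
    for _ in range(num_candidates):
        # Toy heuristic: approach the centroid from a random direction
        # with a fixed standoff; a real system would query a learned model.
        approach = rng.normal(size=3)
        approach /= np.linalg.norm(approach)
        yield {"position": centroid + standoff * approach,
               "approach": -approach}

# The planner draws samples lazily, interleaving sampling with symbolic
# search, so grasps are generated only for objects the plan actually uses.
cloud = np.random.default_rng(1).uniform(-0.03, 0.03, size=(500, 3))
print(next(gen_grasps(cloud))["position"])
```

Because the generator is lazy, expensive perception calls are deferred until the symbolic search commits to manipulating a particular object, which is what makes the stream formulation scale to cluttered scenes.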
The paper's experimental results cover diverse tasks, such as arranging objects in a cluttered environment or picking and placing objects in unknown setups. Notably, the system achieves goals in which objects carry no specific identifiers and vary in number and arrangement across trials. This adaptability indicates the system's robustness and flexibility, expanding the scope of robotic applications in unstructured environments.
Theoretically, this work enriches the TAMP literature by integrating perception-driven affordance reasoning into high-level planning. Practically, it suggests a promising direction for robotic applications in dynamic and unfamiliar settings, such as domestic environments or manufacturing lines handling variable product configurations. Relying on perceptual inputs removes the need for extensive object-model databases, reducing preparation time and resources while maintaining adaptability.
Future developments could integrate more complex manipulation skills such as non-prehensile actions, improve perceptual fidelity for better state estimation and uncertainty handling, and adopt belief-space planning to address partial observability. Such enhanced TAMP systems could also incorporate multi-modal sensory data, including tactile feedback, to further refine decision-making in ambiguous or occluded scenarios.
In conclusion, the proposed framework represents a significant step forward in TAMP for robotics, enabling robust and generalized manipulation of unknown objects while broadening the operational reach of intelligent systems in complex and unpredictable environments.