Neuro-Symbolic Task & Motion Planning

Updated 21 September 2025

Neuro-Symbolic TAMP is a framework that integrates symbolic reasoning with continuous geometric planning for dynamic robotic tasks.
It employs object-centric optimization by reparameterizing planning variables into relative Cartesian frames to decouple actions and enhance robustness.
The approach uses reactive control and temporal decoupling to adapt in real time to sensing uncertainty and environmental changes.

Neuro-Symbolic Task and Motion Planning (TAMP) encompasses a class of frameworks and algorithms that tightly integrate symbolic AI concepts—logic, predicates, discrete actions, and high-level planning—with continuous geometric, kinematic, and dynamic reasoning necessary for robotic manipulation in the physical world. Such integration aims to synthesize high-level task policies with robust, feasible low-level motion plans that accommodate sensing, actuation, and environmental uncertainty. The following sections delineate the defining principles, algorithmic strategies, theoretical and practical advances, and the empirical outcomes that characterize neuro-symbolic TAMP, as exemplified by the object-centric, Cartesian frame-based approach described in "Object-Centric Task and Motion Planning in Dynamic Environments" (Migimatsu et al., 2019).

1. Object-Centric Optimization and Problem Formulation

A distinguishing feature of the referenced object-centric TAMP algorithm is the reparameterization of planning variables from global robot joint-space configurations to a sequence of Cartesian frames defined relative to target objects. At each step $t$, the plan is parameterized by a relative pose $\xi_t \in \mathbb{R}^6$, with $\xi_{tp} \in \mathbb{R}^3$ representing a position vector and $\xi_{tr} \in \mathbb{R}^3$ an axis–angle orientation. The “control frame” (typically the robot’s end-effector) is expressed as a transformation relative to the target object, so goals remain consistently specified even when the object moves.
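The following is a minimal sketch of the object-relative parameterization. It assumes that $\xi_t$ stores the position first and the axis–angle vector second, and that the target object's world pose is available from perception. The helper names (`exp_so3`, `xi_to_pose`, `compose`) are hypothetical, and plain Python lists stand in for a linear-algebra library. The sketch shows why a relative goal stays valid when the object moves: the world goal is recomputed by composing the new object pose with the unchanged $\xi_t$.

```python
import math

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    """3x3 matrix times 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: axis-angle vector w (radians) -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew matrix of the unit axis
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def xi_to_pose(xi):
    """Split xi_t = (xi_tp, xi_tr) in R^6 into a (rotation, position) pair."""
    return exp_so3(xi[3:]), list(xi[:3])

def compose(parent, child):
    """World pose of a child frame from the parent's world pose: T_parent * T_child."""
    R_a, p_a = parent
    R_b, p_b = child
    return matmul(R_a, R_b), [pa + q for pa, q in zip(p_a, matvec(R_a, p_b))]

if __name__ == "__main__":
    xi = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical goal: 10 cm along the object's x-axis
    for obj_pos in ([1.0, 0.0, 0.0], [2.0, 1.0, 0.0]):  # object before and after it moves
        obj = (exp_so3([0.0, 0.0, math.pi / 2]), obj_pos)  # object yawed by 90 degrees
        _, goal = compose(obj, xi_to_pose(xi))
        print([round(g, 3) for g in goal])
```

In both cases the printed goal sits 10 cm along the object's rotated x-axis (world $+y$). Moving the object therefore moves the goal with it, and $\xi_t$ does not need to be re-optimized.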

The optimization problem takes the form

$$\min_{\xi_{0:T}} \; h(\xi_{0:T}) + \sum_{t=1}^{T} g_t(\xi_{0:T})$$

subject to

$$\xi_0 = \xi_{\text{init}}, \qquad f_{\text{path}_{k(t)}}(\xi_t) = 0 \quad (t = 1, \dots, T), \qquad f_{\text{switch}_k}(\xi_{t_k}) = 0 \quad (k = 1, \dots, K).$$

Here, $f_{\text{path}}$ and $f_{\text{switch}}$ encode the geometric and discrete transition constraints specific to each symbolic action (e.g., pick, place). $k(t)$ is the index of the action active at step $t$, and $t_k$ is the switching step of action $k$. Because the optimization operates on relative coordinates, action constraints become temporally decoupled: the "pick" and "place" constraints can be defined independently, and neither depends on the geometric outcome of earlier actions.

The per-step objective $g_t$ includes regularizers for smoothness in position and orientation:

$$g_t(\xi) = \alpha \, \| x_{ee}(\xi; t) - x_{ee}(\xi; t-1) \|_2^2 + \beta \, \| \log(\phi_{ee}^{-1}(\xi; t-1) \cdot \phi_{ee}(\xi; t)) \|_2^2$$

Here $x_{ee}(\xi; t)$ is the global end-effector position computed from recursively composed relative transforms, $\phi_{ee}(\xi; t)$ is the end-effector orientation, and $\alpha, \beta$ weight the translational and rotational terms.
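The following is a minimal sketch of the regularizer $g_t$. It assumes that the world end-effector positions and rotation matrices have already been obtained by composing the relative transforms; that composition step is omitted here. The sketch relies on the fact that $\|\log(R)\|_2$ for a rotation matrix $R$ equals its rotation angle, which can be recovered from the trace. All function names are hypothetical.

```python
import math

def rot_z(a):
    """Rotation about the z-axis by angle a (radians)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_angle(R_prev, R_curr):
    """||log(R_prev^{-1} R_curr)||_2, i.e. the rotation angle of R_prev^T R_curr."""
    # trace(R_prev^T R_curr) equals the elementwise sum of R_prev * R_curr
    tr = sum(R_prev[i][j] * R_curr[i][j] for i in range(3) for j in range(3))
    # clamp against round-off before acos; the result lies in [0, pi]
    return math.acos(max(-1.0, min(1.0, (tr - 1.0) / 2.0)))

def smoothness_cost(x_ee, R_ee, alpha=1.0, beta=1.0):
    """Sum of g_t for t = 1..T over precomputed world end-effector poses."""
    total = 0.0
    for t in range(1, len(x_ee)):
        dx2 = sum((a - b) ** 2 for a, b in zip(x_ee[t], x_ee[t - 1]))
        dth = geodesic_angle(R_ee[t - 1], R_ee[t])
        total += alpha * dx2 + beta * dth ** 2
    return total

if __name__ == "__main__":
    xs = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]]
    Rs = [rot_z(0.0), rot_z(0.1), rot_z(0.3)]
    print(round(smoothness_cost(xs, Rs), 6))  # 0.07 = 0.02 (translation) + 0.05 (rotation)
```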

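The relative poses are obtained by recursive composition. For a frame $i$ with parent $\lambda(i; t)$:

$${}^{\lambda(i)}x_i(\xi; t) = \begin{cases} \xi_{tp} & \text{if } i \text{ is the control frame at } t \\ {}^{\lambda(i)}x_i(\xi; t-1) & \text{otherwise} \end{cases}$$

$${}^{\lambda(i)}R_i(\xi; t) = \begin{cases} \exp(\xi_{tr}) & \text{if } i \text{ is the control frame at } t \\ {}^{\lambda(i)}R_i(\xi; t-1) & \text{otherwise} \end{cases}$$

In words, $\xi_t$ sets a frame's pose relative to its parent only at steps where that frame is the control frame. At all other steps the pose is carried forward unchanged from step $t-1$. World poses such as $x_{ee}$ then follow by composing these relative transforms along the kinematic tree.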
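The recursion overwrites a frame's relative pose only at steps where that frame is the control frame. Otherwise the pose is carried forward. The sketch below applies that rule to positions only. It makes two assumptions. First, all relative rotations are the identity. Second, the control frame's parent at step $t$ is the action's target. The second assumption is for illustration only and is not a claim about the paper's exact kinematic tree. `Step`, `unroll`, and `world_position` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    control: str   # control frame at step t
    target: str    # its new parent lambda(control; t)
    xi_p: tuple    # relative position xi_tp of the control frame in the target frame

def unroll(init, steps):
    """Apply the recursion: only the control frame's (parent, offset) changes at step t."""
    tree = dict(init)                # frame -> (parent, relative position)
    history = [dict(tree)]           # history[t] is the tree after step t
    for s in steps:
        tree[s.control] = (s.target, tuple(s.xi_p))
        history.append(dict(tree))   # every other frame keeps its t-1 entry
    return history

def world_position(tree, frame):
    """Sum offsets up the parent chain (identity rotations assumed in this sketch)."""
    pos = [0.0, 0.0, 0.0]
    while frame != "world":
        parent, rel = tree[frame]
        pos = [p + r for p, r in zip(pos, rel)]
        frame = parent
    return pos

if __name__ == "__main__":
    init = {
        "ee": ("world", (0.0, 0.0, 0.5)),
        "box": ("world", (0.5, 0.0, 0.0)),
        "shelf": ("world", (0.0, 0.6, 0.3)),
    }
    steps = [Step("ee", "box", (0.0, 0.0, 0.05)),     # hypothetical pick(box)
             Step("box", "shelf", (0.0, 0.0, 0.1))]   # hypothetical place(box, shelf)
    for t, tree in enumerate(unroll(init, steps)):
        print(t, [round(c, 3) for c in world_position(tree, "ee")])
```

After the place step, the end-effector's world position follows the box onto the shelf. This happens even though the `ee` entry itself was last written at the pick step.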

2. Real-Time Adaptation via Reactive Control

By expressing plans in object-relative terms, the system achieves inherent robustness to kinematic, sensing, and environmental perturbations: should a target object move, the defined relative goal remains correct by construction. Tracking and actuation are realized through reactive, operational space controllers, which at runtime create attractive fields between the control (end-effector) frame and the target object’s current frame, computed via real-time sensory input (e.g., RGB–D, fiducials).

The end-effector’s desired acceleration is given by

$$\ddot{x}_{\text{goal}} = -k_p (x_{ee} - x_{\text{des}}) - k_v \dot{x}_{ee}$$

Here $k_p$ and $k_v$ are position and velocity gains. $x_{\text{des}}$ is obtained by composing the target object’s observed pose with the planned relative transform (position $\xi_{tp}$, rotation $\exp(\xi_{tr})$).
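The following is a per-axis sketch of the tracking law. It makes several simplifying assumptions: a point end-effector driven directly in acceleration (no robot dynamics or torque mapping), a translation-only relative goal, and a hypothetical object sliding at 5 cm/s. The gains and names are illustrative. The damping term acts on $\dot{x}_{ee}$ rather than on the velocity relative to the target. As a result, a target moving at constant speed $v$ is tracked with a steady-state lag of about $k_v v / k_p$, which is 1 cm with these gains. The relative goal itself never needs to be replanned.

```python
def track(kp=100.0, kv=20.0, dt=1e-3, steps=3000, rel=(0.0, 0.0, 0.1), obj_speed=0.05):
    """Integrate x_dd = -kp (x_ee - x_des) - kv x_ee_dot for a point end-effector."""
    x = [0.0, 0.0, 0.5]          # end-effector position (m)
    v = [0.0, 0.0, 0.0]          # end-effector velocity (m/s)
    x_des = list(x)
    for k in range(steps):
        t = k * dt
        obj = [0.5 + obj_speed * t, 0.0, 0.0]         # observed object position (hypothetical motion)
        x_des = [o + r for o, r in zip(obj, rel)]      # object pose composed with the relative goal
        a = [-kp * (xi - xd) - kv * vi for xi, xd, vi in zip(x, x_des, v)]
        v = [vi + ai * dt for vi, ai in zip(v, a)]     # semi-implicit Euler integration
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x, x_des

if __name__ == "__main__":
    x, x_des = track()
    print([round(a - b, 4) for a, b in zip(x, x_des)])  # tracking error per axis
```

The gains $k_p = 100$ and $k_v = 20$ make each axis critically damped ($k_v = 2\sqrt{k_p}$), so the initial error decays without overshoot.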

Repulsive fields are added for collision avoidance. Because this feedback loop runs at high frequency, tracking errors caused by misperceived object states or unexpected disturbances are corrected at the control level, without global replanning.
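The source does not specify the form of the repulsive field. As an assumption, the sketch below uses the classic potential-field repulsion $U = \tfrac{1}{2}\eta(1/\rho - 1/\rho_0)^2$ for obstacle distance $\rho < \rho_0$. Its negative gradient is superimposed on the attractive command. The point obstacles, the gains $\eta$ and $\rho_0$, and the function names are all hypothetical.

```python
import math

def repulsive_accel(x, obstacle, eta=0.01, rho0=0.2):
    """Potential-field repulsion from a point obstacle, active within distance rho0."""
    diff = [a - b for a, b in zip(x, obstacle)]
    rho = math.sqrt(sum(d * d for d in diff))
    if rho >= rho0 or rho == 0.0:
        return [0.0, 0.0, 0.0]   # outside the influence region (direction undefined at rho == 0)
    # -grad of U = 0.5 * eta * (1/rho - 1/rho0)^2, directed away from the obstacle
    mag = eta * (1.0 / rho - 1.0 / rho0) / rho ** 2
    return [mag * d / rho for d in diff]

def total_accel(attractive, x, obstacles):
    """Superimpose the attractive PD command with repulsion from each obstacle."""
    acc = list(attractive)
    for obs in obstacles:
        acc = [a + r for a, r in zip(acc, repulsive_accel(x, obs))]
    return acc

if __name__ == "__main__":
    print(total_accel([1.0, 0.0, 0.0], [0.1, 0.0, 0.0], [[0.0, 0.0, 0.0]]))
```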

3. Temporal Decoupling and Symbolic-Geometric Integration

The approach enables a robust, temporally decoupled interface between symbolic task planning (e.g., STRIPS- or PDDL-based planners that output action sequences) and low-level geometric planning. High-level, discrete symbolic plans, such as pick(hook), push, place(box, shelf), serve as constraints or “action skeletons”. The continuous, object-relative optimizer then fills in their geometric parameters at plan time. Each action's constraints are expressed relative to its own target frame. As a result, geometric errors or physical disturbances during one action (e.g., an imperfect pick) do not compound into infeasibility in later steps (e.g., place).
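The following sketch shows how an action skeleton could be turned into the constraint schedule $k(t)$ and $t_k$ used by the optimizer. It assumes that each action spans a fixed number of timesteps and that its switch constraint applies at the action's last step. The skeleton, the action-to-frame mapping, and all names are hypothetical. The frames returned for `place(box, shelf)` mention only the box and the shelf. How the box was grasped therefore never enters that action's constraint, which is the temporal decoupling described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    args: tuple

def frames_for(action, ee="ee"):
    """Hypothetical map from a symbolic action to its (control frame, target frame)."""
    if action.name == "pick":
        return ee, action.args[0]              # end-effector relative to the object
    if action.name in ("place", "push"):
        return action.args[0], action.args[1]  # held object or tool relative to its target
    raise ValueError(f"unknown action: {action.name}")

def schedule(skeleton, steps_per_action=1):
    """Build k(t) for t = 1..T and switch times t_k (here: last step of action k)."""
    k_of_t, switch_times = [], []
    for k in range(1, len(skeleton) + 1):
        k_of_t.extend([k] * steps_per_action)
        switch_times.append(len(k_of_t))
    return k_of_t, switch_times

if __name__ == "__main__":
    skeleton = [Action("pick", ("hook",)), Action("push", ("hook", "box")),
                Action("place", ("hook", "table")), Action("pick", ("box",)),
                Action("place", ("box", "shelf"))]
    k_of_t, t_k = schedule(skeleton, steps_per_action=2)
    for a, t in zip(skeleton, t_k):
        print(a.name, a.args, "frames:", frames_for(a), "switch at t =", t)
```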

The closed-loop architecture supports continual real-time plan adaptation and aligns with the neuro-symbolic paradigm, where symbols carry discrete causal structure and geometric planners resolve continuous feasibility under non-stationary conditions.

4. Experimental Demonstrations and Quantitative Results

The object-centric TAMP framework was validated in both simulation and hardware:

In a simulated Tower of Hanoi, the method planned and executed 14 sequential pick-and-place actions with decoupled geometric constraints, maintaining robustness against small object perturbations and achieving typical planning times of a few seconds (e.g., 4.02 s with IPOPT for 14 actions).
In the Workspace Reach scenario, a Franka Panda robot on hardware and in simulation adapted to environments where key objects were out of direct reach, utilizing tool-mediated pushing actions (e.g., with a hook) and reliable place-grasping, enabled by the real-time control architecture.
Reactive controllers handled perception noise (with Kinect v2) and object motion, ensuring smooth execution even under vision and actuation uncertainty.
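The 14 actions are consistent with a three-disk Tower of Hanoi: 7 moves, each a pick followed by a place. The disk count is an inference and is not stated above. The sketch below generates such a symbolic skeleton. Each disk is placed on the topmost disk of the destination peg, or on the peg itself when the peg is empty, so every place action names the frame in which its relative goal would be expressed. Peg and disk names are hypothetical.

```python
def hanoi_moves(n, src, dst, aux):
    """Standard recursive Tower of Hanoi solution as (disk, from_peg, to_peg) moves."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, aux, dst) + [(n, src, dst)]
            + hanoi_moves(n - 1, aux, dst, src))

def hanoi_skeleton(n=3):
    """Expand each move into pick(disk) and place(disk, support) symbolic actions."""
    pegs = {"peg_a": [f"disk{d}" for d in range(n, 0, -1)], "peg_b": [], "peg_c": []}
    skeleton = []
    for disk, src, dst in hanoi_moves(n, "peg_a", "peg_c", "peg_b"):
        obj = pegs[src].pop()                            # top disk on the source peg
        assert obj == f"disk{disk}"                      # sanity check on the move sequence
        support = pegs[dst][-1] if pegs[dst] else dst    # frame the place goal is relative to
        skeleton.append(("pick", obj))
        skeleton.append(("place", obj, support))
        pegs[dst].append(obj)
    return skeleton

if __name__ == "__main__":
    for action in hanoi_skeleton(3):
        print(action)
```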

The architecture thereby demonstrated robust execution under dynamic, unmodeled changes, confirming its viability in real-world manipulation.

5. Advantages over Hierarchical TAMP

The object-centric, frame-based TAMP formulation directly addresses several historical limitations of hierarchical TAMP:

Mitigation of plan invalidation due to environment dynamics, thus obviating costly global replanning in the presence of moving objects or sensor artifacts.
Modular decoupling of symbolic action constraints; subsequent actions are independent of geometric realization of prior actions.
Seamless integration of symbolic-level discrete reasoning (e.g., via STRIPS) with robust continuous plan adaptation, reflecting central themes of neuro-symbolic AI: fusing general, discrete abstractions with sensor-driven, real-time continuous control.
An actionable substrate for future neuro-symbolic approaches that may incorporate learning-based perception or tighter integration with formal logic.

6. Design Tradeoffs and Limitations

Despite robustness and real-time adaptation, the approach imposes computational demands at the trajectory optimization and control levels, particularly as the complexity or number of simultaneous actions/clusters increases. The core object-centric abstraction assumes the ability to accurately associate sensory data and geometric transforms in the robot’s kinematic tree, which could pose challenges in heavily cluttered or occluded settings. The method is also primarily designed for manipulation, with explicit transferability to highly dexterous or multi-agent tasks left for further exploration.

7. Conclusion

The neuro-symbolic object-centric TAMP algorithm based on optimization over Cartesian frames achieves robust integration of symbolic action sequencing with continuous, reactive control. By leveraging relative pose variables $\xi_t \in \mathbb{R}^6$ and operational space control, it provides temporal decoupling, closed-loop plan adaptation, and resilience against real-world uncertainty, substantiated by strong empirical performance in dynamic manipulation domains. This represents a notable step toward scalable, neuro-symbolic robot autonomy where logical reasoning is tightly coupled with geometric and dynamic execution (Migimatsu et al., 2019).


References

Migimatsu, T., & Bohg, J. (2019). Object-Centric Task and Motion Planning in Dynamic Environments.
