
Neuro-Symbolic Task & Motion Planning

Updated 21 September 2025
  • Neuro-Symbolic TAMP is a framework that integrates symbolic reasoning with continuous geometric planning for dynamic robotic tasks.
  • It employs object-centric optimization by reparameterizing planning variables into relative Cartesian frames to decouple actions and enhance robustness.
  • The approach uses reactive control and temporal decoupling to adapt in real time to sensing uncertainty and environmental changes.

Neuro-Symbolic Task and Motion Planning (TAMP) encompasses a class of frameworks and algorithms that tightly integrate symbolic AI concepts—logic, predicates, discrete actions, and high-level planning—with continuous geometric, kinematic, and dynamic reasoning necessary for robotic manipulation in the physical world. Such integration aims to synthesize high-level task policies with robust, feasible low-level motion plans that accommodate sensing, actuation, and environmental uncertainty. The following sections delineate the defining principles, algorithmic strategies, theoretical and practical advances, and the empirical outcomes that characterize neuro-symbolic TAMP, as exemplified by the object-centric, Cartesian frame-based approach described in "Object-Centric Task and Motion Planning in Dynamic Environments" (Migimatsu et al., 2019).

1. Object-Centric Optimization and Problem Formulation

A distinguishing feature of the referenced object-centric TAMP algorithm is the reparameterization of planning variables from global robot joint-space configurations to a sequence of Cartesian frames defined relative to target objects. At each step $t$, the plan is parametrized by a relative pose $\xi_t \in \mathbb{R}^6$, with $\xi_{t,p} \in \mathbb{R}^3$ representing a position vector and $\xi_{t,r} \in \mathbb{R}^3$ an axis–angle orientation. The "control frame" (typically the robot's end-effector) is expressed as a transformation relative to the target object, enabling the consistent specification of goals regardless of object movement.
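
To make the parameterization concrete, the following Python sketch (illustrative only; not code from the paper) converts a relative pose $\xi_t = (\xi_{t,p}, \xi_{t,r})$ into a homogeneous transform via the axis–angle exponential map, using SciPy's rotation utilities.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def xi_to_transform(xi: np.ndarray) -> np.ndarray:
    """Convert a relative pose xi in R^6 (position + axis-angle) into the
    4x4 homogeneous transform of the control frame w.r.t. its target object."""
    pos, aa = xi[:3], xi[3:]                           # xi_{t,p}, xi_{t,r}
    T = np.eye(4)
    T[:3, :3] = Rotation.from_rotvec(aa).as_matrix()   # exp(xi_{t,r})
    T[:3, 3] = pos
    return T

# Example: end-effector 10 cm above the target object, rotated 90 deg about z.
xi_t = np.array([0.0, 0.0, 0.10, 0.0, 0.0, np.pi / 2])
print(xi_to_transform(xi_t))
```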

The optimization problem takes the form:

$$\min_{\xi_{0:T}} \; h(\xi_{0:T}) + \sum_{t=1}^T g_t(\xi_{0:T})$$

subject to:

$$\xi_0 = \xi_{\text{init}}, \quad f_{\text{path}_{k(t)}}(\xi_t) = 0 \ (t=1,\dots,T), \quad f_{\text{switch}_k}(\xi_{t_k}) = 0 \ (k=1,\dots,K)$$

Here, $f_{\text{path}}$ and $f_{\text{switch}}$ encode the geometric and discrete transition constraints specific to each symbolic action (e.g., pick, place). Because the optimization operates on relative coordinates, action constraints become temporally decoupled: the symbolic "pick" and "place" constraints can be defined independently and are unaffected by prior geometric contingencies.
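
The structure of this program can be illustrated with a generic NLP interface. The toy sketch below is not the paper's IPOPT setup; the horizon, the placeholder objective, and the `switch_pick` grasp offset are invented for illustration. It only shows how per-action switch constraints enter as independent equality constraints over the stacked variables $\xi_{0:T}$.

```python
import numpy as np
from scipy.optimize import minimize

T_steps, dim = 3, 6                      # illustrative horizon and per-step pose dimension
xi_init = np.zeros(dim)                  # xi_0, fixed by the initial-pose constraint

def objective(x):
    xi = x.reshape(T_steps + 1, dim)
    # Placeholder for h(xi_{0:T}) + sum_t g_t(xi_{0:T}): penalize step-to-step change.
    return np.sum(np.diff(xi, axis=0) ** 2)

def switch_pick(x, t_k=1, grasp_offset=np.array([0, 0, 0.05, 0, 0, 0])):
    # Toy "pick" switch constraint: at step t_k the control frame must sit at a
    # fixed grasp pose relative to the target object (equality constraint == 0).
    xi = x.reshape(T_steps + 1, dim)
    return xi[t_k] - grasp_offset

constraints = [
    {"type": "eq", "fun": lambda x: x.reshape(T_steps + 1, dim)[0] - xi_init},
    {"type": "eq", "fun": switch_pick},
]

x0 = 1e-3 * np.random.randn((T_steps + 1) * dim)
res = minimize(objective, x0, method="SLSQP", constraints=constraints)
print(res.success, res.x.reshape(T_steps + 1, dim)[1])   # step 1 should match grasp_offset
```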

The objective $g_t(\xi)$ includes regularizers for smoothness in position and orientation:

$$g_t(\xi) = \alpha \, \| x_{ee}(\xi; t) - x_{ee}(\xi; t-1) \|_2^2 + \beta \, \| \log\!\left(\phi_{ee}^{-1}(\xi; t-1) \cdot \phi_{ee}(\xi; t)\right) \|_2^2$$

where $x_{ee}(\xi; t)$ is the global end-effector position computed from recursively composed relative transforms, and $\phi_{ee}(\xi; t)$ represents the end-effector orientation.
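
A direct transcription of this regularizer, assuming the end-effector poses at consecutive steps are already available as positions and rotation matrices (the weights and example values below are illustrative):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def smoothness_cost(x_prev, R_prev, x_curr, R_curr, alpha=1.0, beta=1.0):
    """g_t: penalize end-effector position change and the geodesic (log-map)
    distance between consecutive end-effector orientations."""
    pos_term = alpha * np.sum((x_curr - x_prev) ** 2)
    # log(phi_{t-1}^{-1} . phi_t) as an axis-angle (rotation vector)
    rel_rotvec = (Rotation.from_matrix(R_prev).inv()
                  * Rotation.from_matrix(R_curr)).as_rotvec()
    rot_term = beta * np.sum(rel_rotvec ** 2)
    return pos_term + rot_term

# Example: 1 cm translation and a 10 degree rotation about z between steps.
x0, x1 = np.zeros(3), np.array([0.01, 0.0, 0.0])
R0 = np.eye(3)
R1 = Rotation.from_euler("z", 10, degrees=True).as_matrix()
print(smoothness_cost(x0, R0, x1, R1))
```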

Recursive composition for a frame $i$ with parent $\lambda(i; t)$:

$${}^{\lambda(i)}x_i(\xi; t) = \begin{cases} \xi_{t,p} & \text{if } i \text{ is the control frame at } t \\ {}^{\lambda(i)}x_i(\xi; t-1) & \text{otherwise} \end{cases}$$

$${}^{\lambda(i)}R_i(\xi; t) = \begin{cases} \exp(\xi_{t,r}) & \text{if } i \text{ is the control frame at } t \\ {}^{\lambda(i)}R_i(\xi; t-1) & \text{otherwise} \end{cases}$$
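
In words: at step $t$ only the current control frame's relative pose is a free variable; every other frame keeps the relative pose it had at $t-1$. The recursion can be sketched as follows (a minimal illustration with a dictionary-based kinematic tree; frame names such as "ee" and "box" are placeholders, and the parent structure is held fixed here, whereas in the paper it can change with the action):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def compose_world_pose(frame, t, parent, rel_pose, control_frame, xi):
    """Return (position, rotation matrix) of `frame` in the world at step t.

    parent[frame]        -> name of the parent frame ("world" terminates the recursion)
    rel_pose[(frame, t)] -> (p, R) of frame w.r.t. its parent, carried over from the
                            previous step unless frame is the control frame at t,
                            in which case the free variable xi[t] is used instead.
    """
    if frame == "world":
        return np.zeros(3), np.eye(3)
    if frame == control_frame[t]:
        p = xi[t][:3]
        R = Rotation.from_rotvec(xi[t][3:]).as_matrix()   # exp(xi_{t,r})
    else:
        p, R = rel_pose[(frame, t)]                       # same as at t-1
    p_parent, R_parent = compose_world_pose(parent[frame], t, parent,
                                            rel_pose, control_frame, xi)
    return p_parent + R_parent @ p, R_parent @ R

# Example: the end-effector is the control frame at t=0, targeting the box.
parent = {"ee": "box", "box": "world"}
rel_pose = {("box", 0): (np.array([0.5, 0.0, 0.0]), np.eye(3))}
xi = {0: np.array([0.0, 0.0, 0.10, 0.0, 0.0, 0.0])}
control_frame = {0: "ee"}
print(compose_world_pose("ee", 0, parent, rel_pose, control_frame, xi))
```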

2. Real-Time Adaptation via Reactive Control

By expressing plans in object-relative terms, the system achieves inherent robustness to kinematic, sensing, and environmental perturbations: should a target object move, the defined relative goal remains correct by construction. Tracking and actuation are realized through reactive, operational space controllers, which at runtime create attractive fields between the control (end-effector) frame and the target object’s current frame, computed via real-time sensory input (e.g., RGB–D, fiducials).

The end-effector's desired acceleration is given by:

$$\ddot{x}_{\text{goal}} = -k_p (x_{ee} - x_{\text{des}}) - k_v \, \dot{x}_{ee}$$

with $x_{\text{des}}$ determined as the transform of the target object's observed pose compounded with the planned relative transform $\exp(\xi_t)$.

Repulsive fields are incorporated for collision avoidance. The high-frequency feedback loop architecture ensures that tracking errors due to misperceived object states or unexpected disturbances are immediately rectified at the control level, circumventing the need for global replanning.
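
A simplified sketch of such a controller is given below. It is not the paper's implementation: the gains, the potential-field form of the repulsive term, and names like `obstacle_pos` are assumptions made for illustration.

```python
import numpy as np

def reactive_accel(x_ee, dx_ee, T_obj_world, xi_t_transform,
                   obstacle_pos=None, kp=100.0, kv=20.0, k_rep=1.0, d0=0.15):
    """Desired end-effector acceleration: PD attraction toward the planned pose
    (observed object pose composed with the planned relative transform), plus a
    simple repulsive term when the end-effector is within d0 of an obstacle."""
    T_des = T_obj_world @ xi_t_transform           # object pose o exp(xi_t)
    x_des = T_des[:3, 3]

    ddx = -kp * (x_ee - x_des) - kv * dx_ee        # attractive PD field

    if obstacle_pos is not None:                   # optional repulsive field
        diff = x_ee - obstacle_pos
        d = np.linalg.norm(diff)
        if d < d0:
            ddx += k_rep * (1.0 / d - 1.0 / d0) * diff / d**3
    return ddx

# Example call: object observed 0.6 m in front of the robot, goal 10 cm above it.
T_obj = np.eye(4); T_obj[:3, 3] = [0.6, 0.0, 0.2]
xi_T = np.eye(4); xi_T[:3, 3] = [0.0, 0.0, 0.10]
print(reactive_accel(np.array([0.5, 0.0, 0.4]), np.zeros(3), T_obj, xi_T))
```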

3. Temporal Decoupling and Symbolic-Geometric Integration

The approach fundamentally enables a robust, temporally decoupled interface between symbolic task planning (e.g., STRIPS or PDDL-based planners outputting action sequences) and low-level geometric planners. High-level, discrete symbolic plans—such as pick(hook), push, place(box, shelf)—serve as constraints or “action skeletons”. The continuous, object-relative optimizer fills in geometric parameters at plan time. This separation ensures that symbolic errors, or physical disturbances during one action (e.g., imperfect pick), do not induce compounding infeasibility in subsequent steps (e.g., place).
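
One way to picture this interface is as a mapping from each symbolic action in the skeleton to the constraint functions handed to the continuous optimizer. The sketch below is schematic: the constraint bodies and numeric offsets are placeholders, not an API from the paper, and the push action is omitted for brevity.

```python
from typing import Callable, Dict, List, Tuple
import numpy as np

# A symbolic action is a name plus its arguments, e.g. ("pick", ("hook",)).
Action = Tuple[str, Tuple[str, ...]]

def pick_switch_constraint(xi_t: np.ndarray) -> np.ndarray:
    # Placeholder: control frame must reach a grasp pose on the object (== 0 when satisfied).
    return xi_t - np.array([0.0, 0.0, 0.05, 0.0, 0.0, 0.0])

def place_switch_constraint(xi_t: np.ndarray) -> np.ndarray:
    # Placeholder: object frame must rest flat on the support surface.
    return xi_t - np.array([0.0, 0.0, 0.02, 0.0, 0.0, 0.0])

# Each symbolic action name contributes its own, independent constraint set.
SWITCH_CONSTRAINTS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "pick": pick_switch_constraint,
    "place": place_switch_constraint,
}

def constraints_for_skeleton(skeleton: List[Action]):
    """Expand an action skeleton from the symbolic planner into the list of
    switch constraints passed to the continuous optimizer, one per action."""
    return [SWITCH_CONSTRAINTS[name] for name, _args in skeleton
            if name in SWITCH_CONSTRAINTS]

skeleton: List[Action] = [("pick", ("hook",)), ("push", ("box",)), ("place", ("box", "shelf"))]
print([f.__name__ for f in constraints_for_skeleton(skeleton)])
```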

The closed-loop architecture supports continual real-time plan adaptation and aligns with the neuro-symbolic paradigm, where symbols carry discrete causal structure and geometric planners resolve continuous feasibility under non-stationary conditions.

4. Experimental Demonstrations and Quantitative Results

The object-centric TAMP framework was validated in both simulation and hardware:

  • In a simulated Tower of Hanoi, the method planned and executed 14 sequential pick-and-place actions with decoupled geometric constraints, maintaining robustness against small object perturbations and achieving typical planning times of a few seconds (e.g., 4.02s with IPOPT for 14 actions).
  • In the Workspace Reach scenario, a Franka Panda robot on hardware and in simulation adapted to environments where key objects were out of direct reach, using tool-mediated pushing actions (e.g., with a hook) together with reliable grasping and placing, enabled by the real-time control architecture.
  • Reactive controllers handled perception noise (with Kinect v2) and object motion, ensuring smooth execution even under vision and actuation uncertainty.

The architecture thereby demonstrated robust execution under dynamic, unmodeled changes, confirming its viability in real-world manipulation.

5. Advances over Prior Hierarchical TAMP

The object-centric, frame-based TAMP formulation directly addresses several historical limitations of hierarchical TAMP:

  • Mitigation of plan invalidation due to environment dynamics, thus obviating costly global replanning in the presence of moving objects or sensor artifacts.
  • Modular decoupling of symbolic action constraints; subsequent actions are independent of geometric realization of prior actions.
  • Seamless integration of symbolic-level discrete reasoning (e.g., via STRIPS) with robust continuous plan adaptation, reflecting central themes of neuro-symbolic AI: fusing general, discrete abstractions with data-driven, real-time continuous control.
  • Provision of an actionable substrate for future neuro-symbolic approaches that may incorporate learning-based perception or further formal-logic integration.

6. Design Tradeoffs and Limitations

Despite its robustness and real-time adaptation, the approach imposes computational demands at the trajectory optimization and control levels, particularly as the number or complexity of planned actions increases. The core object-centric abstraction assumes the ability to accurately associate sensory data with geometric transforms in the robot's kinematic tree, which could pose challenges in heavily cluttered or occluded settings. The method is also primarily designed for manipulation, with transferability to highly dexterous or multi-agent tasks left for further exploration.

7. Conclusion

The neuro-symbolic object-centric TAMP algorithm based on optimization over Cartesian frames achieves robust integration of symbolic action sequencing with continuous, reactive control. By leveraging relative pose variables $\xi_t \in \mathbb{R}^6$ and operational space control, it provides temporal decoupling, closed-loop plan adaptation, and resilience against real-world uncertainty, substantiated by strong empirical performance in dynamic manipulation domains. This represents a notable step toward scalable, neuro-symbolic robot autonomy where logical reasoning is tightly coupled with geometric and dynamic execution (Migimatsu et al., 2019).

References

  • Migimatsu et al. (2019). Object-Centric Task and Motion Planning in Dynamic Environments.
