- The paper introduces a unified framework leveraging Motion Statecharts, differentiable kinematic world models, and jerk-bounded lMPC to translate semantic instructions into smooth, constraint-adhering motion trajectories.
- It demonstrates robust real-time performance and scalability by successfully deploying the system on eight diverse robotic platforms with 100 Hz control rates.
- The work enables adaptive replanning and formal constraint satisfaction, advancing embodied cognition and multi-platform transfer in robotic manipulation.
Closing the Motion Execution Gap: An Expert Review
Introduction and Motivation
Robotic manipulation in real-world environments often faces a disconnect between high-level, semantically rich symbolic task descriptions and the executable, constraint-satisfying motion trajectories required for physical task completion. This disconnect, the "Motion Execution Gap", is typified by instructions like "cut zucchini into small slices," where the semantics inherently encode both explicit (slice thickness) and implicit (safety, object handling) constraints that must be honored during execution. The paper "Closing the Motion Execution Gap: From Semantic Motion Task Constraints to Kinematic Control" (2605.12053) addresses this gap by introducing an executable symbolic framework that leverages Motion Statecharts (MSC), differentiable kinematic world models, and jerk-bounded linear Model Predictive Control (lMPC) to bridge the translation from semantic task specification to constraint-adhering kinematic control.
Figure 1: PR2 must generate constraint-satisfying trajectories from the semantic instruction "cut zucchini into small slices".
Differentiable Kinematic World Model
To achieve world-centric motion specification, the proposed framework models not only the robot kinematics but also those of the articulated environment. The unified differentiable kinematic model is parameterized by links, joints, and symbols referring to degrees of freedom (DoFs) and environmental variables, enabling the system to capture articulated environmental elements such as doors or handles. Transformations between links are represented as symbolic expressions utilizing CasADi for automatic differentiation, permitting the instantiation of task functions on any relevant chains, even those crossing robot/environment boundaries.
This model allows manipulators to interact with objects whose kinematics are not directly actuated but can be controlled indirectly (e.g., opening a fridge by grasping its handle and rotating its hinge). Tasks are defined as differentiable functions mapping system states to arbitrary task spaces, and these are further constrained via equalities or inequalities to form task functions. This formulation supports the expression of complex geometric constraints and their derivatives for motion planning and execution.
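As a rough illustration of a task function defined on a chain that crosses the robot/environment boundary, the sketch below differentiates a squared gripper-to-handle distance with respect to two robot joints and one door hinge. It is not the paper's implementation: the framework uses CasADi symbolic expressions, whereas this sketch swaps in a tiny hand-written forward-mode autodiff (`Dual`) so that it runs with the standard library alone. The planar geometry, link lengths, hinge position, and all function names are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative w.r.t. one seeded DoF."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        o = lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    def __sub__(self, other):
        o = lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __mul__(self, other):
        o = lift(other)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

    __radd__ = __add__
    __rmul__ = __mul__


def lift(x):
    """Promote a plain number to a constant Dual."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def cos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)


def gripper_tip(q1, q2, l1=1.0, l2=1.0):
    """Planar two-link arm (robot DoFs); link lengths are assumed."""
    return (l1 * cos(q1) + l2 * cos(q1 + q2),
            l1 * sin(q1) + l2 * sin(q1 + q2))


def door_handle(hinge, hinge_xy=(1.5, 0.0), radius=0.5):
    """Handle on a hinged door (environment DoF); geometry is assumed."""
    return (hinge_xy[0] + radius * cos(hinge),
            hinge_xy[1] + radius * sin(hinge))


def task(state):
    """Squared gripper-to-handle distance over robot AND environment DoFs."""
    q1, q2, hinge = state
    gx, gy = gripper_tip(q1, q2)
    hx, hy = door_handle(hinge)
    dx, dy = gx - hx, gy - hy
    return dx * dx + dy * dy


def evaluate(state):
    """Return the task value and its gradient w.r.t. every DoF."""
    grad = []
    for i in range(len(state)):
        # Seed a unit derivative on DoF i, zero on all others.
        seeded = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(state)]
        grad.append(task(seeded).dot)
    return task([Dual(v) for v in state]).val, grad


if __name__ == "__main__":
    value, grad = evaluate([0.0, math.pi / 2, math.pi / 2])
    print(value, grad)
```

The gradient has a nonzero entry for the hinge angle even though the hinge is not actuated, which is what lets a controller reason about moving the door indirectly through the grasp. An inequality such as `value <= tolerance ** 2` would turn this task into a constraint in the sense described above.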
Motion Statecharts: Symbolic Task Composition and Monitoring
Motion Statecharts (MSC) extend classical finite state machine (FSM) and behavior tree (BT) representations to efficiently compose and monitor robot behaviors using dual-FSM nodes. Each node encapsulates both an observation FSM and a life cycle FSM, allowing real-time activation, monitoring, and semantic annotation of sub-tasks and their constraint satisfaction status. Nested MSC templates support structural reuse and compositionality, permitting both sequential and parallel arrangement of tasks and monitors.
Importantly, MSC observation states (True/False/Unknown) track the real-time satisfaction or failure of constraints and triggers for state transitions, yielding a closed-loop semantic feedback mechanism. The high-level planner receives direct semantic feedback regarding execution, enabling diagnosis of failures (e.g., contact loss during cutting) or triggering adaptive replanning (such as stopping or adjusting actions when human proximity is detected).
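The dual-FSM idea can be sketched in a few lines. The example below is a minimal, hypothetical model and not Giskard's API: the three observation states come from the text above, while the life cycle state names (`NOT_STARTED`, `RUNNING`, `DONE`), the `Node` fields, the `tick` function, and the force and depth thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Obs(Enum):
    """Observation states named in the text."""
    TRUE = "True"
    FALSE = "False"
    UNKNOWN = "Unknown"


class Life(Enum):
    """Hypothetical life cycle states (names are assumptions)."""
    NOT_STARTED = "not started"
    RUNNING = "running"
    DONE = "done"


ObsMap = Dict[str, Obs]


@dataclass
class Node:
    name: str
    observe: Callable[[dict], Obs]                       # world state -> observation
    start_when: Callable[[ObsMap], bool] = lambda obs: True
    end_when: Callable[[ObsMap], bool] = lambda obs: False
    life: Life = Life.NOT_STARTED
    obs: Obs = Obs.UNKNOWN


def to_obs(flag: bool) -> Obs:
    return Obs.TRUE if flag else Obs.FALSE


def tick(nodes: List[Node], world: dict) -> ObsMap:
    """Advance every node by one control cycle.

    Transition conditions read the observations of the PREVIOUS cycle,
    so all nodes see a consistent snapshot.
    """
    previous = {n.name: n.obs for n in nodes}
    for n in nodes:
        if n.life is Life.NOT_STARTED and n.start_when(previous):
            n.life = Life.RUNNING
        elif n.life is Life.RUNNING and n.end_when(previous):
            n.life = Life.DONE
        if n.life is Life.RUNNING:
            n.obs = n.observe(world)  # only running nodes update their observation
    return {n.name: n.obs for n in nodes}


def build_cutting_chart() -> List[Node]:
    """A contact monitor running in parallel with a cut task (thresholds assumed)."""
    contact = Node("contact", observe=lambda w: to_obs(w["force"] > 1.0))
    cut = Node(
        "cut",
        observe=lambda w: to_obs(w["depth"] >= 0.02),
        start_when=lambda obs: obs["contact"] is Obs.TRUE,
        end_when=lambda obs: obs["cut"] is Obs.TRUE,
    )
    return [contact, cut]


if __name__ == "__main__":
    chart = build_cutting_chart()
    for world in ({"force": 0.0, "depth": 0.0},
                  {"force": 2.0, "depth": 0.0},
                  {"force": 2.0, "depth": 0.0},
                  {"force": 2.0, "depth": 0.03}):
        print(tick(chart, world))
```

The observation map returned by `tick` is the kind of semantic feedback a high-level planner could consume: a `contact` observation that flips to `False` while `cut` is still running would indicate contact loss during cutting.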
lMPC-Based Task Function Control: Jerk-Bounded Optimization
Traditional QP-based task function controllers risk discontinuities during task function switches. Prior mitigation strategies (step-wise phase transitions or weight scaling) either restrict motion flexibility or introduce additional tuning complexity. The proposed framework instead instantiates the task-function control paradigm as a short-horizon lMPC with explicit jerk bounds, ensuring smooth and stable velocity profiles even amidst rapid task exchanges and parallel activities.
The lMPC controller is formulated as a QP minimizing both joint velocities and slack variables over a prediction horizon. The system model is encoded via a semi-implicit Euler integration scheme and includes bounds for velocity and jerk, facilitating enforcement of theoretical deceleration limits without extensive manual tuning. Each task function is translated into a single constraint/slack pair in the QP, maximizing scalability and real-time solvability (achieving 100 Hz control rates with qpSWIFT).
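To make the system model concrete, the sketch below rolls out a jerk sequence for a single joint with one common form of semi-implicit Euler integration and checks the result against velocity and jerk bounds. It only illustrates the prediction model and its bounds, not the QP itself; the exact discretization used in the paper may differ, and the function names, the time step, and the numeric bounds are assumptions.

```python
from typing import List, Tuple


def rollout(v0: float, a0: float, jerks: List[float], dt: float) -> List[Tuple[float, float]]:
    """Integrate a jerk sequence over the horizon with semi-implicit Euler.

    Acceleration is updated first, and the NEW acceleration is then used
    to update velocity (this ordering is what makes the scheme semi-implicit).
    Returns the (velocity, acceleration) pair after each step.
    """
    v, a = v0, a0
    trajectory = []
    for j in jerks:
        a = a + j * dt
        v = v + a * dt
        trajectory.append((v, a))
    return trajectory


def within_bounds(trajectory: List[Tuple[float, float]], jerks: List[float],
                  v_max: float, j_max: float, tol: float = 1e-9) -> bool:
    """Check the box constraints a jerk-bounded controller would enforce."""
    jerk_ok = all(abs(j) <= j_max + tol for j in jerks)
    velocity_ok = all(abs(v) <= v_max + tol for v, _ in trajectory)
    return jerk_ok and velocity_ok


if __name__ == "__main__":
    # Accelerate for two steps, then remove the acceleration again.
    jerks = [10.0, 10.0, -10.0, -10.0]
    traj = rollout(v0=0.0, a0=0.0, jerks=jerks, dt=0.1)
    print(traj)
    print(within_bounds(traj, jerks, v_max=0.5, j_max=10.0))
```

In an lMPC of the kind described above, the jerk values over the horizon would be decision variables of the QP rather than a fixed list, and the bounds checked here would appear as constraints. Because velocity can only change through bounded jerk, a task switch cannot produce an instantaneous velocity jump in this model.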
Experimental Validation and Transferability
The framework, realized as Giskard, was deployed across eight diverse robot platforms, ranging from mobile manipulators (PR2, TIAGo, HSR) to dual-arm industrial robots and five-finger setups. The system demonstrated robust transferability, requiring only minimal parameter adjustments (prediction horizon and feedback rate), and achieving out-of-distribution generalization with no robot-specific tuning for MSC composition.
Figure 2: The eight real robots the proposed framework has been deployed on.
Strong empirical results are evident in both task and motion generalization. For example, the dual UR10 setup executed peg-in-hole insertion motions composed of sequential and parallel constraints (feature functions, tilt, insertion, alignment), maintaining smooth trajectories (as shown in the measured joint-space profiles) even under frequent task switching and sensor noise.

Figure 3: The dual UR10 setup executing an insertion motion and the corresponding MSC at the time of execution.
The semantic annotation and monitoring process was further validated in manipulation scenarios like cutting, where semantic states (material separation, contact monitoring) were accurately tracked and leveraged by the high-level planner. Planners could reason about failures (e.g., knife not touching object), repeat actions, or adapt strategies in physical execution.
Discussion: Implications and Future Directions
The introduction of Motion Statecharts coupled with differentiable kinematic world models and jerk-bounded lMPC closes the Motion Execution Gap for semantic-to-kinematic translation. It enables scalable, real-time, constraint-adhering execution grounded in semantic representation, supporting dynamic plan adaptation and robust multi-platform transfer. Unlike generative AI-based approaches, which lack guarantees for constraint satisfaction and embodiment-agnostic transfer, this framework provides formal adherence and semantic traceability throughout execution.
The practical implications are substantial for cognitive robotics, industrial manipulation, and service robots operating in complex environments. The framework's robot-agnostic semantic motion composition and monitoring allow direct integration with knowledge-based planners and semantic world representations, paving the way for intelligent closed-loop perception-action architectures.
Theoretically, the work raises the potential for further integration with dynamic models, force control, and hybrid symbolic/neural planning algorithms. Future directions include augmenting the kinematic model with dynamics for compliant force-aware execution, integrating sampling-based planners and trajectory optimizers for global task-motion planning, and advancing real-time semantic feedback loops for adaptive, long-horizon manipulation.
Conclusion
This paper presents an effective methodology for translating semantic motion constraints into smooth, transferable kinematic executions via Motion Statecharts, differentiable world models, and lMPC. Empirical deployment on eight robots confirms robust generalization and constraint adherence, with real-time semantic monitoring and adaptive feedback. The approach materially advances the state of the art in symbolic-to-kinematic translation, providing a foundation for scalable, embodied cognition in AI-powered robotics.