Unified Force & Position Control Policy
- Unified force and position control is a framework that concurrently manages a robot's end-effector motion and interaction force without explicit mode switching.
- Modern approaches employ model-based, QP-based, and learning-driven methods to seamlessly blend motion-centric and force-centric objectives in uncertain environments.
- Practical implementations demonstrate improved tracking, safety, and efficiency through adaptive compliance, Lyapunov stability, and passivity-based designs in contact-rich tasks.
A unified force and position control policy refers to a control architecture that enables a robot manipulator to concurrently or adaptively regulate both end-effector position and interaction force, often in the presence of modeling uncertainties, unstructured contacts, or varying environment dynamics. Such policies allow seamless and dynamically consistent transitions between motion-centric and force-centric objectives, obviating the need for explicit mode switching or a rigid partitioning of the task space. Modern unified approaches encompass model-based control, adaptive observer-based schemes, learning-based methods, and end-to-end architectures employing deep or diffusion models. The following sections organize the technical landscape of unified force and position control based strictly on recent and foundational literature.
1. Fundamental Control Architectures
Unified force and position control structures provide simultaneous or adaptive command of motion and interaction force at the robot's end-effector. Classical architectures, such as hybrid position/force control, decompose the task space into orthogonal subspaces, allocating explicit position tracking to one subspace and force regulation to the other via selection/projection matrices (Xie et al., 2020, Cheng et al., 2023, Conkey et al., 2018). For example, given the task-space vector $x$ and a diagonal selection matrix $S$, the hybrid approach designs independent controllers for the motion ($S\,x$) and force ($(I - S)\,x$) subspaces. The output torques are then recombined, ensuring stability and invariance under a smooth, invertible joint-to-task mapping (Xie et al., 2020).
Recent architectures extend these principles to dynamic environments or flexible manipulation by embedding force/position blending into the desired trajectory dynamics, the task-frame selection, or admittance/impedance laws parameterized in real time (Hou et al., 2024, Shao et al., 20 Oct 2025). The adaptive compliance policy (ACP), for instance, leverages a dynamic stiffness matrix $K(t)$, scheduling low stiffness in the contact-normal direction and high stiffness along orthogonal axes to balance tracking and force mitigation (Hou et al., 2024).
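As an illustration of this kind of compliance scheduling, the sketch below builds a translational stiffness matrix that is soft along an estimated contact-normal direction and stiff in the orthogonal plane; the gain values and the helper name `adaptive_stiffness` are illustrative assumptions, not the ACP implementation.

```python
import numpy as np

def adaptive_stiffness(contact_normal, k_low=50.0, k_high=800.0):
    """Build a 3x3 translational stiffness matrix with low stiffness along
    the contact normal and high stiffness in the orthogonal plane.
    Gains and the normalization tolerance are illustrative placeholders."""
    n = np.asarray(contact_normal, dtype=float)
    norm = np.linalg.norm(n)
    if norm < 1e-9:               # no reliable contact direction: stay stiff
        return k_high * np.eye(3)
    n = n / norm
    P_normal = np.outer(n, n)     # projector onto the contact normal
    P_tangent = np.eye(3) - P_normal
    return k_low * P_normal + k_high * P_tangent

# Example: contact normal estimated from a force/torque reading
K = adaptive_stiffness(contact_normal=[0.0, 0.0, 1.0])
```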
2. Mathematical Formulation of Unified Laws
Unified policies typically employ one of several mathematical schemes:
- Spring–Mass–Damper/Admittance Controllers: Continuous-time compliance is achieved with an admittance law of the form $M_d \ddot{e} + D_d \dot{e} + K_d(t)\, e = f_{\mathrm{ext}}$, where $e$ is the deviation of the commanded pose from the reference and $K_d(t)$ is dynamically parameterized for spatially and temporally varying compliance (Hou et al., 2024); a discrete-time sketch appears after this list.
- Task-Space Superposition: Individual reference accelerations from the position and force subspaces are "stacked" as $\ddot{x}_{\mathrm{ref}} = S\,\ddot{x}_{\mathrm{pos}} + (I - S)\,\ddot{x}_{\mathrm{force}}$ and mapped back to joint torques through the task-space dynamics, e.g. $\tau = J^{\top}\big(\Lambda\,\ddot{x}_{\mathrm{ref}} + \mu + p\big)$, where $\Lambda$ is the task-space inertia and $\mu$, $p$ collect Coriolis/centrifugal and gravity effects.
- Unified QP-based Policies: Two-level QP architectures enforce nominal tracking while guaranteeing constraint satisfaction (a minimal sketch of the task-space level follows this list):
- A task-space QP that finds a reference command as close as possible to the nominal one under safety constraints.
- A joint-space QP that minimizes input deviation subject to dynamics, velocity, and energy constraints (Sharifi et al., 2024).
- Geometric and Port-Hamiltonian Formulations: On $SE(3)$, the pose error is defined via the group logarithm, $e = \log\!\big(g_d^{-1} g\big)^{\vee}$, and control wrenches are derived from energy-tank-augmented impedance and force laws, enforcing passivity through virtual energy tanks (Seo et al., 23 Apr 2025, Shao et al., 20 Oct 2025).
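As a concrete illustration of the spring–mass–damper/admittance scheme above, the following sketch integrates a discrete-time admittance law in which the measured external force drives a compliant offset around the nominal pose; the gains, time step, and variable names are illustrative assumptions.

```python
import numpy as np

def admittance_step(e, e_dot, f_ext, M, D, K, dt):
    """One explicit-Euler step of M*e_ddot + D*e_dot + K*e = f_ext,
    where e = x_cmd - x_des is the compliant deviation from the nominal pose."""
    e_ddot = np.linalg.solve(M, f_ext - D @ e_dot - K @ e)
    e_dot = e_dot + dt * e_ddot
    e = e + dt * e_dot
    return e, e_dot

# Illustrative 3-DoF translational example at 1 kHz
M, D, K = 2.0 * np.eye(3), 40.0 * np.eye(3), 300.0 * np.eye(3)
e, e_dot = np.zeros(3), np.zeros(3)
f_ext = np.array([0.0, 0.0, 5.0])          # measured contact force (N)
e, e_dot = admittance_step(e, e_dot, f_ext, M, D, K, dt=1e-3)
x_cmd = np.array([0.4, 0.0, 0.2]) + e      # commanded = nominal + compliant offset
```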
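The task-space level of the two-level QP scheme can likewise be sketched under the assumption of a generic linear safety constraint (here a simple velocity bound); the constraint data and solver choice are placeholders and do not reproduce the cited formulation.

```python
import numpy as np
import cvxpy as cp

def task_space_qp(xdot_nom, A, b):
    """Find a task-space velocity as close as possible to the nominal command
    while respecting linear safety constraints A @ xdot <= b."""
    xdot = cp.Variable(xdot_nom.shape[0])
    objective = cp.Minimize(cp.sum_squares(xdot - xdot_nom))
    problem = cp.Problem(objective, [A @ xdot <= b])
    problem.solve()
    return xdot.value

# Illustrative example: bound the z-velocity to 0.05 m/s
xdot_nom = np.array([0.10, 0.00, 0.12])
A = np.array([[0.0, 0.0, 1.0]])
b = np.array([0.05])
xdot_safe = task_space_qp(xdot_nom, A, b)
```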
3. Learning-Based and Data-Driven Policies
Recent advances exploit demonstration-based or end-to-end learning. Diffusion models, transformers, and RL policies are leveraged to unify force and position objectives:
- Diffusion-Guided Adaptive Compliance Policy (ACP): Learns to output a reference pose, a virtual target pose, and a stiffness value, constructing the compliance adaptively based on sensed force histories and visual context; the policy is learned with denoising score matching over demonstration episodes (Hou et al., 2024).
- Transformer-Based IL+RL with Residual Force Loop: The START architecture fuses multi-modal state and sub-task tokens, issuing nominal pose and gripper commands. An RL-trained residual policy then refines the force behavior through adaptive admittance parameters (Ali et al., 5 Nov 2025); a sketch of such a residual correction appears after this list.
- RL for Loco-Manipulation: History-dependent policies with learned force estimation modules enable force/position behaviors on legged robots without explicit force sensors, with observed ~39.5% improvement in imitation learning tasks involving contact-rich interaction (Zhi et al., 27 May 2025).
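A hedged sketch of the residual-correction idea: a learned residual adjusts nominal admittance parameters before the compliance loop executes. The interface, parameter ranges, and clipping bounds are hypothetical and do not reproduce the cited architecture.

```python
import numpy as np

def apply_residual_admittance(base_params, residual, bounds):
    """Add a learned residual to nominal admittance gains (mass, damping,
    stiffness) and clip to a safe range. 'residual' would come from an
    RL policy conditioned on force history and task tokens (hypothetical)."""
    corrected = {}
    for key in ("mass", "damping", "stiffness"):
        lo, hi = bounds[key]
        corrected[key] = float(np.clip(base_params[key] + residual[key], lo, hi))
    return corrected

base = {"mass": 2.0, "damping": 40.0, "stiffness": 300.0}
residual = {"mass": 0.0, "damping": -5.0, "stiffness": -80.0}   # e.g. policy output
bounds = {"mass": (0.5, 10.0), "damping": (5.0, 200.0), "stiffness": (20.0, 1000.0)}
params = apply_residual_admittance(base, residual, bounds)
```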
4. Robustness, Passivity, and Theoretical Guarantees
Analytical guarantees focus on stability, robustness to disturbances, and passivity:
- Stability via Lyapunov/Passivity: Lyapunov or storage functions structured on task or port-Hamiltonian states are employed. For example, a total energy $V$ comprising pose, velocity, and tank states is shown to satisfy $\dot{V} \le F_{\mathrm{ext}}^{\top}\dot{x}$, i.e., it is non-increasing except for power flowing in from the external input, ensuring system passivity (Shao et al., 20 Oct 2025, Seo et al., 23 Apr 2025, Cos et al., 2023); see the energy-tank sketch after this list.
- Second-Order Disturbance Observer (DOb): In manipulators driven by series elastic actuators (SEAs), second-order DObs transform the system into a virtual integrator form, so that matched and mismatched disturbances are robustly compensated and a single unified tracking law can be applied to either force or position objectives (Sariyildiz, 2022).
- Barrier Functions for Safety: Control barrier functions (CBFs) are integrated into two-level QPs to uniformly enforce joint constraints, velocity, and force bounds even under model uncertainty and disturbance (Sharifi et al., 2024).
- Theoretical Minimum Compliance Principle: For manipulation, setting a single zero-stiffness axis aligned to contact force is shown to always admit a feasible velocity command under convex contact constraints, minimizing internal force build-up and guaranteeing constraint feasibility (Hou et al., 2024).
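To make the passivity mechanism above concrete, the following is a minimal sketch of a virtual energy tank: dissipated power refills the tank, and any force/impedance action that would drain it below a minimum level is scaled down. The tank limits, scaling rule, and names are illustrative assumptions rather than the cited designs.

```python
def energy_tank_step(T, P_dissipated, P_requested, dt, T_min=0.1, T_max=5.0):
    """Update the tank energy T: dissipation refills the tank, while the
    requested (possibly passivity-violating) power drains it. If the request
    would empty the tank, scale the injected action down to stay passive."""
    refill = max(P_dissipated, 0.0)
    available = (T - T_min) / dt + refill          # power budget this step
    if P_requested <= max(available, 0.0):
        scale = 1.0                                # full action is admissible
    else:
        scale = max(available, 0.0) / P_requested  # shrink action to the budget
    T_next = min(T + dt * (refill - scale * P_requested), T_max)
    return T_next, scale

# Example: damping dissipates 2 W, the force task requests 3 W of injection
T, scale = energy_tank_step(T=1.0, P_dissipated=2.0, P_requested=3.0, dt=1e-3)
```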
5. Task-Structure, Adaptation, and Temporal Context
Unified frameworks often exploit task decomposition or context modeling.
- Sub-task-Aware Transformers: Embedding sub-task IDs into transformer policies aligns control strategies to long-horizon temporal structure, enabling context-aware blending of position- and force-dominated regimes (e.g., folding versus creasing paper in robotic wrapping) (Ali et al., 5 Nov 2025).
- Constraint Frame Learning from Demonstration: Aligning a time-varying constraint frame to the desired force direction (rather than relying on a fixed hybrid projection) allows force control to be activated and deactivated along dynamically chosen axes, favoring task adaptability (Conkey et al., 2018); a frame-construction sketch follows this list.
- Implicit Mode Blending: Many architectures forego explicit mode switches; compliance-adaptive policies, integral transpose-based IKs, or online selection matrices achieve smooth and context-aware blending between tracking and interaction objectives at all times (Hou et al., 2024, Cos et al., 2023).
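A minimal sketch of the constraint-frame idea referenced above: a rotation matrix is built whose z-axis tracks the desired force direction, and a hybrid selection matrix expressed in that frame activates force control only along the contact axis. The construction and selection convention are illustrative, not the cited method.

```python
import numpy as np

def constraint_frame(force_dir):
    """Return a rotation matrix whose third column is aligned with the
    desired force direction; the other two columns span the motion plane."""
    z = np.asarray(force_dir, dtype=float)
    z = z / np.linalg.norm(z)
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.column_stack((x, y, z))

R = constraint_frame([0.1, 0.0, -1.0])
S_frame = np.diag([1.0, 1.0, 0.0])       # position control in-plane, force along z
S_world = R @ S_frame @ R.T              # selection matrix expressed in the world frame
```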
6. Practical Performance and Experimental Results
Unified policies yield improved manipulation, safety, and tracking in complex scenarios:
- ACP demonstrates >50% performance improvement over prior visuomotor baselines in contact-rich manipulation (Hou et al., 2024).
- Learning-based policies with force estimation outperform position-only models with a 39.5% increase in contact-rich task success (Zhi et al., 27 May 2025).
- Door opening with RL-based unified control yields 3.27× lower peak forces and 1.82× higher smoothness relative to classic adaptive controllers, while maintaining success rates across diverse mechanical doors (Kang et al., 2023).
- In paper wrapping, transformer-based unified policy achieves 97% task success and holds force tracking within ±0.5 N of setpoints (Ali et al., 5 Nov 2025).
- Port-Hamiltonian and geometric unified controllers maintain passivity, precise trajectory tracking, and accurate force regulation under external perturbations and in the presence of ambiguous role switching in human-robot interaction (Shao et al., 20 Oct 2025, Seo et al., 23 Apr 2025).
| Approach | Key Mechanism | Representative Reference |
|---|---|---|
| Adaptive Compliance Policy | Diffusion+admittance control | (Hou et al., 2024) |
| RL/Transformer Residual | Hybrid IL+RL, sub-task IDs | (Ali et al., 5 Nov 2025, Zhi et al., 27 May 2025) |
| Geometric SE(3) Unified | SE(3)-consistent energy tank | (Seo et al., 23 Apr 2025) |
| Barrier/QP-based | Two-level CBF QPs | (Sharifi et al., 2024) |
| Hybrid Selection Matrices | Time-varying task constraints | (Conkey et al., 2018, Xie et al., 2020) |
7. Implementation and Design Guidelines
- Stiffness scheduling or adaptive compliance must trade off between tracking fidelity and force safety; aligning low stiffness with contact forces and high stiffness in orthogonal directions is theoretically justified (Hou et al., 2024).
- For learning-based policies, fused sensor streams (vision, proprioception, force history), transformer fusion, and context tokens enhance generalization (Ali et al., 5 Nov 2025).
- Safety and robustness require attention to passivity, physical feasibility of control inputs, and proper handling of phase transitions or contacts; energy tanks, projection operators, and observer-based compensation are effective mechanisms (Seo et al., 23 Apr 2025, Shao et al., 20 Oct 2025, Cos et al., 2023).
- Practical deployments should run high-frequency admittance or inner control loops (≥1 kHz when possible), conservatively bound feedback gains to avoid wind-up, and embed constraint enforcement at both the kinematic and torque levels (Sharifi et al., 2024, Rouxel et al., 2023); a schematic loop is sketched below.
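The guidelines above can be summarized in a schematic loop; the sketch below runs a toy 1 kHz admittance update with kinematic clamping, with sensor reads and robot writes stubbed out. All names, gains, and limits are illustrative placeholders for a real robot interface.

```python
import numpy as np

DT = 1e-3                                  # 1 kHz loop period (illustrative)

def clamp_pose(x_cmd, lower, upper):
    """Enforce simple workspace limits on the commanded pose."""
    return np.clip(x_cmd, lower, upper)

def control_loop(x_nom, n_steps=1000):
    """Toy 1 kHz admittance loop with constraint enforcement. Sensor reads and
    robot writes are stubbed; a deployment would replace them with driver calls."""
    M, D, K = 2.0 * np.eye(3), 40.0 * np.eye(3), 300.0 * np.eye(3)
    e, e_dot = np.zeros(3), np.zeros(3)
    lower, upper = np.array([-0.5, -0.5, 0.0]), np.array([0.5, 0.5, 0.6])
    x_cmd = x_nom.copy()
    for _ in range(n_steps):
        f_ext = np.array([0.0, 0.0, 4.0])            # stub: force/torque sensor read
        e_ddot = np.linalg.solve(M, f_ext - D @ e_dot - K @ e)
        e_dot = e_dot + DT * e_ddot
        e = e + DT * e_dot
        x_cmd = clamp_pose(x_nom + e, lower, upper)  # kinematic-level constraint
        # stub: send x_cmd to the robot's low-level (torque/position) controller
    return x_cmd

x_final = control_loop(x_nom=np.array([0.3, 0.0, 0.2]))
```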
Unified force and position control policies have matured into a spectrum of control-theoretic and learning-theoretic methods, underpinned by rigorous stability/passivity arguments and enabled by advances in sensing, computation, and demonstration learning. These frameworks allow robots to arbitrate between dexterous manipulation, safe interaction, and complex temporal reasoning across dynamic, multimodal tasks in real-world environments (Hou et al., 2024, Seo et al., 23 Apr 2025, Ali et al., 5 Nov 2025).