
Force-Aware Imitation Learning

Updated 18 November 2025
  • Force-aware imitation learning is a technique that integrates explicit force signals from sensors or estimations into robotic control policies for compliant and adaptive interactions.
  • It leverages rich data sources—including force/torque and tactile sensing, as well as sensorless estimations—to generate target actions that overcome the limits of vision-only approaches.
  • Empirical studies show that these methods significantly improve success rates and safety in manipulation tasks, yielding robust real-world performance across a variety of domains.

Force-aware imitation learning is an advanced paradigm in robot policy learning in which explicit force signals—measured, estimated, or inferred—are incorporated into both the training data and the learned policy to address the fundamental limitations of position-centric and vision-only approaches in contact-rich manipulation tasks. By conditioning policy learning directly on force feedback and generating force-informed target actions and compliance parameters, these methods achieve robust, generalizable, and safe behaviors in domains ranging from industrial assembly and household manipulation to surgical robotics and dexterous in-hand control. Below, key principles, methodologies, and empirical findings from recent arXiv research are presented systematically.

1. Motivation and Scope of Force-Aware Imitation Learning

Standard visuomotor imitation learning tracks desired positions but typically ignores compliance and force, resulting in excessive contact forces, fragile behavior under uncertainty, and poor performance in tasks requiring stable contact or adaptive interaction with the environment. Robust manipulation of rigid, deformable, or fragile objects, as well as cooperative or multi-agent scenarios, necessitates policies that can reason about force signals and adapt their control actions accordingly—either by predicting target wrenches directly or by modulating motion compliance (Li et al., 3 Oct 2025, Yu et al., 28 May 2025, Lee et al., 23 Sep 2025, Chen et al., 17 Jan 2025, Ge et al., 21 Sep 2025).

The scope of force-aware imitation learning includes:

  • Conditioning learned policies on force/torque sensor signals, reaction torque estimates, or tactile deformation vectors
  • Synthesizing target actions (e.g., trajectories, impedance/compliance gains, grip widths, normal forces) that reproduce demonstrated force profiles
  • Integrating force signals as privileged modalities in multimodal architectures
  • Enabling zero-shot sim-to-real transfer of compliant contact-aware policies

2. Key Data Collection Methods and Force Sensing Paradigms

Force-aware imitation learning relies on rich data modalities that faithfully capture both motion and interaction forces. Three classes of data collection pipelines predominate:

A. Bilateral Teleoperation with Reaction/Disturbance Observers

4-channel bilateral control frameworks separately record both acting and reaction forces via master-slave manipulator pairs (Adachi et al., 2018, Sasagawa et al., 2019, Yamane et al., 8 Jul 2025, Kobayashi et al., 15 Nov 2024, Kobayashi et al., 2 Apr 2025). Joint angles, velocities, and torques are measured at high rate (typically 1 kHz), with reaction-torque signals estimated by disturbance observers (DOB) and reaction force observers (RFOB) in hardware or simulation. This architecture ensures clean separation of human-intended commands and robot/environment responses.
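As a concrete illustration, the sketch below implements a first-order disturbance observer and the subtraction step of a reaction force observer for a single joint. The nominal inertia, cutoff frequency, and function names are illustrative assumptions, not parameters from the cited systems.

```python
import numpy as np

def disturbance_observer(tau_ref, omega, J_n, g, dt):
    """First-order disturbance observer (DOB) for one joint.

    Estimates the disturbance torque from the commanded torque tau_ref and the
    measured velocity omega without differentiating omega explicitly.

    tau_ref, omega : arrays of shape (T,), commanded torque and joint velocity
    J_n            : nominal joint inertia (illustrative)
    g              : observer cutoff frequency [rad/s]
    dt             : sample period [s], e.g. 1e-3 for a 1 kHz loop
    """
    tau_dis = np.zeros(len(tau_ref))
    z = 0.0  # internal low-pass filter state
    for k in range(len(tau_ref)):
        # Low-pass filter (tau_ref + g*J_n*omega), then subtract g*J_n*omega;
        # this recovers the classic DOB estimate g/(s+g) * (tau_ref - J_n*s*omega).
        u = tau_ref[k] + g * J_n * omega[k]
        z += dt * g * (u - z)
        tau_dis[k] = z - g * J_n * omega[k]
    return tau_dis

def reaction_force_observer(tau_dis, gravity_torque, friction_torque):
    # RFOB: remove modeled internal torques to isolate the environment reaction.
    return tau_dis - gravity_torque - friction_torque
```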

B. Direct Force/Torque and Tactile Sensing

Demonstrations may be collected using robot hands instrumented with F/T sensors at the fingertips or at the wrist (Chen et al., 17 Jan 2025, Ablett et al., 2023, Helmut et al., 15 Oct 2025). Visual tactile sensors (e.g., GelSight, see-through STS) enable estimation of high-dimensional force distributions via convolutional encoders pretrained on finite-element synthetic data (Helmut et al., 15 Oct 2025, Ablett et al., 2023).
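The following sketch shows the general pattern of regressing contact forces from a tactile image with a small convolutional encoder trained against synthetic ground truth; the architecture, input resolution, and three-dimensional force output are illustrative assumptions rather than the encoders used in the cited works.

```python
import torch
import torch.nn as nn

class TactileForceEncoder(nn.Module):
    """Toy convolutional encoder mapping a tactile image (e.g., a GelSight RGB
    frame) to a 3-D contact force estimate (f_x, f_y, f_z). In practice such
    encoders are often pretrained on finite-element synthetic data."""
    def __init__(self, out_dim: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, out_dim)

    def forward(self, tactile_img):           # (B, 3, H, W)
        feat = self.conv(tactile_img).flatten(1)
        return self.head(feat)                # (B, 3) estimated contact force

# Supervised pretraining step against (synthetic) ground-truth forces:
model = TactileForceEncoder()
img = torch.randn(8, 3, 64, 64)               # batch of tactile images
target = torch.randn(8, 3)                    # ground-truth forces, e.g. from FEM
loss = nn.functional.mse_loss(model(img), target)
loss.backward()
```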

C. Sensorless Estimation and Simulation-Based Inference

When dedicated F/T sensors are unavailable, end-effector wrenches can be estimated via analytical Jacobian inversion of joint torques, matched against model-predicted torque from a digital twin simulator (e.g., MuJoCo) (Ge et al., 21 Sep 2025). Additionally, force signals may be indirectly inferred from deformation patterns, kinesthetic interfaces, or simulation-based effect matching (Ehsani et al., 2020, Wang et al., 2021, You et al., 24 Jan 2025).

These diverse sources can yield normalized force data streams for policy learning, often aligned and downsampled to match image or proprioceptive signal rates (e.g., 25–100 Hz).
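A minimal sketch of this sensorless pipeline is given below: the end-effector wrench is recovered from residual joint torques through the relation $\tau_{\mathrm{ext}} = J^\top F$, and the resulting 1 kHz stream is block-averaged down to the policy rate. The residual-torque computation against the digital twin is abbreviated here, and all shapes and names are illustrative.

```python
import numpy as np

def estimate_wrench(tau_measured, tau_model, jacobian):
    """Estimate the end-effector wrench from joint torques.

    tau_measured : (n,) measured joint torques
    tau_model    : (n,) model-predicted torques (gravity/inertia/friction),
                   e.g. obtained from a digital-twin simulator
    jacobian     : (6, n) geometric Jacobian at the current configuration
    Returns a (6,) wrench [f_x, f_y, f_z, m_x, m_y, m_z], solving
    tau_ext = J^T F in a least-squares sense.
    """
    tau_ext = tau_measured - tau_model
    wrench, *_ = np.linalg.lstsq(jacobian.T, tau_ext, rcond=None)
    return wrench

def downsample(signal_1khz, policy_hz=50, sensor_hz=1000):
    """Block-average a high-rate force stream down to the policy rate."""
    stride = sensor_hz // policy_hz
    usable = (len(signal_1khz) // stride) * stride
    return signal_1khz[:usable].reshape(-1, stride, *signal_1khz.shape[1:]).mean(axis=1)
```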

3. Policy Learning Architectures and Mathematical Formulation

A. Conditional Vector Field and Flow Matching

In flow-based frameworks such as "Flow with the Force Field," compliant policies are trained to predict time-indexed vector fields $v_\theta(z_t, t)$ that transport a base distribution (Gaussian noise) to the empirical distribution of actions, using a rectified flow matching loss:

$$\min_\theta \; \mathbb{E}_{z_0 \sim p_0,\, z_1 \sim p_1} \left[ \int_0^1 \big\| (z_1 - z_0) - v_\theta(z_t, t) \big\|_2^2 \, dt \right]$$

Actions incorporate reference trajectories, virtual contact targets, and impedance gains, with compliance modulated via learned force schedules (Li et al., 3 Oct 2025).
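A compact sketch of this rectified flow-matching objective is shown below, with a Monte Carlo sample over $t$ standing in for the integral. The network, conditioning scheme, and action dimensionality (reference pose, contact target, and impedance gains packed into one vector) are illustrative assumptions.

```python
import torch
import torch.nn as nn

action_dim, obs_dim = 16, 32   # illustrative: pose + contact target + impedance gains

# v_theta(z_t, t | obs): velocity field transporting noise to demonstrated actions.
velocity_net = nn.Sequential(
    nn.Linear(action_dim + 1 + obs_dim, 256), nn.SiLU(),
    nn.Linear(256, 256), nn.SiLU(),
    nn.Linear(256, action_dim),
)

def rectified_flow_loss(z1, obs):
    """z1: (B, action_dim) demonstrated actions; obs: (B, obs_dim) observations,
    which may include force/torque features."""
    z0 = torch.randn_like(z1)                      # sample from base distribution p0
    t = torch.rand(z1.shape[0], 1)                 # uniform time in [0, 1]
    zt = (1 - t) * z0 + t * z1                     # straight-line interpolant
    target = z1 - z0                               # constant target velocity
    pred = velocity_net(torch.cat([zt, t, obs], dim=-1))
    return ((pred - target) ** 2).sum(dim=-1).mean()

# One gradient step on a dummy batch:
loss = rectified_flow_loss(torch.randn(64, action_dim), torch.randn(64, obs_dim))
loss.backward()
```

At inference time, an action is generated by integrating the learned vector field from a noise sample $z_0$ to $z_1$, conditioned on the current (force-inclusive) observation.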

B. Transformer-Based Chunked and Multimodal Policy Networks

Bilateral control-based frameworks (e.g., Bi-ACT, Bi-LAT, ForceVLA) leverage transformers with action chunking, conditioning on joint positions, velocities, torques, vision, tactile signals, and sometimes language cues, using cross-modal attention and fusion modules (Kobayashi et al., 2 Apr 2025, Kobayashi et al., 15 Nov 2024, Yu et al., 28 May 2025). Conditional variational autoencoders (CVAEs) or diffusion models parameterize joint distributions over multimodal actions.
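The sketch below conveys the basic structure of such a chunked multimodal policy: observation tokens (vision features, joint state including torques, tactile features) are fused by a transformer encoder, and a fixed-length chunk of future actions is decoded from learned queries. All feature dimensions are illustrative, and the CVAE/diffusion heads of the cited systems are omitted for brevity.

```python
import torch
import torch.nn as nn

class ChunkedMultimodalPolicy(nn.Module):
    """Toy transformer policy with action chunking over multimodal inputs."""
    def __init__(self, d_model=128, chunk=20, action_dim=14):
        super().__init__()
        self.proj = nn.ModuleDict({
            "vision": nn.Linear(512, d_model),   # precomputed image features
            "state": nn.Linear(28, d_model),     # proprioception incl. torques (illustrative 28-D)
            "tactile": nn.Linear(64, d_model),   # tactile/force features
        })
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.queries = nn.Parameter(torch.randn(chunk, d_model))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(d_model, action_dim)   # e.g., joint position + torque targets

    def forward(self, vision, state, tactile):
        tokens = torch.stack([self.proj["vision"](vision),
                              self.proj["state"](state),
                              self.proj["tactile"](tactile)], dim=1)   # (B, 3, d_model)
        memory = self.encoder(tokens)
        q = self.queries.unsqueeze(0).expand(vision.shape[0], -1, -1)
        return self.head(self.decoder(q, memory))                     # (B, chunk, action_dim)

policy = ChunkedMultimodalPolicy()
actions = policy(torch.randn(2, 512), torch.randn(2, 28), torch.randn(2, 64))
```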

C. Diffusion and Score-Based Generative Policies

Recent work adopts denoising diffusion or unified multimodal diffusion forcing. Policies are trained to reconstruct trajectories from partially masked modalities, capturing the dependency between motions, forces, and rewards (Huang et al., 6 Nov 2025, Ablett et al., 2023, Chen et al., 17 Jan 2025, Basak et al., 15 Jan 2024). Tactile-conditioned diffusion additionally predicts target force profiles from high-dimensional visual tactile data (Helmut et al., 15 Oct 2025).
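As a loose, simplified analogue of masked-modality reconstruction in these generative policies, the sketch below trains a DDPM-style noise predictor on action vectors while randomly masking the force features, so the model is exposed to observations both with and without the force modality. The noise schedule, network, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

T_steps = 100
betas = torch.linspace(1e-4, 2e-2, T_steps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# eps_theta(a_t, t, obs, force) -> predicted noise (toy MLP denoiser)
denoiser = nn.Sequential(
    nn.Linear(8 + 1 + 32, 256), nn.SiLU(),
    nn.Linear(256, 8),
)

def diffusion_policy_loss(actions, obs, force_feat, p_mask=0.3):
    """DDPM-style noise-prediction loss with random masking of the force modality."""
    B = actions.shape[0]
    t = torch.randint(0, T_steps, (B,))
    a_bar = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(actions)
    noisy = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise   # forward diffusion
    mask = (torch.rand(B, 1) > p_mask).float()                    # drop force features sometimes
    cond = torch.cat([obs, force_feat * mask], dim=-1)
    pred = denoiser(torch.cat([noisy, t.unsqueeze(-1) / T_steps, cond], dim=-1))
    return nn.functional.mse_loss(pred, noise)

loss = diffusion_policy_loss(torch.randn(16, 8), torch.randn(16, 16), torch.randn(16, 16))
loss.backward()
```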

D. Hybrid Trajectory and Reinforcement Learning

For complex assemblies and variable contact dynamics, hybrid frameworks combine hierarchical imitation learning for geometric trajectory synthesis with deep RL for adaptive force/impedance parameter selection (Wang et al., 2021, You et al., 24 Jan 2025). Controllers blend PD position and PI force regulation, with selection matrices that allocate control axes.
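A minimal sketch of such a hybrid law is given below: a diagonal selection matrix routes each Cartesian axis either to PD position regulation or to PI force regulation. Gains, dimensions, and the example selection are illustrative assumptions.

```python
import numpy as np

def hybrid_position_force_control(x, x_des, f, f_des, f_int, S,
                                  Kp=100.0, Kd=5.0, Kfp=0.5, Kfi=0.1,
                                  x_dot=None):
    """Hybrid PD-position / PI-force law with a diagonal selection matrix S.

    S[i] = 1 -> axis i is position-controlled (PD)
    S[i] = 0 -> axis i is force-controlled (PI)
    x, x_des : (6,) current / desired pose coordinates
    f, f_des : (6,) measured / desired wrench
    f_int    : (6,) running integral of the force error (updated by the caller)
    """
    x_dot = np.zeros(6) if x_dot is None else x_dot
    u_pos = Kp * (x_des - x) - Kd * x_dot          # PD position regulation
    u_force = Kfp * (f_des - f) + Kfi * f_int      # PI force regulation
    S = np.asarray(S, dtype=float)
    return S * u_pos + (1.0 - S) * u_force

# Example: track position in x/y and orientation, regulate contact force along z.
S = np.array([1, 1, 0, 1, 1, 1])
u = hybrid_position_force_control(np.zeros(6), np.full(6, 0.01),
                                  np.zeros(6), np.array([0, 0, 5, 0, 0, 0]),
                                  np.zeros(6), S)
```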

4. Control Primitives and Compliance Rollouts

Force-aware policies output either explicit force/torque commands or target positions/grip widths/impedance gains, which are rolled out via closed-loop compliant controllers. Passive impedance or admittance control is canonical:

$$F_c = -D(x)\left[\dot{x} - f(x)\right]$$

where compliance is shaped by blending nominal and contact-normal directions using learned gains directly supervised by the force magnitude (Li et al., 3 Oct 2025, Ge et al., 21 Sep 2025). Hybrid position-force primitives may adaptively select between pure kinematic and force-modulated motion depending on predicted contact (Liu et al., 10 Oct 2024, Wang et al., 2021).
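The following sketch rolls out the damping-based controller above for a single time step, blending a tangential and a contact-normal damping gain as the learned compliance; the unit-mass admittance integration and all parameter names are illustrative assumptions.

```python
import numpy as np

def compliant_rollout_step(x, x_dot, f_nominal, contact_normal,
                           d_tangent, d_normal, dt, mass=1.0):
    """One step of the passive controller F_c = -D(x) [x_dot - f(x)].

    f_nominal      : desired velocity field f(x), e.g. from the learned policy
    contact_normal : unit vector along the predicted contact normal
    d_tangent,
    d_normal       : damping gains (learned compliance); a lower d_normal
                     yields softer behaviour along the contact direction
    """
    n = contact_normal / (np.linalg.norm(contact_normal) + 1e-9)
    P_n = np.outer(n, n)                               # projector onto the contact normal
    D = d_tangent * (np.eye(3) - P_n) + d_normal * P_n # direction-dependent damping
    F_c = -D @ (x_dot - f_nominal)                     # commanded force
    # Forward-Euler integration of a unit-mass admittance model:
    x_dot_next = x_dot + dt * F_c / mass
    x_next = x + dt * x_dot_next
    return x_next, x_dot_next, F_c
```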

5. Experimental Validation and Quantitative Impact

Empirical results demonstrate substantial improvements in success rate, compliance, and safety over position-only baselines:

| Framework | Task Domain | Success (Force-aware) | Success (Vision/Position) | Key Impact |
| --- | --- | --- | --- | --- |
| Flow with the Force Field | Block flipping (sim2real) | 97.6% (sim), 89% (real) | Near zero (force/compliance ablated) | Contact generalization, energy reduction (Li et al., 3 Oct 2025) |
| ForceVLA | Plug insertion, cucumber peel | Up to 90% | 37–40% | +23.2% average, robust under occlusion (Yu et al., 28 May 2025) |
| ManipForce (FMT) | Box/gear assembly, flipping | 83% | 22% | +61%, critical for transient events (Lee et al., 23 Sep 2025) |
| DexForce | 6 dexterous tasks | 76% (mean) | Near zero | OOD generalization, force ablations (Chen et al., 17 Jan 2025) |
| FILIC | Peg-in-hole, socket assembly | 80–90% | 46–68% | +22–33 pp, smoother force profile (Ge et al., 21 Sep 2025) |
| Bi-ACT + ALPHA-α | Pick-and-place, bimanual ops | 100% (ball), 80% (egg) | As low as 40% | Adaptation to object hardness (Kobayashi et al., 15 Nov 2024) |
| Bi-LAT (SigLIP) | Cup stacking, sponge twist | 100% (cup), 80% (strong twist) | 100% (no force control) | Language-driven force modulation (Kobayashi et al., 2 Apr 2025) |
| ForceMimic (HybridIL) | Zucchini peeling | 85% peel-length | 55% | +54.5% gain, force tracking (Liu et al., 10 Oct 2024) |
| FARM (Tactile) | Plant, grape, screw tasks | 95–100% | As low as 0% (vision-only) | WM1 force error < 1 N (Helmut et al., 15 Oct 2025) |
| Force-Aware Surgery | Tissue retraction (dVRK) | 76–70% | 26–20% | 62–110% reduction in force (Abdelaal et al., 20 Jan 2025) |

Numerous ablation studies confirm that inclusion of force in both observations and targets dramatically enhances policy reliability, compliance, and object/contact adaptation, especially in OOD generalization, variable stiffness, and uncertain contact scenarios. Weak or binary contact signals fail to encode required force magnitude/direction (Chen et al., 17 Jan 2025, Ablett et al., 2023).

6. Limitations, Challenges, and Future Directions

Principal limitations include dependency on high-quality force sensing or estimation, hardware and computational cost, and constraints on policy transfer to different geometries or objects. Simulation-reality gaps persist when force distributions are difficult to replicate (Li et al., 3 Oct 2025, Wang et al., 2021, You et al., 24 Jan 2025). Causal confusion may arise when proprioception is included alongside force in the observation vector (Chen et al., 17 Jan 2025).

Active areas of future research include sensorless and low-cost force estimation, closing the simulation-to-reality gap for contact force distributions, mitigating causal confusion between proprioceptive and force observations, and transferring force-aware policies across object geometries and stiffness properties.

The consensus in contemporary research is that force-aware imitation learning is essential for manipulation in contact-rich, uncertain, and safety-critical domains, and is rapidly becoming a foundational paradigm in physically intelligent robotics.
