DexForce: Force-Informed Imitation Learning
- DexForce is a methodology that integrates tactile feedback with kinesthetic demonstrations to generate force-informed position targets for dexterous manipulation tasks.
- It employs a two-stage process: contact forces are measured with six-axis force-torque sensors during kinesthetic teaching, force-informed position targets are computed by inverting the impedance control law, and the reconstructed trajectories are replayed under a Cartesian impedance controller to collect training data.
- Quantitative evaluations show large performance gains over force-unaware baselines on tasks such as sliding a cube, opening an AirPods case, and unscrewing a nut.
DexForce is a methodology for generating force-informed action labels for imitation learning of dexterous, contact-rich manipulation tasks from kinesthetic demonstrations augmented with tactile sensing. It addresses the fundamental problem that existing demonstration techniques for dexterous robots—specifically retargeted teleoperation without direct haptic feedback—fail to capture the precise contact forces necessary for success in tasks that require high manipulation dexterity. By instrumenting robot fingertips with six-axis force-torque sensors and algorithmically extracting position targets that encode the required contact dynamics, DexForce enables the training of policies that can replicate human-level contact skills in fine-grained manipulation.
1. Motivation and Problem Setting
Dexterous manipulation tasks such as opening an AirPods case or unscrewing a nut depend not only on generating precise finger trajectories but also on modulating contact forces at the right time and location. Traditional means of generating demonstrations, such as teleoperation or motion retargeting from human hands, suffer from poor human-to-robot motion correspondence and the absence of haptic feedback, making it extremely difficult for demonstrators to convey the necessary force information.
Kinesthetic teaching—where a human guides the robot's fingers directly—affords direct haptic feedback and high correspondence, but it records only kinematic (fingertip pose) and force data; the recorded poses cannot simply be replayed as position targets, because commanding the measured poses at contact would produce near-zero forces. DexForce closes this gap by leveraging fingertip force-torque data to reconstruct, via an "inverse impedance controller," the position targets that, when replayed, induce the measured contact forces. This yields a dataset whose supervision signal encodes both positional trajectories and the underlying force requirements.
2. Algorithmic Framework
DexForce operates in a two-stage process:
Stage 1: Extraction of Force-Informed Position Targets
- Kinesthetic demonstrations are recorded over $T$ time steps, yielding per-fingertip sequences of measured fingertip positions $p_t$ and measured 3-axis contact forces $f_t$.
- Cartesian impedance control is employed, governed by:

$$F = K_p (x_d - x) - K_d \dot{x}, \qquad \tau = J^\top F + \tau_g$$

where $F$ is the commanded fingertip force, $x_d$ the desired position, $x$ the current position, $K_p$/$K_d$ the stiffness/damping gains, $J$ the Jacobian, $\tau_g$ the gravity-compensation torque, and $\tau$ the joint torques.
- To compute force-informed targets $\tilde{x}_t$ that yield the measured contact force $f_t$ at the measured fingertip position $p_t$ under quasi-static conditions, the stiffness relation is inverted:

$$f_t = K (\tilde{x}_t - p_t) \;\Rightarrow\; \tilde{x}_t = p_t + K^{-1} f_t$$

where $K$ is a diagonal stiffness matrix, hand-tuned so that replaying the targets reproduces the observed forces.
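Stage 1's target extraction can be sketched in a few lines. This is a minimal illustration assuming 3-axis positions and forces and a hand-tuned diagonal stiffness; the function name and the numeric values are illustrative, not taken from the paper:

```python
import numpy as np

def force_informed_targets(positions, forces, stiffness):
    """Invert the quasi-static impedance relation f = K (x_d - x)
    to recover position targets that reproduce the measured forces.

    positions: (T, 3) measured fingertip positions p_t
    forces:    (T, 3) measured contact forces f_t
    stiffness: (3,)   diagonal stiffness gains K (hand-tuned)
    """
    K_inv = 1.0 / np.asarray(stiffness)      # inverse of diagonal K
    return positions + forces * K_inv        # x~_t = p_t + K^{-1} f_t

# Example: a fingertip pressing down with 2 N against a surface
p = np.array([[0.10, 0.00, 0.05]])          # measured pose (m)
f = np.array([[0.00, 0.00, -2.0]])          # measured force (N)
K = np.array([200.0, 200.0, 200.0])         # stiffness (N/m), assumed value
targets = force_informed_targets(p, f, K)
# The target sits 1 cm below the measured pose, so the impedance
# controller exerts the recorded 2 N once contact is made.
```

Note that the target is displaced *into* the contact surface; replaying the raw measured pose instead would leave the controller exerting essentially no force.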
Stage 2: Replay and Demonstration Augmentation
- The reconstructed trajectories are replayed using the impedance controller.
- At each time step of the replay, the new fingertip pose $p_t$, the measured force/torque $f_t$, and a wrist RGB image $I_t$ are recorded, producing data tuples $(I_t, f_t, p_t, \tilde{x}_t)$ for imitation learning.
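The replay-and-record loop of Stage 2 can be sketched as below. The `robot` interface and its method names are assumptions for illustration, not an actual API:

```python
from dataclasses import dataclass, field

@dataclass
class ReplayDataset:
    """Stage 2 sketch: replay force-informed targets and log training tuples.
    The `robot` object is a hypothetical hardware interface."""
    tuples: list = field(default_factory=list)

    def collect(self, robot, targets):
        for x_target in targets:             # force-informed targets from Stage 1
            robot.command_position(x_target) # impedance controller tracks target
            self.tuples.append({
                "pose": robot.fingertip_pose(),   # p_t
                "wrench": robot.ft_reading(),     # 6-axis f_t
                "image": robot.wrist_rgb(),       # I_t
                "action": x_target,               # supervision label
            })
        return self.tuples
```

The key design point is that the logged action is the *reconstructed* target, not the pose the robot actually reached, so the learned policy inherits the force-inducing offsets.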
3. Policy Learning and Architecture
The policy learning objective is to train a model that predicts future sequences from recent multi-modal sensory inputs:
- Observations at each time step are $o_t = (\phi(I_t), f_t)$, where $\phi$ is a ResNet18 encoder applied to the RGB image $I_t$ and $f_t$ are the measured forces/torques per fingertip.
- The supervised objective minimizes the mean squared error between predicted actions $\hat{a}_t$ and the force-informed targets $a_t$ from Stage 1:

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \lVert \hat{a}_t - a_t \rVert^2$$
- The policy is realized as a conditional diffusion model over sequences, structured as:
- Observation horizon: a short window of recent steps, each comprising the wrist RGB image and the 6-axis force and moment reading per fingertip
- Prediction and execution horizons: the policy predicts a sequence of future actions but executes only a shorter prefix before re-planning
- Output: a sequence of fingertip 3D position targets executed directly by the impedance controller
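The receding-horizon execution described above can be sketched as follows. The `policy` callable, the horizon values, and the helper names are illustrative placeholders, not the paper's settings:

```python
import numpy as np

def run_receding_horizon(policy, get_observation, execute_target,
                         n_steps, pred_horizon=16, exec_horizon=8):
    """Predict pred_horizon future position targets, execute only the
    first exec_horizon of them, then re-plan from fresh observations."""
    executed = []
    while len(executed) < n_steps:
        obs = get_observation()                  # RGB features + F/T readings
        plan = policy(obs)                       # (pred_horizon, 3) targets
        assert plan.shape == (pred_horizon, 3)
        for target in plan[:exec_horizon]:       # commit only a short prefix
            execute_target(target)               # one impedance-control step
            executed.append(target)
    return np.array(executed[:n_steps])
```

Executing only a prefix before re-planning lets the policy react to fresh force readings, which matters when contact conditions change mid-trajectory.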
4. Evaluation: Manipulation Tasks and Quantitative Results
DexForce was evaluated on six dexterous manipulation tasks, each requiring one or two robot fingers and exhibiting high contact richness:
| Task | Force-informed actions (success rate) | Naive actions, no force (success rate) |
|---|---|---|
| Slide cube | 90% | 0% |
| Reorient smiley | 80% | 0% |
| Open AirPods | 57% | 0% |
| Grasp battery | 80% | 0% |
| Unscrew nut | 80% | 0% |
| Flip box | 95% | 0% |
| Average | 76% | ~0% |
These results demonstrate that, without explicit force-informed action targets, imitation learning policies almost never succeed in contact-rich dexterous tasks. Embedding the demonstrated forces into the policy's action targets is therefore essential for high success rates in this domain.
5. Sensory Modalities and Ablation Study
To assess the utility of tactile feedback during policy execution, three observation variants were evaluated:
- Visual only (RGB)
- Visual + binary contact (per-fingertip flags set when the measured force magnitude exceeds a threshold)
- Visual + full 6-axis force/torque (F/T)
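A sketch of how the force portion of the observation could be built for each of the three variants. The threshold value and function name are assumptions for illustration, not values from the paper:

```python
import numpy as np

def force_features(wrench, variant, contact_threshold=0.5):
    """Build the force part of the observation for one fingertip.
    wrench: (6,) force/torque reading; threshold (N) is an assumed value."""
    if variant == "rgb_only":
        return np.empty(0)                       # no force input at all
    if variant == "binary_contact":
        in_contact = np.linalg.norm(wrench[:3]) > contact_threshold
        return np.array([float(in_contact)])     # single contact flag
    if variant == "full_ft":
        return np.asarray(wrench, dtype=float)   # all 6 components
    raise ValueError(f"unknown variant: {variant}")

w = np.array([0.0, 0.0, -2.0, 0.0, 0.0, 0.0])    # 2 N normal force
flag = force_features(w, "binary_contact")        # collapses to a 0/1 flag
full = force_features(w, "full_ft")               # keeps magnitude and direction
```

The binary variant discards force magnitude and direction, which is one plausible reason it underperforms full F/T input in the ablation.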
The following findings were established:
- Including full F/T data never degrades performance and leads to statistically significant improvements in tasks with high precision and coordination requirements:
- Open AirPods: +1600% success relative to RGB-only
- Unscrew nut: +136%
- Slide cube: +58.8%
- Binary contact indicators are insufficient compared to full F/T data and may underperform pure visual policies.
- Without real-time force feedback, policies are prone to failure modes such as pushing in the wrong direction or gripping with insufficient force.
6. Limitations and Prospects for Development
DexForce’s core limitations and identified future work include:
- Throughput: Kinesthetic teaching is inherently lower throughput than teleoperation; automating or expediting force-informed data collection is a key challenge.
- Dexterity scope: Current demonstrations use only a subset (one or two) of robot fingers. Hardware or user interface advances will be required for scalable full-hand demonstrations.
- Hybrid methods: Integrating teleoperation with haptic feedback systems is a promising direction for combining high demonstration throughput with the force observability that underpins DexForce.
By algorithmically encoding force dynamics into demonstration actions, DexForce establishes a practical paradigm for teaching robots the nuanced force behaviors essential for complex, contact-rich manipulation in real-world settings.