Direct Force Learning in Robotics
- Direct force learning is a force-centric approach that models physical force data directly to enable more adaptive and safe control in contact-rich tasks.
- It leverages explicit force sensing, tailored loss functions, and varied architectures such as actor–critic methods, imitation learning, and regression models.
- Empirical results show accelerated training, improved robot-agnostic transfer, and higher task success rates compared to traditional position-only methods.
Direct force learning refers to a class of methodologies in robotics, machine learning, and computational neuroscience that acquire control, prediction, or generative models directly in the space of physical forces, learning force-specific policies, mappings, or error signals as distinct from position-, torque-, or energy-centric formulations. Across domains, direct force learning typically involves explicit modeling of contact dynamics, the use of force/torque sensing, or the direct regression or control of force vectors, magnitudes, or distributions, in both supervised and reinforcement learning paradigms. Core motivations are improved sample efficiency in contact-rich domains, enhanced task adaptivity, transferability across hardware, and access to behaviors or predictions that reflect underlying physical regularities better than state-only approaches.
1. Foundations and Definitions
Direct force learning encompasses any architecture, algorithm, or control system in which force—rather than just position, velocity, or higher-level symbolic state—is directly represented in the learned policy, function approximator, or control law. This can manifest as:
- Policies with force as the action or observation space (e.g., FLEX's force-based MDP (Fang et al., 17 Mar 2025)).
- Supervised learning where the target variable is force (e.g., atomic force fields (Smith et al., 2020), vision-to-force mapping (Hanai et al., 2023)).
- Imitation learning from demonstrations with explicit force, not just pose or image trajectories (e.g., DexForce (Chen et al., 17 Jan 2025), multi-modal LfD (Le et al., 2021)).
- Spiking or recurrent neural networks trained to emulate target force trajectories or dynamics (e.g., FORCE training (Nicola et al., 2016)).
- Joint force/position hybrid controllers or constraint learning that leverages recorded or predicted forces (e.g., dynamic constraint frames (Conkey et al., 2018)).
Common to these is the introduction or exploitation of a force-centric representation in the learning architecture or objective, with explicit loss terms, control variables, or network heads designated for force.
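As a concrete illustration of force appearing directly in a policy's action space, the sketch below bounds a commanded 3D force in norm before it is handed to a low-level controller. The function name and the 10 N limit are illustrative assumptions, not taken from any cited paper:

```python
import numpy as np

def clip_force(f, f_max=10.0):
    """Project a commanded 3D force onto the norm ball ||f|| <= f_max.
    Bounding force actions in norm is a common safety measure when the
    learned policy outputs forces directly (f_max here is illustrative)."""
    f = np.asarray(f, dtype=float)
    norm = np.linalg.norm(f)
    if norm > f_max:
        f = f * (f_max / norm)
    return f
```

A policy head can then emit arbitrary real-valued vectors while the executed action always respects the force limit.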
2. Problem Formulations and Mathematical Frameworks
Force-centric MDPs and objectives are a foundational ingredient. Notable instantiations include:
- FLEX force-space MDPs: For articulated object manipulation under sustained contact, states parameterize the articulated object's configuration and the current contact, while actions are direct 3D forces, explicitly bounded in norm. MuJoCo's contact engine encodes the dynamics, while the reward aligns with physically meaningful displacement, incorporating the cosine between applied force and resultant motion (Fang et al., 17 Mar 2025).
- Sim-to-real RL with force states: RL state spaces are augmented to include force/torque readings, and rewards penalize force limit violations (e.g., a penalty term applied whenever the sensed force norm exceeds a safety threshold, $\|F\| > F_{\max}$) (Lin et al., 30 Oct 2025).
- Supervised/Inverse Modelling: Losses commonly combine force and energy terms, as in
$$\mathcal{L} = w_E\,\bigl(\hat{E} - E\bigr)^2 + \frac{w_F}{3N}\sum_{i=1}^{N}\bigl\|\hat{F}_i - F_i\bigr\|^2,$$
where the predicted forces are obtained from the learned energy as $\hat{F}_i = -\nabla_{r_i}\hat{E}$, for quantum force fields (Smith et al., 2020).
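A combined energy and force objective of this kind can be sketched in a few lines; the weights and the per-atom normalization by 3N are common conventions in ML force-field training, not values taken from the cited paper:

```python
import numpy as np

def ef_loss(E_pred, E_ref, F_pred, F_ref, w_E=1.0, w_F=100.0):
    """Weighted energy + force loss, as commonly used to train ML force
    fields. E_pred/E_ref are scalar energies per structure; F_pred/F_ref
    are (N, 3) arrays of per-atom force vectors. Weights are illustrative."""
    n_atoms = F_ref.shape[0]
    loss_E = (E_pred - E_ref) ** 2
    # Normalize the force term by the number of force components (3N)
    loss_F = np.sum((F_pred - F_ref) ** 2) / (3 * n_atoms)
    return w_E * loss_E + w_F * loss_F
```

In practice the force weight is often set much larger than the energy weight, since per-atom forces carry far more supervision signal per structure than a single scalar energy.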
Across domains, policies or function approximators are parameterized to predict forces directly, or use forces as privileged input, or modulate compliance and contact through force-based heads and action mappings.
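The displacement-alignment reward described for FLEX-style force MDPs can be sketched as follows; this is a minimal illustration of the idea (displacement magnitude scaled by the cosine between applied force and resulting motion), and the exact reward in the paper may differ:

```python
import numpy as np

def alignment_reward(force, displacement, eps=1e-8):
    """Reward = ||displacement|| * cos(angle between force and displacement).
    Motion aligned with the applied force is rewarded; motion opposing it
    is penalized. eps guards against division by zero at rest."""
    f = np.asarray(force, dtype=float)
    d = np.asarray(displacement, dtype=float)
    cos = f @ d / (np.linalg.norm(f) * np.linalg.norm(d) + eps)
    return np.linalg.norm(d) * cos
```

A force that produces motion in its own direction earns the full displacement magnitude as reward, while orthogonal or opposing motion earns zero or negative reward.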
3. Algorithms and Network Architectures
A wide variety of architectures are employed in direct force learning, reflecting different problem domains:
- Actor–Critic Methods in Force Space: For contact-rich manipulation, two-headed MLP actors output a force direction $\hat{d}$ normalized to unit length and a scalar magnitude $s$, yielding the action $a = s\,\hat{d}$. Critic networks estimate Q-values over (state, force) tuples. TD3 is standard, with dense rewards encouraging effective force application (Fang et al., 17 Mar 2025).
- Imitation Learning with Force Features: In assembly and manipulation, demonstrations are processed into state-action pairs incorporating normal and friction force, position error, velocity, and geometric parameters; GAIL losses are augmented by standard behavior cloning and trajectory-imitation terms. Policies are shallow MLPs or transformer-based (in FTF (Adeniji et al., 2 Jun 2025)) with deterministic or stochastic output heads for force commands.
- Direct Supervised Force Regression: For "learning potentials to force data" (Smith et al., 2020) or "force map" prediction from vision (Hanai et al., 2023), ResNet or ViT encoders feed into MLP or decoder towers, culminating in regression heads for vector or scalar force fields. Accurate differentiation with respect to positions is handled via either iterated backprop or directional derivative tricks for computational efficiency.
- Spiking Neural Network FORCE Training: In FORCE training, target force or dynamical trajectories are imposed as outputs, and readout decoders are trained online via the recursive least squares rule. The core dynamic ensures that a fixed "chaotic" recurrent matrix is stabilized by a learned low-rank feedback that sculpts the reservoir dynamics to match any desired target trajectory (Nicola et al., 2016).
- Hybrid and Impedance Approaches: Many frameworks use impedance control or hybrid force-position control, learning force- or stiffness-parameterized impedance models from demonstration or RL (e.g., learning time-varying stiffness matrices per task phase via semidefinite programming in (Le et al., 2021)).
| Architecture Type | Typical Use Case | Example Source |
|---|---|---|
| Two-headed/MLP actor | RL for force-based object manipulation | (Fang et al., 17 Mar 2025, Portela et al., 2024) |
| Transformer (ViT) | Policy learning with visual+force features | (Liu et al., 24 Feb 2025, Adeniji et al., 2 Jun 2025) |
| ResNet-U-Net | Vision-based force field prediction | (Hanai et al., 2023) |
| RLS/Reservoir | SNNs for dynamical trajectory emulation | (Nicola et al., 2016) |
| GNN (direct force) | Atomistic force field learning | (Park et al., 2020) |
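The two-headed actor output described above can be sketched as a post-processing step on raw network heads; the head names, the sigmoid squashing, and the 5 N cap are illustrative assumptions rather than details from the cited papers:

```python
import numpy as np

def force_action(dir_head, mag_head, f_max=5.0):
    """Combine two actor heads into a force action a = s * d_hat:
    dir_head is a raw (3,) direction output, normalized to unit length;
    mag_head is a raw scalar squashed into (0, f_max) via a sigmoid,
    so the action norm is bounded by construction."""
    d = np.asarray(dir_head, dtype=float)
    d_hat = d / (np.linalg.norm(d) + 1e-8)
    s = f_max / (1.0 + np.exp(-float(mag_head)))  # sigmoid * f_max
    return s * d_hat
```

Separating direction and magnitude lets the critic shape "where to push" and "how hard to push" somewhat independently, and guarantees the executed force never exceeds the bound.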
4. Empirical Results, Transferability, and Benchmarks
Direct force learning underpins several state-of-the-art results for contact-rich, precise, and generalizable skills:
- Training Efficiency: FLEX achieves convergence an order of magnitude faster (0.4–0.9M steps vs. 10M+ steps) than end-to-end RL for sustained-contact articulated-object manipulation, with success rates 88–91% on prismatic and 64–67% on revolute joints (Fang et al., 17 Mar 2025).
- Cross-robot Transfer: Policies learned in force-space are re-targetable to new robot platforms (e.g., Panda→UR5e→Kinova), with a measured performance drop of less than 15% and no retraining required (Fang et al., 17 Mar 2025).
- Imitation Learning: Force-based imitation learning in tight-tolerance assembly (pipe insertion, welding) achieves up to 95.8% success against 53–55% for vision- or action-only baselines (You et al., 24 Jan 2025). DexForce, using force-informed action encoding, reaches 76% mean success, outperforming position-only variants (near 0%) in dexterous tasks (Chen et al., 17 Jan 2025).
- Safety, Robustness, Sim-to-real: Direct inclusion of force and tactile state in RL policies yields safer, more sample-efficient and sim-to-real robust behavior, e.g., 90% in-simulation vs 75% real-world success for force+touch vs 40% with vision-only (Lin et al., 30 Oct 2025).
- Physical Prediction: Graph neural networks for force regression on atomic systems (GNNFF) reproduce ab-initio MD forces and dynamics within 14% of AIMD Li diffusion coefficients, outpacing energy-only baselines (Park et al., 2020).
- Vision-Driven Force Mapping: Encoder–decoder models hallucinate 3D force maps from single top-down images to inform manipulation planning, reducing non-target object disturbance by 26–39% vs. uninformed baselines (Hanai et al., 2023).
5. Practical Implementations and Control Schemes
Methodologies for direct force learning integrate across hardware, simulation, and algorithmic components:
- Physical Data Acquisition: Realistic force learning relies on accurate force/torque sensing at contact points (multi-axis FT sensors, tactile gloves, joint-torque readout, actuator current), often synchronized with image, pose, or proprioceptive data. Bilateral teleoperation systems with force reflection ("leader-follower" setups) facilitate the collection of complex force-rich demonstrations (Liu et al., 24 Feb 2025).
- Algorithmic Workflows: For RL, action spaces may be pure force (applied to object) or combined force/position with explicit safety filtering; e.g., RL agents modulate impedance/admittance control parameters, with reward and constraints grounded in measured forces (Portela et al., 2024, Beltran-Hernandez et al., 2020). Hindsight Experience Replay and domain randomization are systematically used to bridge sim-to-real and ensure safety under contact force variations (Lin et al., 30 Oct 2025, Cui et al., 2023).
- Compliance and Adaptivity: Impedance control, dynamic constraint frames, and force-based gating are frequently employed to ensure contact stability, smooth transition between free-space and contact phases, and robust compliance to environment uncertainties (Conkey et al., 2018, Le et al., 2021).
- Data Efficiency and Inductive Biases: Mechanisms such as randomizing contact points over episodes (to learn contact-invariant skills) and layered loss structures (e.g., combining GAIL with behavior cloning and trajectory losses) improve force-based transfer and generalization (Fang et al., 17 Mar 2025, You et al., 24 Jan 2025).
- Force-Attending Representations: Attention networks and curriculum strategies (progressively blurring vision to enforce force-feature reliance) are used to force multi-modal policies to attend to force information, yielding substantial gains in contact-rich transfer to unseen objects and tasks (Liu et al., 24 Feb 2025).
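The impedance/admittance modulation pattern above can be illustrated with a single-axis admittance law, where a learned policy might set the virtual mass, damping, stiffness, or desired force per task phase. The gains and timestep below are illustrative defaults, not values from any cited paper:

```python
import numpy as np

def admittance_step(x, v, f_meas, f_des, M=1.0, D=20.0, K=100.0, dt=0.002):
    """One semi-implicit Euler step of the 1-DoF admittance law
        M*a + D*v + K*x = f_meas - f_des.
    The measured contact force drives a virtual mass-spring-damper whose
    displacement x offsets the nominal position reference, yielding
    compliant behavior. A policy could modulate (M, D, K, f_des)."""
    a = (f_meas - f_des - D * v - K * x) / M
    v = v + a * dt
    x = x + v * dt
    return x, v
```

At steady state the offset settles at x = (f_meas - f_des) / K, so lowering the stiffness K makes the robot yield more to a given contact force.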
6. Impact, Limitations, and Future Trajectories
Direct force learning—by representing forces explicitly—has demonstrated substantial gains in sample-efficiency, robustness to hardware/platform/task variation, and the enablement of contact-rich, compliant, and safety-critical behaviors. Notable advantages include:
- Robot-agnostic Transfer: Policies that operate in object-centric force space transfer seamlessly across robot arms with different morphology and control stacks (Fang et al., 17 Mar 2025).
- Compliance and Delicate Manipulation: Explicit force prediction and control are essential for tasks with stringent failure modes (e.g., not crushing an egg or bag of chips), where binary or position-only policies categorically fail (Adeniji et al., 2 Jun 2025).
- Learning with Limited Sensing: Even with minimal force sensing (single-joint FSR), learned policies can track force targets robustly in dexterous non-prehensile manipulation (Cui et al., 2023).
Limitations center on the fidelity of simulation/force modeling (imperfect contact physics, inaccurate force signals), dependency on high-quality demonstrations, and the challenge of generalizing to unmodeled compliance, stick-slip phenomena, or distributed multi-contact. Extending to full 3D directional force control, better incorporating torque/rotational signals, and fusing force learning with vision or other modalities for more scalable and general-purpose policies are open avenues (Adeniji et al., 2 Jun 2025, Hanai et al., 2023).
A plausible implication is that as sensor, simulation, and learning technologies improve, direct force learning will become the standard interface for skill acquisition in contact-rich, safety-critical, and transfer-demanding robot applications. The codification of best practices for force representation, curriculum, and multi-modal learning remains an active frontier.