Neural External Torque Estimation (NEXT)

Updated 12 June 2026

Neural External Torque Estimation (NEXT) is a sensorless, neural network-based method for estimating external joint torques using proprioceptive, EMG, and exteroceptive signals.
NEXT approaches bypass explicit model inversion by training supervised and hybrid neural regressors on time-series data to achieve accurate and low-latency torque predictions.
Real-time deployment of NEXT enables adaptive robotic control, enhanced safety in human-robot interactions, and cost-effective force feedback in challenging sensor environments.

Neural External Torque Estimation (NEXT) is a class of sensorless, data-driven methodologies for estimating external joint torques or wrenches in robotic and human biomechanical systems using only internal or minimally intrusive signals and without dedicated force/torque sensors. NEXT approaches leverage neural network architectures to learn direct or physics-informed mappings from proprioceptive, electromyographic, or exteroceptive signals to joint torques, thereby enabling force feedback, adaptive control, collision detection, and advanced human-robot interaction in settings where conventional force sensing is impractical or cost-prohibitive.

1. Fundamental Problem Formulation and Theoretical Foundation

The estimation of external joint torque is grounded in the joint-space or floating-base rigid-body dynamics:

$\tau_m = M(q) \ddot{q} + C(q, \dot{q}) \dot{q} + g(q) + \tau_{ext}$

where $q$ are joint positions, $M(q)$ is the inertia matrix, $C(q, \dot{q}) \dot{q}$ collects the Coriolis and centrifugal terms, $g(q)$ is the gravity torque, $\tau_m$ are the measured or commanded motor torques (or their proxies, e.g., currents), and $\tau_{ext}$ are the torques induced by external contacts, objects, or interaction forces. In practice, solving this equation directly for $\tau_{ext}$ is highly sensitive to modeling errors and unmodeled effects such as joint friction, nonrigid contacts, or sensor bias, because every such error appears one-to-one in the estimate.
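The sensitivity to unmodeled effects can be illustrated with a minimal sketch. It assumes a hypothetical single-joint point-mass pendulum (so $C = 0$) whose real actuator also has viscous friction that the model omits; all numerical values and function names are illustrative and do not come from the cited works.

```python
import numpy as np

def inverse_dynamics_torque(q, qdd, mass=1.0, length=0.5, gravity=9.81):
    """Model torque M*qdd + g(q) for a point-mass pendulum (C = 0)."""
    inertia = mass * length**2
    return inertia * qdd + mass * gravity * length * np.sin(q)

# Synthetic trajectory q(t) = 0.5 sin(2 t) with analytic derivatives.
t = np.linspace(0.0, 2.0, 201)
q = 0.5 * np.sin(2.0 * t)
qd = 1.0 * np.cos(2.0 * t)
qdd = -2.0 * np.sin(2.0 * t)

# Assumed "real" plant: model terms + unmodeled viscous friction + contact torque.
friction = 0.3 * qd
tau_ext_true = np.where(t >= 1.0, 1.0, 0.0)
tau_m = inverse_dynamics_torque(q, qdd) + friction + tau_ext_true

# Direct inversion: tau_ext_hat = tau_m - (M qdd + g).
tau_ext_hat = tau_m - inverse_dynamics_torque(q, qdd)
error = tau_ext_hat - tau_ext_true  # the unmodeled friction leaks into the estimate
```

In this sketch the estimation error equals the omitted friction torque exactly, which is the kind of bias a learned free-space model is meant to absorb.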

NEXT circumvents explicit model inversion by training a neural network regressor (or hybrid model) to map a history of, possibly multimodal, input signals $x$ directly to an external torque estimate, often in the form:

$\hat{\tau}_{ext} = \tau_m - f_\theta(x)$

where $f_\theta$ encodes the "free-space" dynamics via supervised regression, and the residual with respect to the observed system input ( $\tau_m$ ) yields the (sensorless) estimate of external torque (Oh et al., 10 Jun 2026).
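The residual formulation can be sketched as follows. For brevity, a linear least-squares regressor over hand-picked features stands in for the neural network $f_\theta$; this substitution, the hypothetical plant, and all coefficients are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant_torque(q, qd, qdd):
    """Hypothetical free-space plant: inertia, gravity, and viscous friction."""
    return 0.25 * qdd + 4.9 * np.sin(q) + 0.3 * qd

def features(q, qd, qdd):
    """Feature map for the stand-in regressor (one column per term)."""
    return np.stack([qdd, np.sin(q), qd], axis=1)

# 1) Fit f_theta on contact-free data, where tau_m is the free-space torque.
n = 500
q = rng.uniform(-1.0, 1.0, n)
qd = rng.uniform(-2.0, 2.0, n)
qdd = rng.uniform(-5.0, 5.0, n)
tau_free = plant_torque(q, qd, qdd) + 0.01 * rng.standard_normal(n)
theta, *_ = np.linalg.lstsq(features(q, qd, qdd), tau_free, rcond=None)

def f_theta(q, qd, qdd):
    """Learned free-space torque prediction."""
    return features(q, qd, qdd) @ theta

# 2) During contact, the residual tau_m - f_theta(x) estimates tau_ext.
q_c = rng.uniform(-1.0, 1.0, 50)
qd_c = rng.uniform(-2.0, 2.0, 50)
qdd_c = rng.uniform(-5.0, 5.0, 50)
tau_ext_true = 1.5
tau_m = plant_torque(q_c, qd_c, qdd_c) + tau_ext_true
tau_ext_hat = tau_m - f_theta(q_c, qd_c, qdd_c)
```

Because friction is present in the free-space training data, the fitted model accounts for it and the residual isolates the contact torque; a neural regressor plays the same role when the free-space dynamics are not captured by a few known features.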

The paradigm generalizes to floating-base systems, soft manipulators, and even exoskeletons, with appropriate adaptation to input modalities and output targets (Lim et al., 2024, Shan et al., 2023, Kumar et al., 2024).

2. Neural Architectures and Input Modalities

NEXT methods deploy a variety of architectures, tuned to task and platform:

Temporal Models: LSTMs and GRUs are used to ingest time-series proprioceptive signals (joint positions, velocities, deltas from setpoints), achieving high prediction fidelity with short memory horizons (window sizes of roughly $H \approx 20$ to $100$ time steps). Stateless (sliding-window) operation is preferred for robustness to drift (Oh et al., 10 Jun 2026, Lim et al., 2024, Lim et al., 2023).
Physics-Informed RNNs: Physics-Informed Gated Recurrent Networks (PiGRN) tightly couple GRUs to physics constraints and output not only torque, but kinematic state and external mass, with a loss comprising both data fidelity and inverse dynamics residuals (Kumar et al., 2024).
Multilayer Perceptrons (MLPs): For systems with strong priors and feature-engineered inputs (e.g., PCA-reduced EMG or windowed statistics), shallow MLPs can rival temporal models in torque estimation accuracy, especially in small-sample regimes (Chari et al., 23 Jan 2026).
Temporal Convolutional Networks (TCNs): Dilated/causal TCNs process short windows of IMU, EMG, or encoder data to extract instantaneous torque or classify gait phase, enabling real-time adaptive exoskeleton control (Weigend et al., 1 Aug 2025, Chari et al., 23 Jan 2026).
Hybrid Residual Architectures: NEXT variants incorporate hierarchical or parallel residual MLPs, with explicit long-term memory of static/dynamic transitions ("Motion Discriminator"), critical for accurate friction-hysteresis cancellation (Shan et al., 2023).
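As an illustration of the causal, dilated convolution that TCNs are built from, the following minimal sketch implements a single-channel layer in NumPy. It is not the architecture of any cited work; the function names and the receptive-field helper (which assumes one convolution per layer) are illustrative.

```python
import numpy as np

def causal_dilated_conv1d(x, weights, dilation=1):
    """y[t] = sum_k weights[k] * x[t - k*dilation], treating x as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, w in enumerate(weights):
        shift = k * dilation
        if shift == 0:
            y += w * x
        elif shift < len(x):
            y[shift:] += w * x[:-shift]
    return y

def receptive_field(kernel_size, dilations):
    """Number of past samples (including the current one) seen by stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
out = causal_dilated_conv1d(signal, weights=[1.0, -1.0], dilation=2)  # x[t] - x[t-2]
```

Because each output depends only on current and past samples, such a layer can run sample-by-sample in a control loop; doubling the dilation per layer grows the receptive field quickly, e.g. `receptive_field(3, [1, 2, 4, 8])` covers 31 samples.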

Inputs span joint encoder positions, velocities, commands, motor currents, IMUs (kinematic/kinetic state), sEMG (muscle activation), visual data, and hybrid stacks including task context.
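Stateless sliding-window operation amounts to stacking the last $H$ samples of every input channel into one network input. A minimal sketch, assuming the signals are already synchronized into a `(T, D)` array and that each window is labeled with the torque at its final sample (both assumptions made here for illustration):

```python
import numpy as np

def make_windows(signals, horizon):
    """Stack overlapping windows: output[i] holds samples i .. i + horizon - 1."""
    signals = np.asarray(signals, dtype=float)
    num_steps = signals.shape[0]
    if horizon > num_steps:
        raise ValueError("horizon exceeds the number of available samples")
    starts = np.arange(num_steps - horizon + 1)[:, None]
    offsets = np.arange(horizon)[None, :]
    return signals[starts + offsets]  # shape (T - H + 1, H, D)

def align_targets(torque, horizon):
    """Label each window with the torque at its last sample."""
    return np.asarray(torque, dtype=float)[horizon - 1:]

signals = np.arange(12.0).reshape(6, 2)   # 6 time steps, 2 channels
torque = np.arange(6.0)
windows = make_windows(signals, horizon=3)
targets = align_targets(torque, horizon=3)
```

Each window is processed independently, so no recurrent state is carried between predictions.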

3. Data Collection, Training Procedures, and Loss Functions

Robust training of NEXT models requires strategic dataset design and loss construction:

Dataset Diversity: Datasets are collected across free-space, contact-rich, and specialized manipulation scenarios (sliding, assembly, hand-guiding), covering the full operational envelope and, when relevant, divided by inverse-kinematic classes for transferability (Shan et al., 2023).
Labeling: Ground-truth torque is derived from either direct F/T sensors (used only during training), inverse dynamics (from kinematics and known masses), or momentum observer baselines corrected for model uncertainty (Lim et al., 2024, Kumar et al., 2024).
Physics-Informed Losses: In addition to standard supervised losses (MSE, weighted Huber), physics-informed networks impose model-based constraints—penalizing violations of the rigid-body inverse dynamics with physics residual loss terms, which are weighted alongside data loss (Kumar et al., 2024).
Augmentation and Regularization: Temporal mixing (sliding windows, segmentation), domain-conditional normalization, random torque exploration, and Dropout or weight decay are used to enhance robustness and generalization (Chari et al., 23 Jan 2026, Lim et al., 2024).
Hyperparameters: Learning rates are typically […]–[…] for Adam/AdamW optimizers; batch sizes from 32 to 1024 are reported, with early stopping on validation loss.
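The combination of a data loss with a physics residual term can be sketched as below. The single-joint dynamics model, the weight `lam`, and the function names are assumptions for illustration; the cited physics-informed networks use their own system models and weightings.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Elementwise Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def physics_informed_loss(tau_pred, tau_label, q, qdd, inertia, gravity_coeff, lam=0.1):
    """Data MSE plus a weighted penalty on the inverse-dynamics residual."""
    data_loss = np.mean((tau_pred - tau_label) ** 2)
    # Torque implied by the assumed rigid-body model for the same kinematics.
    tau_physics = inertia * qdd + gravity_coeff * np.sin(q)
    physics_loss = np.mean((tau_pred - tau_physics) ** 2)
    return data_loss + lam * physics_loss, data_loss, physics_loss

q = np.array([0.0, 0.5, 1.0])
qdd = np.array([1.0, -1.0, 0.5])
tau_label = 0.25 * qdd + 4.9 * np.sin(q)
total, data_term, physics_term = physics_informed_loss(
    tau_label + 1.0, tau_label, q, qdd, inertia=0.25, gravity_coeff=4.9, lam=0.1
)
```

The weight `lam` trades off fidelity to possibly noisy labels against consistency with the model; swapping the squared error for `huber` reduces the influence of outlier labels.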

Quantitative performance is reported via RMSE, MAE, $R^2$, and relative error metrics on held-out or cross-environment splits. For example, LSTM-based NEXT on a Franka manipulator achieves a per-joint […] error of […] Nm in contact vs. […] Nm for a disturbance observer and […] Nm for classical inverse dynamics (Oh et al., 10 Jun 2026).
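For reference, the standard definitions of RMSE and MAE can be computed as in the sketch below, which also includes the coefficient of determination $R^2$ as a commonly paired metric. The function name and the toy values are illustrative and are not results from the cited works.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for one torque channel."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err**2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

rmse, mae, r2 = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
```

For multi-joint systems the same function can be applied per joint, which matches the per-joint reporting style used above.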

4. Real-Time Performance and Implementation Constraints

NEXT frameworks are designed for compute-efficient inference, enabling closed-loop control:

Latency and Throughput: MLPs with […] units/layer yield […] ms/frame inference at 100 Hz on a CPU (Shan et al., 2023). GRU/LSTM models deliver […] ms latency at similar rates. Multi-threaded HRDL yields […] ms evaluation at 570 Hz (Shan et al., 2023). PiGRN and TCN exoskeleton models are deployed at 100 Hz–1 kHz, with the total pipeline, including preprocessing and filtering, remaining within tens of milliseconds (Weigend et al., 1 Aug 2025, Kumar et al., 2024).
Online Integration: Real-time estimates feed directly into admittance controllers, policy learning modules (e.g., force-informed re-sampling training), or teleoperation feedback loops, supplanting physical force sensors (Oh et al., 10 Jun 2026, Weigend et al., 1 Aug 2025, Shan et al., 2023).
Sensor Requirements: Most variants require only encoders and, in some cases, base IMUs or sEMG; exteroceptive approaches (e.g., VFTS) extend sensing via camera modalities (Collins et al., 2022).
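How a torque estimate can drive an admittance controller is sketched below for a single joint. The virtual inertia, damping, and stiffness values, the integration scheme, and the function name are assumptions for illustration; the sign of the estimate must also be chosen so that a positive value means the environment pushes the joint in the positive direction, which may require negating $\hat{\tau}_{ext}$ depending on the convention in use.

```python
def admittance_step(q_offset, qd_ref, tau_applied, dt,
                    inertia=1.0, damping=10.0, stiffness=0.0):
    """One semi-implicit Euler step of M*a + D*v + K*x = tau_applied.

    q_offset is the deviation of the reference from its nominal position,
    qd_ref is the reference velocity, and tau_applied is the estimated torque
    that the environment applies to the joint.
    """
    qdd_ref = (tau_applied - damping * qd_ref - stiffness * q_offset) / inertia
    qd_ref = qd_ref + dt * qdd_ref
    q_offset = q_offset + dt * qd_ref
    return q_offset, qd_ref

# Constant 2 Nm push, 1 kHz loop, 5 s: velocity settles at tau / damping.
q_offset, qd_ref = 0.0, 0.0
for _ in range(5000):
    q_offset, qd_ref = admittance_step(q_offset, qd_ref, tau_applied=2.0, dt=0.001)
```

The resulting reference is passed to the robot's position or velocity controller, so the joint yields to contact without a physical force/torque sensor in the loop.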

5. Empirical Performance and Applications

Applications validate NEXT capabilities across a spectrum of tasks:

Application Domain	Typical Inputs	Reported Accuracy (RMSE)	Reference
Robotic manipulation	Encoders, currents	1–3 N, 0.1–0.2 Nm	(Shan et al., 2023)
Teleoperation, BC	Proprioceptive seq., LSTM	0.54 Nm (Franka, contact)	(Oh et al., 10 Jun 2026)
Humanoid collision detection	Encoders, IMU	1–2 Nm (legs, sim), 2.6 Nm (real LL)	(Lim et al., 2024)
Exoskeleton torque (EMG)	sEMG, joint kinematics	7.2 % (elbow NM), 11.4 % (shoulder NM), […]	(Kumar et al., 2024)
Stroke gait exoskeleton	3x IMU, encoder, TCN	0.16 Nm/kg, […]	(Weigend et al., 1 Aug 2025)
Soft gripper F/T sensing	RGB image (fisheye), ResNet-18	0.8–1.7 N, 0.05–0.18 Nm	(Collins et al., 2022)

Downstream benefits include: sensorless compliant interaction, adaptive exoskeleton assistance, robust collision detection, fine manipulation (100-micron clearance assemblies), force-informed policy learning, and high-frequency feedback in complex unstructured environments.

6. Robustness, Generalization, and Limitations

NEXT architectures demonstrate several robustness features:

Physics hybridization (MOB+GRU, PiGRN) attenuates drift and out-of-distribution errors relative to pure data-driven or pure model-based approaches, enabling sensitive, robust collision response (Lim et al., 2024, Kumar et al., 2024).
Training with domain-specific augmentation (e.g., random torque exploration, motion discriminator for hysteresis) enhances extrapolation to unseen contacts, friction regimes, and hardware variations (Shan et al., 2023, Lim et al., 2024).
Transfer learning (pretraining on healthy populations, fine-tuning on post-stroke or variant hardware) improves generalization in highly variable or poorly sampled regimes (Weigend et al., 1 Aug 2025).

Nonetheless, current limitations include dependence on data coverage (rare contacts, high-speed phenomena), ground-truth torque acquisition (reliance on external F/T sensors during training or on biased inverse-dynamics labels), and challenges with multi-contact localization or adaptation to payload changes. Explicit uncertainty quantification remains underdeveloped in most of these architectures.
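The model-based half of a hybrid such as MOB+GRU is a momentum observer. The sketch below is a generic single-joint version with constant inertia, written for the sign convention of the dynamics equation in Section 1; the learned correction used by hybrid methods is omitted, and the plant, gain, and all numerical values are assumptions for illustration.

```python
import numpy as np

def momentum_observer(q, qd, tau_m, dt, inertia, gravity_coeff, gain):
    """Residual r that tracks tau_ext through a first-order filter.

    With p = inertia * qd and dp/dt = tau_m - g(q) - tau_ext, the update
    r = gain * (integral(tau_m - g(q) - r) dt - (p - p0)) gives dr/dt = gain * (tau_ext - r).
    """
    r, integral = 0.0, 0.0
    p0 = inertia * qd[0]
    residuals = np.zeros(len(q))
    for k in range(1, len(q)):
        integral += (tau_m[k] - gravity_coeff * np.sin(q[k]) - r) * dt
        r = gain * (integral - (inertia * qd[k] - p0))
        residuals[k] = r
    return residuals

# Simulate a hypothetical pendulum joint with a 1 Nm contact torque from t = 1 s.
dt, n = 0.001, 2000
inertia, gravity_coeff = 0.25, 4.9
q, qd, tau_m = np.zeros(n), np.zeros(n), np.zeros(n)
tau_ext = np.where(np.arange(n) >= 1000, 1.0, 0.0)
for k in range(1, n):
    q[k] = q[k - 1] + dt * qd[k - 1]
    tau_m[k] = gravity_coeff * np.sin(q[k]) + 0.2 * np.sin(2.0 * np.pi * k * dt)
    qd[k] = qd[k - 1] + dt * (tau_m[k] - gravity_coeff * np.sin(q[k]) - tau_ext[k]) / inertia

residuals = momentum_observer(q, qd, tau_m, dt, inertia, gravity_coeff, gain=50.0)
```

With a perfect model the residual stays near zero in free space and converges to the contact torque; with an imperfect model it also absorbs the modeling error, which is the part a learned component is trained to remove.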

7. Outlook and Future Directions

Key frontiers for NEXT research and deployment include:

Fully unsupervised or self-supervised online learning to obviate the need for labeled ground-truth torque (Kumar et al., 2024, Collins et al., 2022).
Integration of richer physics priors: inclusion of actuation limits, muscle-tendon complex, multi-modal state inference, or compliant contact dynamics.
Modularity and scalability: extension to whole-body floating-base and multi-limb systems via modularized network pools (Lim et al., 2024).
Robust domain adaptation and online meta-learning for sustained accuracy amidst subject variation, hardware drift, or context shift.
Deployment to embedded/wearable platforms: streamlined TCNs and lightweight MLPs/GRUs now enable on-device inference at sub-millisecond latency in embedded microprocessors (Weigend et al., 1 Aug 2025).

A plausible implication is that NEXT architectures will underpin the next generation of sensorless, force-aware robot control and rehabilitation paradigms, filling the gap between model-based observers and cumbersome hardware-based sensing. The field continues to evolve toward robust, real-world deployment with minimal hardware overhead and maximal task generality.