
FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

Published 10 Jun 2026 in cs.RO, cs.AI, cs.LG, and eess.SY | (2606.12406v1)

Abstract: Contact-rich manipulation requires force sensitivity, but many robot arms lack dedicated force sensors due to their high cost. We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors. NEXT trains in 1 minute from only 10 minutes of free-motion data, yet achieves estimates comparable to dedicated joint-torque sensors. NEXT enables force-feedback teleoperation on low-cost arms and improves policy learning through Force-Informed Re-Sampling Training (FIRST), which up-samples pre-contact and contact segments during behavior cloning. Across five long-horizon tasks, FIRST outperforms prior force-aware policies by over 17% in task progress. Together, NEXT and FIRST bring force-aware teleoperation and policy learning to off-the-shelf robots without additional sensing hardware. Video results and code are available at https://jasonjzliu.com/factr2

Summary

  • The paper introduces NEXT, a self-supervised LSTM that accurately infers external joint torque from standard proprioceptive signals.
  • The paper proposes FIRST, a re-sampling strategy that up-samples pre-contact and contact phases to enhance force-aware policy learning.
  • The paper demonstrates that integrating NEXT and FIRST improves teleoperation and task performance on low-cost robot arms.

Data-Driven External Force Sensing for Commodity Robot Arms: FACTR 2

Introduction and Motivation

Contact-rich robotic manipulation tasks, such as assembly and insertion, require precise force perception and control. However, most commodity robot arms lack force-torque (FT) sensors due to cost and integration challenges. Existing alternatives, such as analytical inverse dynamics, low-cost tactile sensors, or system identification, have proven insufficiently robust, especially on low-cost hardware with significant model inaccuracies and transmission nonlinearities. "FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning" (2606.12406) addresses these gaps with two components: (1) Neural External Torque Estimation (NEXT), a self-supervised method that infers external joint torque from standard proprioception without dedicated force sensing, and (2) Force-Informed Re-Sampling Training (FIRST), which uses these estimated signals to improve robot policy training on contact-rich manipulation tasks.

Figure 1: (a) NEXT produces high-quality joint torque estimates from brief, contact-free data, requiring neither force sensors nor system identification. (b) FIRST segments demonstration data into free-space, pre-contact, and contact, upsampling contact-relevant phases for improved policy learning.

Neural Estimation of External Torque (NEXT)

NEXT removes the need for dedicated FT sensors by training a network to predict free-space torque directly from temporal sequences of proprioceptive signals, including joint positions, velocities, and tracking errors. The model is a history-based LSTM trained on about ten minutes of free-space trajectories, requiring no contact data and no explicit parameter identification. At deployment, NEXT outputs an external torque estimate as the residual between the measured motor torque (from current sensing) and the predicted free-space torque.
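
The residual described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the array layout of `history`, and the toy viscous-friction predictor standing in for the trained LSTM are all assumptions made for the example.

```python
import numpy as np

def external_torque(motor_current, torque_constant, history, free_space_model):
    """Residual estimate: measured motor torque minus predicted free-space torque."""
    measured_torque = torque_constant * motor_current   # tau_m = K * i, per joint
    predicted_free = free_space_model(history)          # tau_f from proprioceptive history
    return measured_torque - predicted_free

def toy_free_space_model(history):
    """Hypothetical stand-in for the trained LSTM (viscous friction only).

    history: (T, 3, n_joints) array holding positions, velocities, and
    tracking errors for the last T control steps (assumed layout).
    """
    latest_velocity = history[-1, 1]
    return 0.1 * latest_velocity

n_joints = 3
history = np.zeros((20, 3, n_joints))
history[-1, 1] = [1.0, -2.0, 0.5]        # latest joint velocities
K = np.array([0.5, 0.5, 0.2])            # illustrative torque constants (Nm/A)
current = np.array([0.2, -0.4, 1.25])    # measured motor currents (A)

tau_ext = external_torque(current, K, history, toy_free_space_model)
print(tau_ext)  # joints 1 and 2 match the free-space prediction; joint 3 shows a residual
```

In this toy setup the first two joints draw exactly the current the free-space model expects, so their residual is zero, while the third joint's extra effort appears as external torque.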

This approach outperforms both analytical residual-based estimators (such as FILIC) and disturbance observers, particularly on arms with substantial unmodeled joint friction, stiction, and actuator nonlinearities. NEXT benefits from direct modeling of hardware-specific effects, yielding both denoised and accurately scaled external torque estimates.

NEXT's accuracy is evaluated quantitatively in both free-space and contact settings.

Figure 3: NEXT yields the lowest joint torque $L_1$ errors in both free-space and contact settings on the Franka platform; the method's estimates remain nearly noiseless in free space and faithfully track sensor readings during contact.

NEXT generalizes beyond high-end arms (Franka) to commodity systems (AgileX Piper, YAM), where joint-torque sensing is otherwise absent. On the Piper, NEXT achieves a mean joint $L_1$ error of only $0.018$ Nm, outperforming both FILIC and disturbance observers.

Figure 2: NEXT's error in free-space torque estimation on the AgileX Piper is substantially lower than all baselines. In user studies, participants rated teleoperation based on NEXT's feedback as more intuitive than with alternative methods.

Importantly, force estimation using NEXT supports robust force-feedback teleoperation on low-cost arms, an ability previously restricted to expensive platforms. User studies confirm higher usability and reduced exertion for operators when teleoperating with NEXT-based force feedback, with performance comparable to ground-truth torque-based feedback on the Franka platform.

Force-Informed Re-Sampling Training (FIRST)

Policy failure in contact-rich manipulation is disproportionately associated with precise alignment just before contact and during the contact phase itself. Standard behavioral cloning, which typically samples trajectories uniformly, fails to model these sparse, failure-prone events adequately.

FIRST introduces a data distribution shift at training time: using phase information computed via NEXT, it segments demonstration trajectories into free-space, pre-contact, and contact. Rather than uniformly sampling trajectory points, FIRST up-samples pre-contact and contact segments for policy training. This targeted sampling substantially lowers policy validation loss on critical pre-contact and contact datapoints, leading to improved generalization.
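
The segmentation step can be sketched as follows. The summary mentions hysteresis thresholds (T_low, T_high), a fixed pre-contact window, and an L1 norm over joint torques; the exact rule below (enter contact above `t_high`, leave it below `t_low`, then relabel the free-space steps just before each onset) is an assumed reading, and all names and numbers are hypothetical.

```python
import numpy as np

FREE, PRE_CONTACT, CONTACT = 0, 1, 2

def segment_phases(tau_ext, t_low, t_high, pre_steps):
    """Label each timestep of one demonstration as free-space, pre-contact, or contact.

    tau_ext: (T, n_joints) array of external torque estimates.
    pre_steps: length of the pre-contact window in control steps.
    """
    score = np.abs(tau_ext).sum(axis=1)        # L1 norm across joints
    labels = np.full(len(score), FREE)
    in_contact = False
    for t, s in enumerate(score):
        if not in_contact and s > t_high:      # rising edge: contact starts
            in_contact = True
        elif in_contact and s < t_low:         # falling edge: contact ends
            in_contact = False
        if in_contact:
            labels[t] = CONTACT
    # Relabel free-space steps in the window preceding each contact onset.
    onsets = [t for t in range(len(labels))
              if labels[t] == CONTACT and (t == 0 or labels[t - 1] != CONTACT)]
    for onset in onsets:
        window = labels[max(0, onset - pre_steps):onset]   # view into labels
        window[window == FREE] = PRE_CONTACT
    return labels

# Single-joint toy trajectory: one contact event with a slow release.
tau = np.array([0, 0, 0, 0, 0.2, 1.0, 0.6, 0.3, 0, 0], dtype=float).reshape(-1, 1)
labels = segment_phases(tau, t_low=0.25, t_high=0.5, pre_steps=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 0, 0]
```

Using two thresholds keeps the sample at 0.3 inside the contact segment even though it is below the entry threshold, which avoids flickering labels near the boundary.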

Empirically, FIRST improves task completion rates and partial progress across five contact-rich, long-horizon manipulation benchmarks, including assembly, insertion, and dexterous screwing. FIRST outperforms baselines (such as FACTR, TA-VLA, and direct torque-conditioning) by over 17% in task progress.

Figure 5: The five evaluation tasks are all long-horizon and contact-rich, demanding precise force sensitivity in multiple stages.

Figure 6: FIRST yields the highest task progress on all five manipulation tasks when using a flow-matching policy.

Ablations on up-sampling strategies indicate that emphasizing pre-contact segments alone provides the largest average benefit across tasks, with task-dependent gains from including both pre-contact and contact phases for specific manipulations requiring sustained interaction.

Comparison with Baselines and Policy Inputs

A critical empirical observation is that policies conditioned directly on raw joint current perform worse than those conditioned on external torques processed by NEXT. Joint current contains a superposition of actuator effort for both free-space motion and contact, which confounds policy representation learning. In contrast, the external torque signal provided by NEXT is a compact, contact-specific representation that enhances policy attention around critical interaction events.

Figure 4: Attention analysis during screw-cap manipulation reveals that policies using learned external torque focus their attention on key contact events, while those conditioned on joint effort do not.

Practical and Theoretical Implications

Practically, FACTR 2 extends the benefits of force-aware learning and closed-loop teleoperation to a wide range of affordable hardware, obviating the need for added FT hardware, expensive system identification, or complex sensor calibration. The system runs in real time, requiring minimal computational resources, and can be widely adopted with a simple ten-minute calibration procedure. The method democratizes high-quality force sensing and downstream contact-aware policy learning for platforms that have historically lacked these capabilities.

Theoretically, the data-centric, history-based approach to both sensorless force estimation and phase-sensitive imitation learning offers an alternative to explicit dynamics modeling, value-based data selection, and raw proprioceptive augmentation in robotic learning pipelines. The empirical results suggest that phase-aware data biasing, driven by reliable event segmentation, can yield larger gains than simple architectural or input-modality changes.

Limitations and Future Directions

NEXT's absolute force scaling depends on accurate knowledge of the actuator torque constant and may require calibration for tasks needing precise force control. Cross-platform generalization requires retraining due to hardware-specific dynamics. In future work, integrating meta-learning for cross-device generalization, combining discovery of contact-rich phases with semantic task understanding, and merging force-phase data selection with online reinforcement could further enhance contact-rich manipulation capabilities across diverse domains.
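
A short derivation shows why an inaccurate torque constant affects scale but not contact detection. It rests on assumptions not stated in the summary: motor torque is linear in current, the free-space predictor $f_\theta$ is trained on torques computed with the same nominal constant $\hat{K} = (1+\epsilon)K$ that is used at deployment, and the predictor fits free-space data well.

```latex
\begin{align}
\tau_m &= K\,i = \tau_f + \tau_{ext}
  && \text{(true motor torque)} \\
\hat{\tau}_m &= \hat{K}\,i = (1+\epsilon)\,(\tau_f + \tau_{ext})
  && \text{(torque computed with the nominal constant)} \\
f_\theta &\approx (1+\epsilon)\,\tau_f
  && \text{(predictor fit to free-space data, where } \tau_{ext} = 0\text{)} \\
\hat{\tau}_{ext} &= \hat{\tau}_m - f_\theta \approx (1+\epsilon)\,\tau_{ext}
\end{align}
```

Under these assumptions the estimate stays near zero in free space and is off only by the multiplicative factor $(1+\epsilon)$ during contact, so relative signals such as contact onset remain usable while absolute force values need calibration.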

Conclusion

FACTR 2 demonstrates that with minimal sensor and compute requirements, commodity robot arms can gain robust, high-fidelity external torque estimation and leverage this for both teleoperation and advanced force-aware policy learning. The combination of NEXT and FIRST achieves policy performance rivaling sensorized systems, highlighting new opportunities for data-driven contact-rich manipulation at scale.

Explain it Like I'm 14

A simple explanation of “FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning”

Overview

This paper is about helping robots “feel” forces when they touch things—without using expensive touch sensors. The authors introduce a system called FACTR 2 with two main parts:

  • NEXT: a way to teach a robot arm to estimate the forces it feels from the outside, using only the robot’s own motor signals.
  • FIRST: a way to train robot skills that focuses extra practice on the moments right before and during contact, which are usually the hardest.

Together, these make low-cost robot arms much better at careful, touch-heavy tasks like plugging things in, screwing on caps, or handling soft objects.

What questions does the paper ask?

The paper tries to answer:

  • Can a robot learn to sense external forces using only data from its own motors, instead of pricey force sensors?
  • If a robot can estimate those forces, can that information help people teleoperate (remotely control) the robot with force feedback?
  • Can this force information improve how robots learn from demonstrations, especially in the tricky parts where contact happens?

How does it work? (Methods in everyday language)

The system has two parts. Here’s the idea in simple terms:

  1. NEXT (Neural External Torque Estimation)
  • Think of a robot arm as having “muscles” (motors). The robot always knows how hard its motors are working (from motor current), like how you can tell how much effort your arm is making.
  • The robot first learns, from only 10 minutes of “no-touching” movement, how much effort it should need to move freely when it isn’t touching anything.
  • Later, during real tasks, the robot compares:
    • Actual effort now (from motor current)
    • Minus the predicted effort it would need if it weren’t touching anything
  • The difference equals “extra effort from the outside,” which means the force from touching the environment. This is like knowing how heavy your grocery bag is by subtracting how your arm normally feels from how it feels when lifting the bag.

Key point: NEXT learns this free-motion prediction with a small neural network (an LSTM), trains in about 1 minute, and doesn’t need any special force sensors.

  2. FIRST (Force-Informed Re-Sampling Training)
  • Robots often fail during brief, tricky moments: just before contact (lining things up) and during contact (pushing, sliding, inserting).
  • FIRST uses the force signal from NEXT to label each moment in a demo as:
    • Free-space (not touching)
    • Pre-contact (about to touch)
    • Contact (touching)
  • During training by imitation (learning by copying an expert), FIRST shows the robot more examples from pre-contact and contact moments. It’s like giving a student extra practice on the hardest parts of a math problem.
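
The "extra practice" idea can be illustrated with a weighted sampler. This is a sketch under assumptions: the weights (3x for pre-contact and contact) and the helper name are made up for the example and are not the ratios used in the paper.

```python
import numpy as np

def sampling_probabilities(labels, weights):
    """Turn per-phase weights into a per-timestep sampling distribution."""
    w = np.array([weights[label] for label in labels], dtype=float)
    return w / w.sum()

# Ten demo timesteps: mostly free-space, with a short pre-contact and contact stretch.
labels = ["free"] * 6 + ["pre"] * 2 + ["contact"] * 2
weights = {"free": 1.0, "pre": 3.0, "contact": 3.0}   # hypothetical up-sampling ratios

probs = sampling_probabilities(labels, weights)
hard_share = probs[6:].sum()   # probability mass on pre-contact and contact steps

rng = np.random.default_rng(0)
batch = rng.choice(len(labels), size=8, p=probs)   # indices for one training batch
print(round(hard_share, 3))
```

With uniform sampling the pre-contact and contact steps would make up 40% of each batch on average; with these weights they make up about two thirds, even though the demonstration itself is unchanged.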

What did they find, and why is it important?

Here are the main results the authors report:

  • Accurate “touch” without touch sensors:
    • NEXT’s force estimates are close to what expensive, built-in force sensors report on a high-end robot (Franka).
    • In free space, NEXT stays near zero (as it should), and during contact it tracks the true force signal closely. It beat other methods that try to model physics by hand.
  • Better teleoperation (remote control with “feel”):
    • In a user study, people found controlling a robot with NEXT-based force feedback easier and more efficient than other low-cost methods. It felt close to using real force sensors—even on cheaper arms that don’t have them.
  • Stronger skill learning on real tasks:
    • Using FIRST, policies improved task progress by over 17% across five long, multi-step, contact-heavy tasks (like LEGO assembly or screwing on caps).
    • Upsampling the “pre-contact” moments helped the most—these are the moments where small alignment errors make or break success.
  • Fast and practical:
    • NEXT needs only around 10 minutes of contact-free data and about 1 minute of training to get useful force estimates.
    • It works on both a high-end arm (about $30,000) and low-cost arms (around $2,500).

Why this matters: Touch is crucial for reliable manipulation. Getting touch-like feedback without buying expensive sensors makes advanced robot skills more affordable and widely available.

What’s the impact?

  • More capable low-cost robots: Shops, homes, and schools could use cheaper arms for delicate tasks—like plugging in cables or handling fragile items—because they can “feel” contact well enough to be safe and precise.
  • Better training data use: By focusing on the hardest moments (pre-contact and contact), robots learn smarter and faster from the same demonstrations.
  • Easier, safer teleoperation: People can guide robots more naturally with force feedback, even when the robot has no force sensors.

The authors note two limitations:

  • The exact size of the force depends on a motor constant from the manufacturer. If that number is off, you may need to recalibrate with a real sensor once.
  • NEXT is robot-specific; switching to a new arm usually needs retraining (though it’s quick).

Overall, FACTR 2 shows that with a little clever learning, robots can “feel” well enough to work better—no pricey hardware required.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise list of unresolved issues and concrete opportunities for further research:

  • Absolute scaling without external calibration: NEXT relies on the motor torque constant K from the manufacturer; no sensor-free calibration or self-calibration procedure (e.g., via gravity-based tests or known payload moves) is provided, nor is drift of K over temperature/aging quantified.
  • Robustness to payload/tool changes: The effect of swapping end-effectors, changing payload mass/CoM, or adding tools on NEXT accuracy and FIRST segmentation is not characterized; no rapid adaptation or online update strategy is proposed.
  • Domain shift over time and conditions: Sensitivity to temperature, gearbox wear, lubrication changes, and cable drag over days/weeks remains untested; schedules for re-collection (10 min) and re-training cadence are unspecified.
  • Current sensing requirements: Many low-cost drivers report noisy/saturated currents; the paper does not quantify NEXT’s dependence on current-sensor bandwidth/accuracy or on motor-side vs joint-side sensing, nor provide mitigation (e.g., filtering/denoising schemes or robust training).
  • Actuator/transmission nonlinearities: While NEXT implicitly models friction, backlash, dead zones, and torque ripple, there is no analysis of failure modes under extreme nonlinearity (e.g., high gear ratios, series elasticity, or compliance), or of whether longer history/context reduces errors in those regimes.
  • Saturation and controller mode changes: Behavior under current/voltage saturation, position error limits, anti-windup, or mode switching (e.g., from position to torque) is not evaluated; residuals could be misattributed to contact.
  • Ground-truth validity in contact: Contact accuracy is benchmarked against the Franka’s internal “external torque estimate,” not against an independent 6-axis end-effector F/T sensor; absolute accuracy and bias of NEXT in contact remain uncertain.
  • End-effector wrench estimation: The paper estimates joint-space external torques only; accuracy after Jacobian mapping to Cartesian forces/moments (and sensitivity to kinematic/dynamic model errors) is not evaluated.
  • Contact localization and type: Joint-space residuals do not identify contact location or distinguish single vs multiple contacts; methods for localizing contact along the kinematic chain or classifying contact types are not explored.
  • High-speed/impulsive contacts: Performance under impacts, fast transitions, or dynamic tasks (beyond quasi-static manipulation) is untested; latency and bandwidth limits for reliable contact tracking remain unknown.
  • Safety, passivity, and stability in teleoperation: No formal passivity/stability analysis with learned force estimates is provided; worst-case estimation errors and their safety implications for bilateral haptics are not quantified.
  • Generality across robots: NEXT is trained per robot; the feasibility of cross-robot pretraining, parameter sharing, or few-shot transfer (including to different actuator technologies) is not investigated.
  • Compute and deployment constraints: Profiling is shown on a desktop CPU/GPU; feasibility on embedded controllers (e.g., ARM, microcontrollers), real-time determinism, and resource–accuracy trade-offs are not assessed.
  • Data contamination and coverage: Procedures to detect and exclude accidental contacts from the “free-space” training set are not described; the effect of distributional coverage (velocities, accelerations, joint ranges) on generalization is not quantified.
  • Sensitivity to hyperparameters in FIRST: The hysteresis thresholds (T_low, T_high) and the fixed 1-second pre-contact window are heuristic; their task- and robot-dependence and automatic tuning strategies are not studied.
  • Segmentation signal design: FIRST uses the L1 norm of joint torques; per-joint, directional, or dynamics-aware scores (e.g., energy, power, torque rate) and their impact on segmentation quality are not explored.
  • Robustness of FIRST to estimation errors: How segmentation mistakes from NEXT (false positives/negatives) affect training and policy robustness is not analyzed; no confidence-aware or uncertainty-weighted sampling is attempted.
  • Reweighting mechanisms: Only sampling reweighting is tested; alternative or complementary approaches (loss weighting, curriculum schedules, per-phase augmentation, stage-conditioned policies, or stage-specific heads) are not compared.
  • Integration with control at inference: Policies output joint positions; combining FIRST with impedance/hybrid force–position control or low-level force objectives at inference is not evaluated.
  • Combination with value- or success-aware selection: Interactions between FIRST and value-based filtering (e.g., advantage-weighted BC, success classifiers, influence functions) are not examined.
  • Data efficiency and scaling laws: Benefits of FIRST under scarce vs abundant demonstrations and with varying task difficulty are not systematically studied.
  • Generalization to unseen objects and environments: The extent to which FIRST improves cross-object, cross-geometry, or cross-surface generalization is not reported.
  • Deformable vs rigid contact regimes: Tasks include deformable interactions, but the differential impact of FIRST across deformable/rigid cases and across varying friction/compliance is not dissected.
  • Bimanual and self-contact scenarios: The approach is not evaluated for self-contact or inter-arm contacts in bimanual settings, nor for disambiguating which arm/segment is in contact.
  • User study scope: Teleoperation is evaluated on a single wiping task with subjective ease-of-use and joint-torque exertion; broader tasks, objective performance (time/success/accuracy), learning curves, and statistical power analyses are missing.
  • Failure analysis: Detailed error decomposition (e.g., by joint, speed, direction, or contact condition) and qualitative failure modes for NEXT/FIRST are not provided to guide targeted improvements.
  • Online adaptation: NEXT is trained offline; mechanisms for online/continual learning, drift detection, or confidence-aware updates during deployment are not explored.
  • Access constraints: Some commodity arms do not expose motor currents or torque constants; pathways for applying NEXT under limited APIs (e.g., using only q, qdot) are not addressed.
  • Calibration without external sensors: Methods to calibrate K or end-effector scaling factors using only robot-internal signals (gravity, static holds, known configurations) are not proposed.
  • Uncertainty estimation: NEXT produces point estimates; uncertainty quantification (e.g., ensembles, Bayesian RNNs) and its use in segmentation or teleop safety are not investigated.
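
For the end-effector wrench item above, the standard static relation τ_ext = Jᵀ F gives one way to map joint-space residuals to a Cartesian force. The sketch below uses a planar two-link arm with unit link lengths purely as an illustration; it is not evaluated in the paper, and any error in the Jacobian or in τ_ext propagates directly into the recovered force.

```python
import numpy as np

def planar_two_link_jacobian(q, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm (illustrative model)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def wrench_from_joint_torques(jacobian, tau_ext):
    """Least-squares end-effector force F satisfying tau_ext ≈ J^T F."""
    return np.linalg.pinv(jacobian.T) @ tau_ext

q = np.array([0.0, np.pi / 2])
J = planar_two_link_jacobian(q)

true_force = np.array([1.0, 2.0])       # force applied at the end effector (N)
tau_ext = J.T @ true_force              # joint torques that force would produce
recovered = wrench_from_joint_torques(J, tau_ext)
print(recovered)
```

The pseudo-inverse returns a least-squares answer, so near kinematic singularities, or when contact occurs somewhere other than the end effector, the recovered force should not be trusted.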

Practical Applications

Immediate Applications

Below are near-term, deployable use cases that leverage NEXT (Neural External Torque Estimation) and FIRST (Force-Informed Re-Sampling Training) as described in the paper. Each item notes sectors, potential tools/workflows, and key assumptions or dependencies.

  • Sensorless force feedback for teleoperation on low‑cost arms (manufacturing, logistics, field service, education, hobbyist)
    • What: Add haptic force feedback to commodity robot arms (e.g., AgileX Piper, YAM) without installing force–torque sensors by running NEXT from motor currents, enabling skilled operators to “feel” contact.
    • Tools/products/workflows:
    • Software-only NEXT module packaged for ROS/ROS2; runs at 100 Hz+.
    • FACTR-style teleop workflow: leader–follower setup with force feedback torque law using NEXT’s external torque as input.
    • Quick setup: ~10 minutes of free-motion data collection; ~1 minute training on a GPU.
    • Assumptions/dependencies:
    • Access to joint motor current and torque constants K; accurate K improves absolute scaling.
    • Stable robot APIs to read currents and command torques/positions.
    • Sufficient compute for low-latency inference (measured ~1.76 ms per pass in the paper).
    • Safe torque limits and collision handling in the controller.
  • Contact-aware imitation learning via FIRST (assembly, electronics, warehousing, appliance manufacturing, education/research labs)
    • What: Improve behavior cloning performance on contact-rich, long-horizon tasks by up-sampling pre-contact and contact segments, using NEXT to segment demos without extra sensors. Reported >17% gains in task progress.
    • Tools/products/workflows:
    • Training plugin that labels datasets into free-space, pre-contact, and contact via NEXT.
    • Integration with Flow Matching and ACT policy training; configurable up-sampling ratios.
    • Off-the-shelf dataset curation tool that exports contact-phase tags for existing demo logs.
    • Assumptions/dependencies:
    • Adequate demonstration coverage of pre-contact and contact; camera inputs if the policy uses vision.
    • Reasonable threshold selection (hysteresis) for contact onset in NEXT outputs.
    • Normalized force features during training; consistent control frequency for 1s pre-contact windows.
  • Real-time contact detection and safety reactions on non-sensorized arms (collaborative robots in SMEs, labs, warehousing)
    • What: Use NEXT’s external torque residual as a contact detector to trigger stop-on-contact, mode switching to impedance control, or reducing speed upon unexpected interactions.
    • Tools/products/workflows:
    • Safety interlock node monitoring NEXT’s |τ_ext| with hysteresis and timed debounce.
    • Integration with compliant/impedance control modes already available in many controllers.
    • Assumptions/dependencies:
    • Conservative thresholding to avoid false positives/negatives.
    • Verification of latency bounds for safety; fallback to hardware e-stop.
  • Rapid robot onboarding and calibration-lite force profiling (systems integration, contract manufacturing, robotics startups)
    • What: Quickly build a usable force estimate after deployment or maintenance (reduction gear replacements, lubrication changes) to restore force-aware behavior without full system identification.
    • Tools/products/workflows:
    • Guided “10-minute free-motion” capture wizard covering joint ranges and speeds.
    • Automated training and validation with live plots of residuals and noise floors.
    • Assumptions/dependencies:
    • Access to robot URDF not required; relies on proprioception and command history.
    • Torque constant K availability (from datasheet) or rough calibration.
  • Dataset labeling and analytics without FT sensors (academia, R&D, QA)
    • What: Post-hoc labeling of large demonstration datasets into free-motion, pre-contact, and contact intervals; improved error analysis focused on failure-prone phases.
    • Tools/products/workflows:
    • Log parser that ingests robot state + commands; emits phase labels per timestep.
    • Phase-wise validation loss dashboards to guide data collection and curriculum.
    • Assumptions/dependencies:
    • Synchronized logs of joint state, commanded targets, and motor currents.
    • Consistent control-loop timestamps to compute histories.
  • Anomaly and collision monitoring for predictive maintenance (manufacturing, QA)
    • What: Track long-term trends in external torque signatures during nominal tasks to detect drift (e.g., increased friction, misalignment, wear) without added sensors.
    • Tools/products/workflows:
    • Baseline “force fingerprint” per task; online divergence metrics with alerts.
    • Periodic recalibration using short free-space data capture after maintenance.
    • Assumptions/dependencies:
    • Stable task programs so residuals are comparable over time.
    • Proper filtering to prevent nuisance alarms due to task variants.
  • Hands-on education in force-aware robotics with low-cost hardware (education, makerspaces)
    • What: Enable courses and labs on impedance control, contact dynamics, and haptics using commodity arms and NEXT instead of expensive FT sensors.
    • Tools/products/workflows:
    • Teaching kits: prebuilt NEXT models, demo scripts (wiping, insertion, screwing).
    • Visualizations of external torque vs. time for learning contact phenomena.
    • Assumptions/dependencies:
    • Basic torque/position control capability and access to motor currents.
  • Cost-down retrofit pathway for existing automation cells (SMEs, integrators)
    • What: Replace or defer addition of dedicated FT sensors on end-effectors by deploying NEXT to recover most of the utility for many tasks (e.g., insertions, wiping, fastening).
    • Tools/products/workflows:
    • Software-only retrofit package; validation checklist comparing NEXT vs. operator feel.
    • Tuning guide for scaling feedback for operator comfort in teleop.
    • Assumptions/dependencies:
    • Some tasks may still require 6-axis wrench at the tool; NEXT provides joint-space torque, not a full end-effector wrench without additional modeling.
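
The safety interlock idea in the list above (monitoring |τ_ext| with hysteresis and a timed debounce) can be sketched as a small state machine. The class name, thresholds, and debounce length are hypothetical, and a detector like this would complement a hardware e-stop rather than replace it.

```python
class ContactDetector:
    """Hysteresis plus debounce on a scalar external-torque magnitude."""

    def __init__(self, t_low, t_high, debounce_steps):
        self.t_low = t_low                    # release threshold
        self.t_high = t_high                  # trigger threshold
        self.debounce_steps = debounce_steps  # consecutive samples needed to switch
        self.in_contact = False
        self._count = 0

    def update(self, tau_ext_norm):
        """Feed one sample of |tau_ext|; returns True while contact is asserted."""
        if self.in_contact:
            wants_switch = tau_ext_norm < self.t_low    # candidate release
        else:
            wants_switch = tau_ext_norm > self.t_high   # candidate trigger
        self._count = self._count + 1 if wants_switch else 0
        if self._count >= self.debounce_steps:
            self.in_contact = not self.in_contact
            self._count = 0
        return self.in_contact

detector = ContactDetector(t_low=0.2, t_high=0.5, debounce_steps=2)
samples = [0.1, 0.9, 0.1, 0.8, 0.9, 0.3, 0.1, 0.1]
states = [detector.update(s) for s in samples]
print(states)  # [False, False, False, False, True, True, True, False]
```

The isolated spike at the second sample is ignored, contact is asserted only after two consecutive samples above `t_high`, and it is released only after two consecutive samples below `t_low`. Longer debounce windows reduce false triggers at the cost of added reaction latency.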

Long-Term Applications

These opportunities require further research, scaling, or integration with standards and ecosystem support.

  • Vendor-level integration of sensorless force estimation as a standard feature (robotics OEMs; safety standards bodies)
    • What: Build NEXT-like estimators into robot firmware/servo drives to provide calibrated external torque to users by default.
    • Tools/products/workflows:
    • On-drive or edge-AI deployment (e.g., ONNX, tiny LSTM) with factory pretraining + per-unit adaptation.
    • Self-calibration routines run during commissioning and periodic maintenance.
    • Assumptions/dependencies:
    • Robust handling of temperature drift, gear wear, and actuator replacements.
    • Coordination with ISO/TS 15066 and ISO 10218 safety standards for validation and certification.
  • General-purpose, contact-rich autonomy on low-cost arms (manufacturing, logistics, home robotics)
    • What: Combine NEXT + FIRST with vision-language policies to deliver robust long-horizon manipulation (e.g., cable routing, furniture assembly, packaging) at lower CapEx by removing FT sensors.
    • Tools/products/workflows:
    • Multi-modal policies conditioned on images + τ_ext; task libraries emphasizing pre-contact alignment.
    • Automated data pipelines that emphasize pre-contact segments during training.
    • Assumptions/dependencies:
    • Broader benchmarks and generalization studies across objects, fixtures, and lighting.
    • Additional sensing (tactile, vision depth) for tasks requiring precise 6D force/torque.
  • Adaptive impedance and hybrid force–position control driven by learned residuals (advanced robotics, healthcare, field)
    • What: Use τ_ext to automatically tune stiffness/damping in real time, improving robustness during variable contact and compliant manipulation (e.g., polishing, deburring, surgical training platforms).
    • Tools/products/workflows:
    • τ_ext-conditioned impedance controllers; model-predictive schemes using residual histories.
    • Safety supervisors that blend between modes based on contact confidence.
    • Assumptions/dependencies:
    • Proven stability with history-dependent actuator effects; formal robustness guarantees.
    • Accurate absolute scaling of τ_ext or self-normalizing control laws.
  • Data standards and policy guidance for contact-phase annotation and safety (policy, consortia, academia–industry alliances)
    • What: Establish open formats for contact-phase labels and recommended practices for sensorless force estimation in training and safety cases.
    • Tools/products/workflows:
    • Dataset schemas including pre-contact/contact tags; benchmark leaderboards for sensorless contact estimation.
    • Guidance documents for using τ_ext thresholds in collaborative cell risk assessments.
    • Assumptions/dependencies:
    • Community agreement on metrics (e.g., false contact rates, latency, error bounds).
    • Inclusion of procedures for torque constant K verification and periodic validation.
  • Fusion of sensorless τ_ext with sparse tactile/vision for 6‑DoF wrench inference (advanced manufacturing, research)
    • What: Combine NEXT’s joint-space residuals with minimal end-effector tactiles or vision-based contact geometry to approximate full end-effector wrench for delicate tasks (e.g., glass handling, precision assembly).
    • Tools/products/workflows:
    • State estimators that fuse joint residuals, contact pose, and compliance models.
    • Self-supervised calibration using occasional instrumented fixtures.
    • Assumptions/dependencies:
    • Reliable contact pose estimation; modeling of kinematic/dynamic coupling.
    • Additional calibration steps beyond the 10-minute free-space routine.
  • Large-scale haptic telework platforms (telemaintenance, remote inspection, home care)
    • What: Deploy fleets of low-cost arms with sensorless haptics for remote operators; reduce costs and expand teleoperation to domains where FT sensors are impractical.
    • Tools/products/workflows:
    • Networked teleop systems with latency compensation and τ_ext-based rendering.
    • Operator training curricula emphasizing pre-contact alignment skills.
    • Assumptions/dependencies:
    • Robustness to network delays; standardized leader devices for force rendering.
    • Task-specific safety and privacy regulations in healthcare/home environments.
  • Cross-robot model transfer and self-adaptation (robot fleets, contract manufacturing)
    • What: Pretrain NEXT on families of arms and transfer with few-shot free-motion data; maintain performance across hardware variations and aging.
    • Tools/products/workflows:
    • Meta-learning or domain adaptation pipelines; automated model selection per unit.
    • Continuous learning agents that update τ_f models during scheduled non-contact motions.
    • Assumptions/dependencies:
    • Mitigation of catastrophic drift; versioning and rollback for safety.
    • Telemetry to monitor estimator health in production.
  • Energy and wear optimization via contact-aware planning (manufacturing operations)
    • What: Use τ_ext signals to minimize unnecessary contact forces, reduce joint loads, and tune cycle parameters, lowering energy use and extending component life.
    • Tools/products/workflows:
    • Offline analysis of τ_ext profiles to adjust speeds/impedances.
    • In-cycle adaptive setpoints to avoid high-force events during alignment.
    • Assumptions/dependencies:
    • Accurate τ_ext statistics over many cycles; integration with MES/SCADA systems.
    • Organizational processes to act on analytics (maintenance scheduling, recipe changes).

Notes on Feasibility and Dependencies

  • Access to motor current and torque constants is critical. If K is inaccurate, the absolute scale of the estimated torque will be off; relative contact signals remain useful for control and training, but K may need periodic recalibration.
  • NEXT is robot-specific; a short per-robot (or per-unit) free-motion dataset is required. Cross-robot transfer will reduce but not remove this requirement in the near term.
  • Controller integration matters. For teleoperation and safety reactions, the system must support torque limits, impedance modes, and low-latency pipelines.
  • Thresholds for contact detection should be tuned with hysteresis; task-dependent thresholds may be necessary to balance sensitivity and false alarms.
  • For tasks requiring a precise end-effector wrench, a hybrid approach (sensorless τ_ext plus minimal tactile or vision sensing) may be needed until full wrench estimation matures.
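
The note on hysteresis-tuned contact thresholds can be illustrated with a minimal sketch. It assumes a per-timestep scalar magnitude of the estimated external torque is already available; the function name `detect_contact`, the two threshold values, and the sample signal are hypothetical and are not taken from the paper.

```python
def detect_contact(tau_ext_magnitude, on_threshold, off_threshold):
    """Return one boolean contact flag per timestep using two thresholds.

    Contact switches on when the signal rises to on_threshold or above,
    and switches off only when it falls to off_threshold or below.
    Both threshold values are illustrative assumptions.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    in_contact = False
    flags = []
    for value in tau_ext_magnitude:
        if not in_contact and value >= on_threshold:
            in_contact = True
        elif in_contact and value <= off_threshold:
            in_contact = False
        flags.append(in_contact)
    return flags


# Hypothetical torque magnitudes (arbitrary units) for eight timesteps.
signal = [0.1, 0.6, 1.2, 0.8, 0.7, 0.4, 0.9, 0.2]
flags = detect_contact(signal, on_threshold=1.0, off_threshold=0.5)
```

Because the off threshold sits below the on threshold, the samples at 0.8 and 0.7 stay flagged as contact, while the later sample at 0.9 does not re-trigger it. Widening the gap between the two thresholds trades sensitivity for fewer false alarms, which is why task-dependent tuning may be needed.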

Glossary

  • Action chunk: A contiguous sequence of low-level actions predicted or executed together as a block. "where $\mathbf{a}_{t:t+k}$ denotes a corresponding action chunk."
  • Autoencoder-based anomaly detection: An unsupervised learning approach that flags deviations from normal behavior by reconstructing inputs, often used for contact detection. "Unsupervised methods such as autoencoder-based anomaly detection \cite{unsupervised_anomalydetection} can identify deviations from manipulator free motion, but are primarily suited for contact detection rather than continuous force estimation."
  • Backlash: Mechanical play or clearance in gears/transmissions that leads to lost motion when reversing direction. "including nonlinear friction, stiction, backlash, hysteresis, temperature-dependent drive behavior, sensing noise, torque ripple, deadzones, and saturation"
  • Behavior cloning: Supervised imitation learning that trains a policy to reproduce expert actions from observations. "We train the policy by behavior cloning on expert demonstrations"
  • Bilateral teleoperation: A two-way teleoperation scheme where forces sensed at the remote (follower) robot are fed back to the human operator (leader), enabling haptic feedback. "We use the estimated force signal to enable bilateral teleoperation on non-sensorized low-cost arms"
  • Capacitive sensors: Sensors that measure changes in capacitance to infer force or touch, offering lower-cost alternatives for force/pressure sensing. "Recent capacitive sensors such as CoinFT \cite{choi2025coinft} and magnetic tactile sensors such as ReSkin \cite{bhirangi2021reskin} offer lower-cost alternatives"
  • Contact-rich manipulation: Robotic tasks that involve frequent or sustained physical contact with the environment, requiring precise force control. "Contact-rich manipulation requires force sensitivity"
  • Coriolis and centrifugal terms: Components of robot dynamics that account for velocity-dependent inertial effects due to moving links. "Computing $\tau_f$ therefore requires an accurate model of the robot mass matrix $\mathbf{M}$, Coriolis and centrifugal terms $\mathbf{C}$, gravity term $\mathbf{g}$, and any other contributing terms $\tau_{\mathrm{other}}$."
  • Disturbance observer: An estimator that infers unknown external disturbances/torques from discrepancies in dynamic models, often yielding smoother estimates than inverse dynamics. "a disturbance observer (DO)~\cite{mamedov2020practical}"
  • End effector: The tool or device at the tip of a robot arm that interacts with the environment. "Conventional piezoresistive FT sensors mounted between the arm and end effector"
  • End-effector force sensors: Sensors placed at the robot’s wrist or tool to directly measure forces/torques at the contact interface. "Retrofitting platforms with end-effector force sensors remains challenging and expensive"
  • External joint torque: The joint-space torques caused by interactions with the environment, beyond what is needed for free-space motion. "External joint torque $\tau_{\mathrm{ext}}$ is the joint-space torque induced by physical interactions with the environment, excluding the torque required to move the robot in free space."
  • Force-Informed Re-Sampling Training (FIRST): A training strategy that uses estimated external torque to identify contact-relevant phases and up-sample them during imitation learning. "Force-Informed Re-Sampling Training (FIRST) uses learned external torque estimates to segment demonstrations into free-space, pre-contact, and contact phases, then up-samples contact-relevant segments during training to improve policy performance."
  • Force-torque (FT) sensors: Dedicated sensors that directly measure forces and torques, typically mounted at the wrist or joints. "Dedicated force-torque (FT) sensors are typically available only on expensive platforms such as Franka, Flexiv, or KUKA arms."
  • Flow-matching policy: A generative policy trained via flow-matching to map noise to expert actions using a conditional velocity field. "We instantiate $\pi_{\theta}$ as a flow-matching policy parameterized by a conditional velocity field $v_{\theta}$."
  • Free-space inverse dynamics: Learning or modeling the torques required to produce observed motion when no external contacts are present. "we learn a free-space inverse dynamics neural network directly from joint states"
  • Free-space torque: The torque needed to execute a motion trajectory in the absence of external contact forces. "the torque required to realize the observed motion in free space"
  • Friction compensation: Control techniques that counteract friction effects in actuators/transmissions to improve motion/force tracking. "along with gravity compensation, friction compensation, and null-space regulation."
  • Generalized momentum: The product of the mass matrix and joint velocities used in dynamic models, whose residuals can reveal external disturbances. "the observer estimates external torque from the residual in generalized momentum,"
  • Gravity compensation: Control strategies that offset gravitational torques so the controller need not fight gravity during motion. "along with gravity compensation, friction compensation, and null-space regulation."
  • Gravity term: The component of robot dynamics accounting for torques due to gravity acting on the links. "gravity term $\mathbf{g}$"
  • Hybrid force-position control: A control scheme that simultaneously regulates motion and force, often used in contact tasks. "A second line of work combines behavior cloning with hybrid force-position or impedance control at inference time"
  • Hysteresis: History-dependent behavior in actuators/transmissions where response depends on past states, causing nonlinearities. "including nonlinear friction, stiction, backlash, hysteresis, temperature-dependent drive behavior, sensing noise, torque ripple, deadzones, and saturation"
  • Hysteresis thresholding: A thresholding method with separate on/off thresholds to reduce spurious switching due to noise. "To avoid noisy transitions, we use hysteresis thresholding"
  • Impedance control: A control approach that regulates the dynamic relationship between motion and force, shaping apparent stiffness/damping during contact. "hybrid force-position or impedance control at inference time"
  • Inverse dynamics: Computing the required joint torques from desired motion (positions/velocities/accelerations) and a dynamics model. "Instead of computing free-space torque $\tau_f$ from model-based inverse dynamics"
  • Joint torque sensors: Sensors that directly measure the torque at robot joints, enabling accurate force estimation. "yet achieves estimates comparable to dedicated joint-torque sensors."
  • Leader-follower position-position feedback: A teleoperation mode where leader and follower positions are compared and fed back as torques to the operator. "leader-follower position-position feedback \cite{bilateral_survey} (PP)"
  • Magnetic tactile sensors: Touch/force sensors that infer contact via magnetic field changes, often using embedded magnets and Hall sensors. "magnetic tactile sensors such as ReSkin \cite{bhirangi2021reskin}"
  • Mass matrix: The configuration-dependent inertia matrix of a robot that maps accelerations to torques. "robot mass matrix $\mathbf{M}$"
  • Mixture-of-experts architecture: A model that routes inputs to specialized expert subnetworks based on context. "ForceVLA~\cite{yu2025forcevla} further introduces a mixture-of-experts architecture for context-aware routing across modality-specific experts."
  • Momentum-based disturbance observer: A disturbance estimator that relies on the dynamics of generalized momentum rather than acceleration. "As a second baseline, we implement a momentum-based disturbance observer."
  • Motor torque constant: A proportionality constant relating motor current to output torque. "where $I_m$ is the measured motor current and $K$ is the torque constant."
  • MuJoCo inverse dynamics: Using the MuJoCo physics engine to compute model-based inverse dynamics for torque estimation. "Given the measured motor torque $\tau_m$, external torque is estimated by subtracting the nominal free-space torque computed using MuJoCo inverse dynamics:"
  • Neural External Torque Estimation (NEXT): A data-driven method that learns free-space torque and infers external joint torque as a residual, without force sensors. "We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors."
  • Null-space regulation: Control actions in the robot’s null space (not affecting the primary task) to improve posture or avoid joint limits. "along with gravity compensation, friction compensation, and null-space regulation."
  • Piezoresistive FT sensors: Force/torque sensors that use strain gauges whose resistance changes under stress to measure forces. "Conventional piezoresistive FT sensors mounted between the arm and end effector rely on multiple strain gauges attached to precision-machined structures"
  • Proprioceptive: Relating to internal robot sensors (e.g., joint positions, velocities, currents) rather than external sensing like cameras. "a history of proprioceptive observations."
  • Saturation: Actuator or sensor limits where outputs can no longer increase proportionally to inputs, causing clipping. "including nonlinear friction, stiction, backlash, hysteresis, temperature-dependent drive behavior, sensing noise, torque ripple, deadzones, and saturation"
  • Sensorless external force estimation: Inferring contact forces/torques without dedicated force sensors, typically via models or learning from internal signals. "Sensorless External Force Estimation"
  • Stiction: Static friction that must be overcome to initiate motion, leading to stick-slip effects. "including nonlinear friction, stiction, backlash, hysteresis, temperature-dependent drive behavior, sensing noise, torque ripple, deadzones, and saturation"
  • System identification: The process of estimating model parameters (e.g., inertial/friction) from observed data to improve dynamics accuracy. "Traditional approaches based on analytical modeling and system identification \cite{biact, yamane2026, shi2026minimalistcompliancecontrol} can be effective"
  • Teleoperation: Remote control of a robot by a human operator, often with haptic feedback of sensed forces. "NEXT enables force-feedback teleoperation on low-cost arms"
  • Torque ripple: Periodic variations in motor torque output due to electromagnetic or mechanical effects, causing vibrations. "including nonlinear friction, stiction, backlash, hysteresis, temperature-dependent drive behavior, sensing noise, torque ripple, deadzones, and saturation"
  • URDF: Unified Robot Description Format, an XML specification for robot models used by simulation/planning tools. "the dynamics model derived from the manufacturer-provided URDF~\cite{agilexrobotics_agx_arm_urdf}"
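
The FIRST entry above describes segmenting demonstrations into free-space, pre-contact, and contact phases and then up-sampling the contact-relevant segments. The sketch below shows one way such re-sampling could look, assuming per-timestep contact flags are already available. The pre-contact window length, the up-sampling factor, the per-timestep sampling granularity, and all function names are hypothetical choices for illustration, not the paper's implementation.

```python
import random

FREE, PRE_CONTACT, CONTACT = "free", "pre_contact", "contact"


def label_phases(contact_flags, pre_window):
    """Label each timestep as free-space, pre-contact, or contact.

    The pre_window timesteps before each contact onset are marked as
    pre-contact (the window length is an assumed hyperparameter).
    """
    labels = [CONTACT if flag else FREE for flag in contact_flags]
    for t, flag in enumerate(contact_flags):
        is_onset = flag and (t == 0 or not contact_flags[t - 1])
        if is_onset:
            for s in range(max(0, t - pre_window), t):
                if labels[s] == FREE:
                    labels[s] = PRE_CONTACT
    return labels


def sampling_probabilities(labels, upsample):
    """Give pre-contact and contact timesteps `upsample` times the
    weight of free-space timesteps, then normalize to probabilities."""
    weights = [upsample if label != FREE else 1.0 for label in labels]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical contact flags for one eight-step demonstration.
contact_flags = [False, False, False, True, True, False, False, False]
labels = label_phases(contact_flags, pre_window=2)
probs = sampling_probabilities(labels, upsample=3.0)

# Draw training indices with the re-weighted distribution.
rng = random.Random(0)
batch_indices = rng.choices(range(len(labels)), weights=probs, k=4)
```

With these assumed settings, each pre-contact or contact timestep is three times as likely to be drawn into a training batch as a free-space timestep, while free-space data is still sampled.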
