Force-Informed Re-Sampling Training (FIRST)
- The paper introduces FIRST, which uses external torque signals to up-sample underrepresented pre-contact and contact segments in demonstration data.
- FIRST applies systematic phase labeling and hysteresis thresholding to segment trajectories, correcting data imbalance in contact-rich robotic tasks.
- Empirical results show FIRST improves task progress rates by over 17% and reduces validation loss by approximately 30% on low-cost robotic arms.
Force-Informed Re-Sampling Training (FIRST) is an algorithmic framework designed to enhance policy learning for contact-rich robotic manipulation tasks by leveraging estimated external force signals to prioritize failure-prone segments of demonstration trajectories. FIRST was introduced in the context of behavior cloning (BC) and flow-matching-based imitation learning to address the systematic underrepresentation of pre-contact and contact intervals, which are critical yet rare in typical demonstration datasets. By coupling a learned external torque estimator, such as Neural External Torque Estimation (NEXT), with a principled re-sampling strategy, FIRST achieves significant improvements in policy robustness and task completion rates for manipulation on low-cost, sensor-limited robot arms (Oh et al., 10 Jun 2026).
1. Motivation and Problem Setting
Standard BC policies are trained using teleoperated demonstrations in which each timestep is treated equally regardless of task context. In long-horizon, contact-intensive tasks, most trajectory segments consist of free motion, while failures disproportionately occur in brief pre-contact periods—where precise alignment with surfaces or cavities is necessary—and during contact, which often requires finely tuned force control. This data imbalance leads to policy underfitting in these challenging phases and results in brittle performance for alignment and force-sensitive interactions.
FIRST addresses this limitation by using external torque estimates derived from NEXT to automatically annotate demonstrations into failure-prone ("Pre-Contact," "Contact") and routine ("Free") intervals. During policy training, FIRST systematically up-samples the underrepresented but critical segments so that the learned policy sees proportionally more examples of difficult transitions and interactions.
2. Algorithmic Framework and Mathematical Formulation
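For each demonstration $\mathcal{D}_i = \{(o_t, A_t)\}_{t=1}^{T_i}$ composed of sensorimotor observations $o_t$ (including proprioception, vision, and the external torque estimate $\hat{\tau}^{\text{ext}}_t$ from NEXT) and action chunks $A_t = (a_t, \dots, a_{t+H-1})$ of length $H$, FIRST employs the following formal procedure: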
a. Contact Score and Hysteresis Thresholding
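The contact score $s_t$ at each timestep is computed from the external torque estimate $\hat{\tau}^{\text{ext}}_t$ (for example, as its norm $s_t = \lVert \hat{\tau}^{\text{ext}}_t \rVert_2$) and serves as a proxy for physical interaction intensity. Two hysteresis thresholds $\tau_{\text{on}} > \tau_{\text{off}}$ define a binary contact indicator $c_t$:
$$c_t = \begin{cases} 1, & s_t > \tau_{\text{on}}, \\ 0, & s_t < \tau_{\text{off}}, \\ c_{t-1}, & \text{otherwise}. \end{cases}$$
Using two thresholds instead of one keeps the indicator from chattering when the score hovers near a single cutoff.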
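The hysteresis rule can be sketched in Python as follows. This is a minimal illustration, assuming the contact score is the Euclidean norm of the NEXT torque estimate and that each demonstration starts out of contact; the function names and the threshold values in the usage line are hypothetical, not the paper's settings.

```python
import math
from typing import List, Sequence


def contact_score(tau_ext: Sequence[float]) -> float:
    """Contact score s_t for one timestep.

    Assumption: s_t is the Euclidean norm of the estimated external joint torques (Nm).
    """
    return math.sqrt(sum(x * x for x in tau_ext))


def hysteresis_contact(scores: Sequence[float], tau_on: float, tau_off: float) -> List[int]:
    """Binary contact indicator c_t with two thresholds (tau_off <= tau_on).

    c_t switches to 1 when s_t > tau_on, back to 0 when s_t < tau_off,
    and otherwise keeps its previous value.
    """
    if tau_off > tau_on:
        raise ValueError("tau_off must not exceed tau_on")
    state = 0  # assumption: every demonstration starts out of contact
    indicator = []
    for s in scores:
        if s > tau_on:
            state = 1
        elif s < tau_off:
            state = 0
        indicator.append(state)
    return indicator


# Hypothetical thresholds chosen only for illustration.
print(hysteresis_contact([0.1, 0.6, 1.2, 0.9, 0.6, 0.3, 0.1], tau_on=1.0, tau_off=0.5))
# -> [0, 0, 1, 1, 1, 0, 0]
```

A single cutoff at 1.0 would mark only the third step as contact; the hysteresis band keeps the indicator on while the score decays through 0.9 and 0.6, and releases it only once the score falls below 0.5.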
b. Phase Labeling
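Contact transition points are used to segment each trajectory. Let $c_t$ be the binary contact indicator. For control-loop frequency $f$ (e.g., 100 Hz) and a pre-contact window of $W = f \cdot \Delta_{\text{pre}}$ steps ($\Delta_{\text{pre}} = 1$ s), the phase label $\ell_t$ is determined as
$$\ell_t = \begin{cases} \text{Contact}, & c_t = 1, \\ \text{Pre-Contact}, & c_t = 0 \text{ and } 0 < t_{\text{on}}(t) - t \le W, \\ \text{Free}, & \text{otherwise}, \end{cases}$$
where $t_{\text{on}}(t)$ is the first contact onset (a step at which $c$ switches from 0 to 1) after $t$.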
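A sketch of the labeling rule, assuming the binary contact indicator is already available as a list of 0/1 values; the function name, string labels, and toy frequency in the usage line are illustrative choices rather than the paper's code.

```python
from typing import List, Sequence

FREE, PRE_CONTACT, CONTACT = "Free", "Pre-Contact", "Contact"


def label_phases(contact: Sequence[int], freq_hz: float = 100.0,
                 pre_window_s: float = 1.0) -> List[str]:
    """Assign Free / Pre-Contact / Contact labels from a binary contact indicator.

    Pre-Contact covers the W = freq_hz * pre_window_s steps immediately before each
    contact onset (a 0 -> 1 transition), skipping steps that are already in contact.
    """
    window = int(round(freq_hz * pre_window_s))  # e.g. 100 steps at 100 Hz and 1 s
    labels = [CONTACT if c else FREE for c in contact]
    for t in range(1, len(contact)):
        if contact[t] == 1 and contact[t - 1] == 0:  # contact onset at step t
            for k in range(max(0, t - window), t):
                if labels[k] == FREE:
                    labels[k] = PRE_CONTACT
    return labels


# Toy example: 4 Hz control with a 0.5 s window, i.e. W = 2 steps.
print(label_phases([0, 0, 0, 0, 1, 1, 0, 0, 1], freq_hz=4.0, pre_window_s=0.5))
```

In the toy trajectory, steps 2 to 3 and 6 to 7 (0-indexed) become Pre-Contact because each pair directly precedes a contact onset; with the defaults (100 Hz, 1 s) the window is 100 steps.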
c. Re-Sampling Distribution
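Each phase $\ell \in \{\text{Free}, \text{Pre-Contact}, \text{Contact}\}$ is assigned a weight $w_\ell$, written $w_{\text{F}}$, $w_{\text{P}}$, and $w_{\text{C}}$ for Free, Pre-Contact, and Contact respectively. The probability of sampling timestep $i$ from the segmented dataset is
$$p(i) = \frac{w_{\ell_i}}{\sum_{j} w_{\ell_j}},$$
where the sum runs over all timesteps of all demonstrations. Up-sampling (increasing $w_{\text{P}}$ and/or $w_{\text{C}}$ relative to $w_{\text{F}}$) ensures that key failure-prone moments are better represented in each mini-batch.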
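The sampling distribution can be sketched with the standard library's `random.choices`, which draws with replacement in proportion to the given weights. The phase counts and weight values below are hypothetical and only illustrate the mechanics.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def sampling_probs(labels: Sequence[str], weights: Dict[str, float]) -> List[float]:
    """p(i) = w_{l_i} / sum_j w_{l_j} over every timestep in the segmented dataset."""
    raw = [weights[lab] for lab in labels]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(labels: Sequence[str], weights: Dict[str, float],
                 batch_size: int, rng: random.Random) -> List[int]:
    """Draw timestep indices with replacement according to the phase weights."""
    return rng.choices(range(len(labels)),
                       weights=[weights[lab] for lab in labels], k=batch_size)


# Hypothetical data: 8 Free, 1 Pre-Contact, 1 Contact timestep; weights are illustrative.
labels = ["Free"] * 8 + ["Pre-Contact", "Contact"]
weights = {"Free": 1.0, "Pre-Contact": 4.0, "Contact": 2.0}
batch = sample_batch(labels, weights, batch_size=128, rng=random.Random(0))
print(Counter(labels[i] for i in batch))
```

With these weights the single Pre-Contact step carries probability 4/14 (about 29%) instead of 1/10, so it appears far more often per batch than under uniform sampling.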
d. Behavior Cloning/Flow-Matching Objective
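Policy $\pi_\theta$, with velocity field $v_\theta$, is trained by minimizing the flow-matching loss (written here in the standard conditional flow-matching form):
$$\mathcal{L}(\theta) = \mathbb{E}_{(o_t, A_t) \sim p,\ \epsilon,\ u} \left[ \left\lVert v_\theta(A_t^{u}, u, o_t) - (A_t - \epsilon) \right\rVert_2^2 \right],$$
where $A_t$ is the action chunk at time $t$, $A_t^{u} = u A_t + (1 - u)\,\epsilon$, $\epsilon \sim \mathcal{N}(0, I)$, and $u \sim \mathcal{U}[0, 1]$. The loss function itself is unchanged; FIRST only alters the training distribution through the re-sampling distribution $p$.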
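For concreteness, a NumPy sketch of this objective, assuming the linear (rectified-flow style) interpolation path between noise and the action chunk; `v_theta` stands in for whatever network parameterizes the policy, and all shapes are hypothetical. Re-sampling only changes which rows end up in `obs` and `actions`; the loss code is untouched.

```python
import numpy as np


def flow_matching_loss(v_theta, obs: np.ndarray, actions: np.ndarray,
                       rng: np.random.Generator) -> float:
    """Monte Carlo estimate of the conditional flow-matching loss on one mini-batch.

    obs:     (B, d_o) observations from batch indices drawn with the FIRST distribution.
    actions: (B, H, d_a) action chunks A_t.
    v_theta: callable (A_u, u, obs) -> predicted velocity with the shape of `actions`.
    """
    batch = actions.shape[0]
    eps = rng.standard_normal(actions.shape)        # eps ~ N(0, I)
    u = rng.uniform(0.0, 1.0, size=(batch, 1, 1))   # u ~ U[0, 1), one value per sample
    a_u = u * actions + (1.0 - u) * eps             # A^u = u A + (1 - u) eps
    target = actions - eps                          # d A^u / du
    residual = v_theta(a_u, u, obs) - target
    return float(np.mean(np.sum(residual ** 2, axis=(1, 2))))


def zero_model(a_u, u, o):
    """Untrained placeholder policy that always predicts zero velocity."""
    return np.zeros_like(a_u)


rng = np.random.default_rng(0)
obs = rng.standard_normal((4, 6))
actions = rng.standard_normal((4, 8, 7))  # hypothetical sizes: B=4, H=8, d_a=7
print(flow_matching_loss(zero_model, obs, actions, rng))
```

A model that knew the clean chunk could return $(A_t - A_t^u)/(1-u)$, which equals $A_t - \epsilon$ exactly and drives the loss to zero; a trained policy has to infer that velocity from $o_t$ and $A_t^u$ alone.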
3. Detailed Training Workflow
The FIRST procedure operates as follows:
- Segment demonstrations using NEXT-estimated external torques and hysteresis thresholding to assign each timestep a phase label: Free, Pre-Contact, or Contact.
- Assemble a segmented dataset $\mathcal{D}_{\text{seg}}$ with phase annotations.
- At each training iteration, sample batches from $\mathcal{D}_{\text{seg}}$ according to the phase-dependent distribution.
- Train the policy using standard flow-matching or behavior cloning objectives.
The integration with NEXT involves an initial offline phase, in which NEXT is trained on 10 minutes of contact-free motion data to predict free-space torque, and a demonstration phase, in which NEXT outputs the external torque estimate $\hat{\tau}^{\text{ext}}_t$. FIRST can use $\hat{\tau}^{\text{ext}}_t$ both as an input feature to the policy and as the segmentation signal for phase labeling (Oh et al., 10 Jun 2026).
4. Hyperparameter Settings and Ablation Findings
Empirically validated hyperparameters include:
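- Control frequency $f = 100$ Hz, yielding a pre-contact window of 1 s (100 steps).
- Typical hysteresis thresholds: an upper (switch-on) and a lower (switch-off) threshold on the contact score, in Nm.
- Default sampling weights: Pre-Contact and Contact both weighted above Free.
- Batch size: 128; epochs: 15–20; optimizer: AdamW with weight decay.
- Action chunk length $H$ (steps per predicted chunk).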
Ablation studies show that prioritizing Pre-Contact up-sampling (raising only the Pre-Contact weight) yields greater performance improvements than Contact-only up-sampling (raising only the Contact weight). Joint up-sampling (raising both) is beneficial for tasks involving sustained interaction. Performance gains saturate at an up-sampling factor of approximately 5; larger factors reduce generalization by overfitting to specific segments.
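The effect of an up-sampling factor on batch composition can be worked out directly. The sketch below assumes a hypothetical dataset in which 10% of timesteps are Pre-Contact and 5% are Contact (illustrative numbers, not measurements from the paper) and applies Pre-Contact-only up-sampling.

```python
from typing import Dict


def phase_share(fractions: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Expected share of each phase in a mini-batch after re-sampling.

    fractions: share of timesteps in each phase before re-sampling (sums to 1).
    weights:   phase weights used by the sampling distribution.
    """
    z = sum(fractions[ph] * weights[ph] for ph in fractions)
    return {ph: fractions[ph] * weights[ph] / z for ph in fractions}


# Hypothetical phase composition of a dataset (not measured values).
fractions = {"Free": 0.85, "Pre-Contact": 0.10, "Contact": 0.05}
for k in (1, 5, 20):  # Pre-Contact-only up-sampling factor
    share = phase_share(fractions, {"Free": 1.0, "Pre-Contact": float(k), "Contact": 1.0})
    print(k, round(share["Pre-Contact"], 3))
```

Under these assumptions, Pre-Contact steps fill about 10% of a batch at factor 1, about 36% at factor 5, and about 69% at factor 20, so very large factors make most gradient updates come from a narrow slice of the data, which is consistent with the overfitting reported above.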
5. Empirical Evaluation and Comparative Performance
FIRST was evaluated on five high-difficulty, contact-rich bimanual manipulation tasks using the Piper setup, with 250 teleoperated demonstrations and 20 evaluation rollouts per task:
- LEGO Assembly
- NIST Belt Assembly
- NIST Insertion
- Tool Clean Up
- Cap Screwing
The main metric is Task Progress Rate (fraction of subtasks completed). Comparative baselines use identical policy architectures but different training strategies:
| Baseline | Description |
|---|---|
| Base | Vision + proprioception only |
| Base + Torque | Base plus NEXT torque estimate input |
| FACTR | Image blur curriculum + torque input |
| TA-VLA | Auxiliary torque reconstruction loss |
| FIRST | Torque input + Pre-Contact/Contact up-sampling |
Across all five tasks, FIRST increased average task progress from approximately 0.67 (best prior method) to approximately 0.82—a relative gain of over 17%. FIRST also reduced pre-contact and contact phase validation loss by approximately 30% (Oh et al., 10 Jun 2026).
6. Integration Practices and Practical Guidelines
FIRST requires only the following integration steps once NEXT or any comparable force estimator is available:
- Segment trajectories using the NEXT torque estimate $\hat{\tau}^{\text{ext}}_t$ and the phase-labeling procedure.
- Implement phase-based re-sampling when constructing mini-batches.
Up-sampling the Pre-Contact window (about 1 s prior to contact) reliably yields performance improvements of at least 10%. For tasks demanding prolonged contact control (e.g., screwing, wrapping), up-sampling the Contact phase is advisable. Up-sampling factors should be set to moderate values (e.g., 3–7); excessive weighting can induce overfitting to narrow sub-trajectories.
Even when external torque estimates cannot be provided as a policy input, FIRST can still be applied offline for dataset re-balancing; empirical results indicate gains of 5–10% in such scenarios. Although introduced in conjunction with flow matching, the re-sampling framework is agnostic to the choice of imitation loss and applies equally to behavior cloning and other objectives (Oh et al., 10 Jun 2026).
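One way to realize the offline variant, assuming integer up-sampling factors, is to replicate samples once and then train with an ordinary uniformly shuffled loader. This replication scheme is an illustrative choice, not necessarily how the paper implements offline re-balancing, and the function name is hypothetical.

```python
import random
from typing import Dict, List, Sequence


def rebalanced_indices(labels: Sequence[str], weights: Dict[str, int]) -> List[int]:
    """Build a static, re-balanced index list by replicating each timestep.

    Timestep i appears weights[labels[i]] times, so a uniform draw from the list
    selects i with probability w_{l_i} / sum_j w_{l_j}, the same as weighted sampling.
    """
    indices: List[int] = []
    for i, lab in enumerate(labels):
        indices.extend([i] * weights[lab])
    return indices


labels = ["Free", "Free", "Pre-Contact", "Contact"]
weights = {"Free": 1, "Pre-Contact": 3, "Contact": 2}  # hypothetical integer factors
index_list = rebalanced_indices(labels, weights)
print(index_list)  # -> [0, 1, 2, 2, 2, 3, 3]
random.Random(0).shuffle(index_list)  # any loader that shuffles uniformly can consume this list
```

The trade-off is memory for simplicity: the replicated list grows with the weights, but the training code needs no weighted sampler at all.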
7. Significance and Broader Context
FIRST exemplifies a data-centric approach to address imbalances in demonstration datasets for robotic imitation learning, particularly for under-instrumented platforms lacking direct force sensors. By harnessing learned external torque signals for both state characterization and dataset balancing, FIRST targets the most failure-prone moments of manipulation tasks—yielding substantial improvements in policy efficacy with minimal methodological overhead. The generality of the phase-based re-sampling principle suggests applicability across a wide range of contact-intensive robotic domains and imitation learning paradigms.