
Force-Informed Re-Sampling Training (FIRST)

Updated 12 June 2026
  • The paper introduces FIRST, which uses external torque signals to up-sample underrepresented pre-contact and contact segments in demonstration data.
  • FIRST applies systematic phase labeling and hysteresis thresholding to segment trajectories, correcting data imbalance in contact-rich robotic tasks.
  • Empirical results show FIRST improves task progress rates by over 17% and reduces validation loss by approximately 30% on low-cost robotic arms.

Force-Informed Re-Sampling Training (FIRST) is an algorithmic framework designed to enhance policy learning for contact-rich robotic manipulation tasks by leveraging estimated external force signals to prioritize failure-prone segments of demonstration trajectories. FIRST was introduced in the context of behavior cloning (BC) and flow-matching-based imitation learning to address the systematic underrepresentation of pre-contact and contact intervals, which are critical yet rare in typical demonstration datasets. By coupling a learned external torque estimator, such as Neural External Torque Estimation (NEXT), with a principled re-sampling strategy, FIRST achieves significant improvements in policy robustness and task completion rates for manipulation on low-cost, sensor-limited robot arms (Oh et al., 10 Jun 2026).

1. Motivation and Problem Setting

Standard BC policies are trained using teleoperated demonstrations in which each timestep is treated equally regardless of task context. In long-horizon, contact-intensive tasks, most trajectory segments consist of free motion, while failures disproportionately occur in brief pre-contact periods—where precise alignment with surfaces or cavities is necessary—and during contact, which often requires finely tuned force control. This data imbalance leads to policy underfitting in these challenging phases and results in brittle performance for alignment and force-sensitive interactions.

FIRST addresses this limitation by using external torque estimates derived from NEXT to automatically annotate demonstrations into failure-prone ("Pre-Contact," "Contact") and routine ("Free") intervals. During policy training, FIRST systematically up-samples the underrepresented but critical segments so that the learned policy sees proportionally more examples of difficult transitions and interactions.

2. Algorithmic Framework and Mathematical Formulation

For each demonstration composed of sensorimotor observations $o_t$ (including proprioception, vision, and the NEXT torque estimate $\hat{\tau}_{\mathrm{ext},t}$) and action chunks $a_{t:t+k}$, FIRST applies the following formal procedure:

a. Contact Score and Hysteresis Thresholding

The contact score at each timestep,

$$f_t = \|\hat{\tau}_{\mathrm{ext},t}\|_1,$$

serves as a proxy for physical interaction intensity. Two hysteresis thresholds $T_{\mathrm{low}} < T_{\mathrm{high}}$ define a binary contact indicator $c_t$:

$$c_t = \begin{cases} 1 & \text{if } f_t \geq T_{\mathrm{high}}, \\ 0 & \text{if } f_t \leq T_{\mathrm{low}}, \\ c_{t-1} & \text{otherwise.} \end{cases}$$

The dead band between the two thresholds keeps $c_t$ from toggling rapidly when $f_t$ hovers near a single threshold, for example because of estimator noise.
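The contact score and hysteresis rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the torque estimate arrives as one list of per-joint values per timestep and that each trajectory starts out of contact ($c_{-1} = 0$). The function names and the threshold values in the usage line are hypothetical.

```python
from typing import List, Sequence


def contact_scores(tau_ext: Sequence[Sequence[float]]) -> List[float]:
    """f_t: L1 norm of the estimated external torque at each timestep."""
    return [sum(abs(x) for x in tau_t) for tau_t in tau_ext]


def hysteresis_contact(scores: Sequence[float], t_low: float, t_high: float,
                       c_init: int = 0) -> List[int]:
    """Binary contact indicator c_t with a dead band between t_low and t_high."""
    if not t_low < t_high:
        raise ValueError("hysteresis requires t_low < t_high")
    c_prev, out = c_init, []
    for f in scores:
        if f >= t_high:
            c = 1          # strong interaction: switch to (or stay in) contact
        elif f <= t_low:
            c = 0          # weak interaction: switch to (or stay out of) contact
        else:
            c = c_prev     # inside the dead band: keep the previous state
        out.append(c)
        c_prev = c
    return out


# Illustrative thresholds (hypothetical, not the paper's values).
print(hysteresis_contact([0.5, 1.5, 2.5, 1.5, 0.8, 1.5], t_low=1.0, t_high=2.0))
# [0, 0, 1, 1, 0, 0]
```

The score 1.5 maps to 0 at the start but to 1 right after contact begins: the indicator flips only when a threshold is actually crossed.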

b. Phase Labeling

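Contact transition points $\mathcal{T}_{\mathrm{onset}} = \{t \mid c_{t-1}=0,\, c_t=1\}$ are used to segment each trajectory. For control-loop frequency $F$ (e.g., 100 Hz), the phase label $s_t$ is determined as:

$$s_t = \begin{cases} \text{Contact} & \text{if } c_t = 1, \\ \text{Pre-Contact} & \text{if } c_t = 0 \text{ and } \exists\, t_o \in \mathcal{T}_{\mathrm{onset}} \text{ with } t_o - F \leq t < t_o, \\ \text{Free} & \text{otherwise,} \end{cases}$$

so that the Pre-Contact window spans the $F$ steps (1 s) immediately preceding each contact onset.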
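A phase-labeling sketch follows. It assumes the Pre-Contact window covers the `window` timesteps before each contact onset (at $F = 100$ Hz, 100 steps or 1 s) and that a timestep already in contact keeps the Contact label even if it also falls inside a later onset's window. The function names and the small window used in the example are hypothetical.

```python
from typing import List, Sequence

FREE, PRE_CONTACT, CONTACT = "Free", "Pre-Contact", "Contact"


def onset_indices(c: Sequence[int]) -> List[int]:
    """Contact onsets: timesteps t with c[t-1] == 0 and c[t] == 1."""
    return [t for t in range(1, len(c)) if c[t - 1] == 0 and c[t] == 1]


def label_phases(c: Sequence[int], window: int) -> List[str]:
    """Assign Free / Pre-Contact / Contact to every timestep."""
    labels = [CONTACT if ct == 1 else FREE for ct in c]
    for t_on in onset_indices(c):
        # Mark the `window` steps preceding each onset, never overwriting Contact.
        for t in range(max(0, t_on - window), t_on):
            if labels[t] == FREE:
                labels[t] = PRE_CONTACT
    return labels


print(label_phases([0, 0, 0, 0, 1, 1, 0, 0, 1], window=2))
# ['Free', 'Free', 'Pre-Contact', 'Pre-Contact', 'Contact', 'Contact', 'Pre-Contact', 'Pre-Contact', 'Contact']
```

In a real pipeline `window` would be set to $F$, e.g. 100 at 100 Hz.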

c. Re-Sampling Distribution

Each phase $s$ is assigned a weight $w_s$ ($w_{\mathrm{Free}}$, $w_{\mathrm{Pre}}$, and $w_{\mathrm{Contact}}$ for Free, Pre-Contact, and Contact, respectively). Each timestep is then sampled with probability proportional to the weight of its phase:

$$p(t) = \frac{w_{s_t}}{\sum_{t'} w_{s_{t'}}},$$

where the sum runs over all timesteps in the dataset. Up-sampling (increasing $w_{\mathrm{Pre}}$ and $w_{\mathrm{Contact}}$ relative to $w_{\mathrm{Free}}$) raises the representation of key failure moments in each mini-batch.
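Phase-weighted sampling maps directly onto Python's `random.choices`, which draws items with probability proportional to the supplied weights. The sketch below assumes per-timestep phase labels are already available; the weight values are illustrative placeholders, not the paper's defaults.

```python
import random
from typing import Dict, List, Optional, Sequence


def sampling_probs(labels: Sequence[str], weights: Dict[str, float]) -> List[float]:
    """p(t) = w_{s_t} / sum_t' w_{s_t'} for every timestep t."""
    raw = [weights[s] for s in labels]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(labels: Sequence[str], weights: Dict[str, float],
                 batch_size: int, rng: Optional[random.Random] = None) -> List[int]:
    """Draw timestep indices (with replacement) according to p(t)."""
    rng = rng or random.Random()
    return rng.choices(range(len(labels)),
                       weights=[weights[s] for s in labels], k=batch_size)


labels = ["Free", "Free", "Pre-Contact", "Contact"]
w = {"Free": 1.0, "Pre-Contact": 3.0, "Contact": 3.0}  # illustrative weights
print(sampling_probs(labels, w))  # [0.125, 0.125, 0.375, 0.375]
```

Sampling with replacement matches the probabilistic definition; any framework sampler that accepts per-sample weights can be fed the same weight list.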

d. Behavior Cloning/Flow-Matching Objective

Policy $\pi_\theta$ is trained by minimizing the flow-matching loss on action chunks, with training timesteps drawn from the phase-dependent sampling distribution $p(t)$ rather than uniformly:

$$\mathcal{L}(\theta) = \mathbb{E}_{t \sim p(t)}\big[\ell_{\mathrm{FM}}(\theta;\, o_t,\, a_{t:t+k})\big],$$

where $\ell_{\mathrm{FM}}$ denotes the per-sample flow-matching loss. The loss function itself remains unchanged; FIRST alters only the training distribution through $p(t)$.
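To make concrete that only the sampling distribution changes, here is a minimal pure-Python sketch of one common flow-matching parameterization. It assumes a linear interpolation path $x_\alpha = (1-\alpha)\epsilon + \alpha a$ between Gaussian noise $\epsilon$ and the flattened action chunk $a$, with target velocity $a - \epsilon$; the paper's exact parameterization may differ, and the `model` callable and the `(obs, chunk)` dataset layout are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# model(x_alpha, obs, alpha) -> predicted velocity (same length as the chunk)
Velocity = Callable[[List[float], object, float], List[float]]


def flow_matching_loss(model: Velocity, obs: object, chunk: Sequence[float],
                       rng: random.Random) -> float:
    """Per-sample loss for a linear noise-to-action path (assumed convention)."""
    alpha = rng.random()                              # interpolation time in [0, 1)
    eps = [rng.gauss(0.0, 1.0) for _ in chunk]        # Gaussian noise
    x_alpha = [(1 - alpha) * e + alpha * a for e, a in zip(eps, chunk)]
    target = [a - e for e, a in zip(eps, chunk)]      # d x_alpha / d alpha
    pred = model(x_alpha, obs, alpha)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(chunk)


def batch_loss(model: Velocity, dataset: Sequence[Tuple[object, Sequence[float]]],
               sample_weights: Sequence[float], batch_size: int,
               rng: random.Random) -> float:
    """Same per-sample loss as uniform training; only the index distribution differs."""
    idx = rng.choices(range(len(dataset)), weights=sample_weights, k=batch_size)
    return sum(flow_matching_loss(model, *dataset[i], rng) for i in idx) / batch_size
```

Passing uniform `sample_weights` recovers ordinary training; passing phase weights gives the FIRST variant without touching `flow_matching_loss`.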

3. Detailed Training Workflow

The FIRST procedure operates as follows:

  1. Segment demonstrations using NEXT-estimated external torques and hysteresis thresholding to assign each timestep a phase label: Free, Pre-Contact, or Contact.
  2. Assemble a segmented dataset $\mathcal{D}$ with phase annotations.
  3. At each training iteration, sample mini-batches from $\mathcal{D}$ using the phase-dependent sampling distribution.
  4. Train the policy using standard flow-matching or behavior cloning objectives.
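Steps 3 and 4 amount to swapping the mini-batch sampler while leaving the update rule alone. The loop below is a minimal sketch of that reading; `train_step` is a hypothetical stand-in for whatever BC or flow-matching update the policy already uses, and the dataset is assumed to hold `(phase_label, example)` pairs.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, object]  # (phase label, training example)


def train_first(dataset: Sequence[Sample], weights: Dict[str, float],
                train_step: Callable[[List[object]], float],
                num_iters: int, batch_size: int, seed: int = 0) -> List[float]:
    """Run `num_iters` updates on phase-weighted mini-batches; return the losses."""
    rng = random.Random(seed)
    sample_w = [weights[label] for label, _ in dataset]   # per-example weights
    losses = []
    for _ in range(num_iters):
        idx = rng.choices(range(len(dataset)), weights=sample_w, k=batch_size)
        batch = [dataset[i][1] for i in idx]
        losses.append(train_step(batch))                  # unchanged objective
    return losses


# Toy usage: a fake update that just reports the batch size as its "loss".
data = [("Free", "f0"), ("Free", "f1"), ("Pre-Contact", "p0"), ("Contact", "c0")]
w = {"Free": 1.0, "Pre-Contact": 5.0, "Contact": 5.0}  # illustrative weights
print(train_first(data, w, lambda batch: float(len(batch)), num_iters=3, batch_size=4))
# [4.0, 4.0, 4.0]
```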

The integration with NEXT involves an initial offline phase, in which NEXT is trained on 10 minutes of contact-free motion data to predict free-space torque, and a demonstration phase, in which NEXT outputs the external torque estimate $\hat{\tau}_{\mathrm{ext},t}$. FIRST can use $\hat{\tau}_{\mathrm{ext},t}$ both as an input feature to the policy and as the segmentation signal for phase labeling (Oh et al., 10 Jun 2026).
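How one torque estimate serves both purposes can be sketched as follows. Two assumptions are made here that the text does not state: that the external torque is the residual between a measured joint-torque signal and NEXT's free-space prediction, and that the free-space model is any callable of joint positions and velocities. The names `free_model`, `external_torque`, and the `"tau_ext"` observation key are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: free_model(q, dq) -> predicted contact-free joint torques.
FreeSpaceModel = Callable[[Sequence[float], Sequence[float]], List[float]]


def external_torque(measured: Sequence[float], q: Sequence[float],
                    dq: Sequence[float], free_model: FreeSpaceModel) -> List[float]:
    """Assumed residual form: tau_ext_hat = tau_measured - tau_free_predicted."""
    predicted = free_model(q, dq)
    return [m - p for m, p in zip(measured, predicted)]


def augment_observation(obs: Dict[str, object], tau_ext: Sequence[float]) -> Dict[str, object]:
    """Use 1: append the torque estimate to the policy's observation."""
    return {**obs, "tau_ext": list(tau_ext)}


# Toy free-space model: viscous friction only (illustrative, not NEXT).
toy_model = lambda q, dq: [0.5 * v for v in dq]
tau_hat = external_torque([1.0, 2.0], q=[0.0, 0.0], dq=[2.0, 0.0], free_model=toy_model)
print(tau_hat)  # [0.0, 2.0]
# Use 2: the same tau_hat supplies the contact score f_t = sum(|tau_hat|) for labeling.
```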

4. Hyperparameter Settings and Ablation Findings

Empirically validated hyperparameters include:

  • Control frequency $F = 100$ Hz, yielding a pre-contact window of 1 s (100 steps).
  • Typical hysteresis thresholds $T_{\mathrm{low}} = \ldots$ Nm and $T_{\mathrm{high}} = \ldots$ Nm.
  • Default sampling weights (Free, Pre-Contact, Contact): $(\ldots)$.
  • Batch size: 128; epochs: 15–20; optimizer: AdamW (…, weight decay …).
  • Action chunk length $k = \ldots$.
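These settings can be collected in a small configuration object. In the sketch below, only values stated in the list receive defaults (epochs is set to the upper end of the reported 15–20 range); thresholds and weights are required arguments, and the class and field names are hypothetical. The pre-contact window in steps is derived from the control frequency rather than stored separately, so the two cannot drift apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstConfig:
    t_low: float                   # hysteresis lower threshold (Nm), user-supplied
    t_high: float                  # hysteresis upper threshold (Nm), user-supplied
    w_free: float                  # phase sampling weights, user-supplied
    w_pre: float
    w_contact: float
    control_hz: int = 100          # control frequency F
    pre_contact_seconds: float = 1.0
    batch_size: int = 128
    epochs: int = 20               # reported range: 15-20

    def __post_init__(self) -> None:
        if not self.t_low < self.t_high:
            raise ValueError("hysteresis requires t_low < t_high")

    @property
    def pre_contact_steps(self) -> int:
        """Window length in control steps: F * seconds (100 at 100 Hz)."""
        return round(self.control_hz * self.pre_contact_seconds)


cfg = FirstConfig(t_low=1.0, t_high=2.0, w_free=1.0, w_pre=5.0, w_contact=1.0)  # illustrative
print(cfg.pre_contact_steps)  # 100
```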

Ablation studies show that prioritizing Pre-Contact up-sampling (up-weighting the Pre-Contact phase only) yields greater performance improvements than Contact-only up-sampling (up-weighting the Contact phase only). Joint up-sampling (up-weighting both phases) is beneficial for tasks involving sustained interaction. Performance gains saturate at an up-sampling factor of approximately 5; larger factors reduce generalization by overfitting to specific segments.
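A quick calculation shows how an up-sampling factor reshapes mini-batch composition. The phase counts below are hypothetical (a dataset that is 80% Free motion), and the settings mirror the ablation's Pre-Contact-only, Contact-only, and joint variants with a factor of 5.

```python
from typing import Dict


def phase_shares(counts: Dict[str, int], weights: Dict[str, float]) -> Dict[str, float]:
    """Expected fraction of sampled timesteps drawn from each phase."""
    mass = {s: counts[s] * weights[s] for s in counts}
    total = sum(mass.values())
    return {s: m / total for s, m in mass.items()}


counts = {"Free": 800, "Pre-Contact": 100, "Contact": 100}  # hypothetical dataset
settings = {
    "uniform":      {"Free": 1, "Pre-Contact": 1, "Contact": 1},
    "pre-only":     {"Free": 1, "Pre-Contact": 5, "Contact": 1},
    "contact-only": {"Free": 1, "Pre-Contact": 1, "Contact": 5},
    "joint":        {"Free": 1, "Pre-Contact": 5, "Contact": 5},
}
for name, w in settings.items():
    shares = phase_shares(counts, w)
    print(name, {s: round(v, 3) for s, v in shares.items()})
```

With a factor of 5, the Pre-Contact share rises from 10% to about 36% (Pre-Contact only) or about 28% (joint), while Free motion still accounts for at least 44% of sampled timesteps.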

5. Empirical Evaluation and Comparative Performance

FIRST was evaluated on five high-difficulty, contact-rich bimanual manipulation tasks using the Piper setup, with 250 teleoperated demonstrations and 20 evaluation rollouts per task:

  • LEGO Assembly
  • NIST Belt Assembly
  • NIST Insertion
  • Tool Clean Up
  • Cap Screwing

The main metric is Task Progress Rate (fraction of subtasks completed). Comparative baselines use identical policy architectures but different training strategies:

Method          Description
Base            Vision + proprioception only
Base + Torque   Base plus NEXT torque estimate as input
FACTR           Image-blur curriculum + torque input
TA-VLA          Auxiliary torque reconstruction loss
FIRST           Torque input + Pre-Contact/Contact up-sampling

Across all five tasks, FIRST increased average task progress from approximately 0.67 (best prior method) to approximately 0.82, a relative gain of over 17%. FIRST also reduced validation loss on the Pre-Contact and Contact phases by approximately 30% (Oh et al., 10 Jun 2026).
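Task Progress Rate, the fraction of subtasks completed, can be computed per rollout and then averaged. The sketch below assumes that averaging convention (pooling subtasks across all rollouts would be an alternative) and includes the relative-gain arithmetic used to compare methods. The rollout data are made up for illustration.

```python
from typing import Sequence, Tuple


def task_progress_rate(rollouts: Sequence[Tuple[int, int]]) -> float:
    """Mean over rollouts of (subtasks completed / total subtasks)."""
    return sum(done / total for done, total in rollouts) / len(rollouts)


def relative_gain(baseline: float, new: float) -> float:
    """(new - baseline) / baseline; e.g. 0.25 means a 25% relative improvement."""
    return (new - baseline) / baseline


rollouts = [(3, 4), (1, 4), (4, 4)]      # hypothetical (completed, total) per rollout
print(task_progress_rate(rollouts))      # 0.6666666666666666
```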

6. Integration Practices and Practical Guidelines

FIRST requires only the following integration steps once NEXT or any comparable force estimator is available:

  • Segment trajectories using the estimated external torque $\hat{\tau}_{\mathrm{ext},t}$ and the phase-labeling procedure.
  • Implement phase-based re-sampling when constructing mini-batches.

Up-sampling the Pre-Contact window (about 1 s prior to contact) reliably yields performance improvements of at least 10%. For tasks demanding prolonged contact control (e.g., screwing, wrapping), up-sampling the Contact phase is advisable. Up-sampling factors should be set to moderate values (e.g., 3–7); excessive weighting can induce overfitting to narrow sub-trajectories.
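The guidelines above translate into a small helper for choosing phase weights. This is a sketch of one reading of them: Free motion keeps weight 1, the Pre-Contact window is always up-sampled, Contact is up-sampled only for tasks with sustained contact, and factors outside the suggested 3–7 range trigger a warning. The function name and the `sustained_contact` flag are hypothetical.

```python
import warnings
from typing import Dict


def guideline_weights(factor: float = 5.0, sustained_contact: bool = False) -> Dict[str, float]:
    """Phase weights following the practical guidelines (a sketch)."""
    if not 3.0 <= factor <= 7.0:
        warnings.warn("factor outside the suggested ~3-7 range; "
                      "large factors can overfit narrow segments")
    return {
        "Free": 1.0,
        "Pre-Contact": float(factor),                             # ~1 s before contact
        "Contact": float(factor) if sustained_contact else 1.0,   # e.g., screwing, wrapping
    }


print(guideline_weights())                              # insertion-style task
print(guideline_weights(4, sustained_contact=True))     # screwing-style task
```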

Even when external torque estimates cannot be provided as a policy input, FIRST can still be applied offline for dataset re-balancing; empirical results indicate gains of 5–10% in such scenarios. Although introduced in conjunction with flow matching, the re-sampling framework is agnostic to the choice of imitation loss and applies equally to behavior cloning and other objectives (Oh et al., 10 Jun 2026).
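When the torque estimate is used only for labeling, one simple way to re-balance offline is to replicate each labeled example an integer number of times and save the result as a new dataset that any training pipeline can consume with uniform sampling. This replication scheme is an illustrative substitute for weighted sampling, not necessarily the procedure used in the paper, and the repeat counts are hypothetical.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def rebalance_offline(samples: Sequence[T], labels: Sequence[str],
                      repeats: Dict[str, int]) -> List[T]:
    """Replicate each sample `repeats[label]` times (integer up-sampling)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have equal length")
    out: List[T] = []
    for x, s in zip(samples, labels):
        out.extend([x] * repeats[s])
    return out


print(rebalance_offline(["a", "b", "c"], ["Free", "Pre-Contact", "Contact"],
                        {"Free": 1, "Pre-Contact": 3, "Contact": 2}))
# ['a', 'b', 'b', 'b', 'c', 'c']
```

Uniform sampling from the replicated list gives each example a probability proportional to its repeat count, which matches weighted sampling with integer weights; the cost is a larger stored dataset.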

7. Significance and Broader Context

FIRST exemplifies a data-centric approach to address imbalances in demonstration datasets for robotic imitation learning, particularly for under-instrumented platforms lacking direct force sensors. By harnessing learned external torque signals for both state characterization and dataset balancing, FIRST targets the most failure-prone moments of manipulation tasks—yielding substantial improvements in policy efficacy with minimal methodological overhead. The generality of the phase-based re-sampling principle suggests applicability across a wide range of contact-intensive robotic domains and imitation learning paradigms.

References (1)
