AutoDex: An Automated Real-World System for Dexterous Grasping Data Collection
Abstract: Learning robust dexterous grasping requires real-world data that records the physical outcomes of grasp attempts. Such data is hard to obtain at scale: teleoperation yields valid physical outcomes but is slow and operator-biased, while simulation-based generation is cheap and scalable but cannot certify contact validity. A natural solution is to generate candidate grasps and verify them on real hardware, but this scales only if the entire collection loop (perception, execution, labeling, and reset) runs without human intervention. We present AutoDex, an automated real-world data-collection system that closes this loop: for each candidate from a replaceable generator, it localizes the object under severe hand-object occlusion with dense 20-camera perception, executes collision-monitored robot motions, labels lift-and-hold success or failure, and actively resets the object between trials to expose additional candidates across stable poses. The result is a reusable database of physically labeled grasp trials that downstream systems can query by retrieval and feasibility filtering. Using AutoDex, we collect 3,593 grasp trials across Allegro and Inspire hands on 100 diverse objects, with synchronized multi-view observations and robot-state logs. For a matched 500-trajectory collection, AutoDex requires 10.3 h versus 49.4 h for teleoperation, yielding a 4.8x throughput improvement, and grasps retrieved from the AutoDex-validated database succeed 76% versus 34% for simulation-only validation. Code and data will be publicly released.
Explain it Like I'm 14
What this paper is about (in simple terms)
This paper is about teaching robot hands (with fingers, not just simple grippers) to pick up many different real-world objects reliably. The authors built a system called AutoDex that can test tons of “grasp ideas” automatically—without a person standing there controlling the robot—and record which ones work and which ones fail. The end result is a big, reusable collection of real robot attempts with clear success/failure labels that other robots can learn from.
What questions were the researchers trying to answer?
- How can we collect lots of real, trustworthy data about multi-finger robot grasps without needing a human to guide every try?
- Can we combine cheap computer-generated grasp ideas with real-world testing to find which grasps actually work on physical objects?
- Will automating the whole loop—seeing the object, moving the robot, deciding success/failure, and resetting the object—be faster and more reliable than having a person teleoperate (remote-control) the robot?
- Does using many cameras help the robot keep track of the object even when the hand blocks the view?
How did they do it?
Think of AutoDex as a self-running “grasp test lab” for a robot hand:
- The robot: A 6-joint robot arm with a multi-finger hand (they tested two: the Allegro and Inspire hands).
- The “eyes”: 20 synchronized cameras around a well-lit workspace.
- The brain loop: A full cycle that runs without humans.
Here’s the cycle in everyday language:
- See (Perception): The cameras figure out the object’s “6D pose” (where it is in 3D and which way it’s facing—like knowing a LEGO brick’s position and rotation). Because the hand can cover the object during a grasp, using many cameras helps keep track from other angles.
- Plan and Move (Execution): A computer method first generates many candidate grasps (like possible ways to hold the object) in a simulator. AutoDex picks feasible ones (arm can reach them without bumping into things) and moves the robot hand to try them. A safety checker watches the arm’s “effort signals” to stop if there’s an unexpected bump.
- Judge (Labeling): After the hand grabs the object, the robot lifts and holds it. If the object stays at least 5 cm up for 3 seconds, that attempt counts as a success. If it slips or drops, it’s a failure. The system records the robot’s motions, the camera views, and the success/failure label.
- Reset (Set up the next try): To keep testing new grasps, the object often needs to be placed in a different resting position (a “stable pose”). The robot reorients and places the object itself—sometimes even releasing it slightly above the table so it lands correctly—so the next round can start immediately.
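The Judge step above can be sketched as a small labeling function. This is a minimal illustration, not the authors' implementation: it assumes the tracked object height is available as a time series, and the function name, argument names, and the choice of measuring height relative to the resting height are all hypothetical. Only the 5 cm and 3 s thresholds come from the text.

```python
from typing import Sequence


def label_lift_and_hold(
    times_s: Sequence[float],
    heights_m: Sequence[float],
    rest_height_m: float,
    min_lift_m: float = 0.05,   # 5 cm lift threshold (from the text)
    min_hold_s: float = 3.0,    # 3 s hold duration (from the text)
) -> bool:
    """Return True if the object stays at least min_lift_m above its resting
    height for one continuous stretch of at least min_hold_s."""
    hold_start = None
    for t, h in zip(times_s, heights_m):
        if h - rest_height_m >= min_lift_m:
            if hold_start is None:
                hold_start = t          # a new candidate hold begins here
            if t - hold_start >= min_hold_s:
                return True             # held long enough: success
        else:
            hold_start = None           # slipped or dropped: restart the timer
    return False


# Hypothetical 10 Hz height track: on the table for 1 s, then lifted to 8 cm.
times = [i * 0.1 for i in range(51)]
held = [0.0 if t < 1.0 else 0.08 for t in times]
dropped = [0.08 if 1.0 <= t < 2.5 else 0.0 for t in times]
print(label_lift_and_hold(times, held, rest_height_m=0.0))
print(label_lift_and_hold(times, dropped, rest_height_m=0.0))
```

Requiring one continuous hold, rather than summing time above the threshold, means a grasp that slips and is briefly re-caught does not count as a success.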
Why not just use a simulator? Simulations are fast, but they can’t perfectly predict real-world contact: friction, tiny slips, squishy finger pads, and small force differences can make a grasp fail even if it looks good on a computer. AutoDex solves this by automatically testing candidates on real hardware.
What did they find?
- It’s much faster than human teleoperation: In a matched test of 500 grasp trials, AutoDex finished in 10.3 hours. A human operator doing the same in the same setup took 49.4 hours. That’s about a 4.8× speed-up. The big win is not faster single moves; it’s removing human idle time so the robot can run unattended.
- Real-world validation greatly improves grasp quality: When they collected a database of grasps that had actually been tested on the real robot and succeeded, those grasps worked 76% of the time in new real scenes. If they skipped the real-world testing step and only used candidates filtered by a simulator, success dropped to 34%. In short: testing on the real robot filters out “looks-good-in-sim-but-fails-in-reality” grasps.
- Resetting the object matters: Simply dropping an object and hoping it lands in the right pose often doesn’t work. AutoDex’s active placement strategy can reliably switch the object to poses that passive dropping almost never reaches, and it avoids unsafe “throwing the object” failures.
- More cameras = more reliable tracking: With only a couple of cameras, the system sometimes mis-estimates the object’s pose—especially when the hand blocks the view. As they add more cameras, the pose estimates get much more stable, which improves the whole process.
- A large, reusable dataset: AutoDex collected 3,593 real grasp attempts (across the Allegro and Inspire hands on 100 different objects of many shapes and materials), each with synchronized multi-view videos and robot motion data, all labeled as success or failure. Later, a robot can simply “retrieve” successful grasps from this database for a new scene, check they’re reachable and collision-free, and try them, with no extra training needed.
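The speed-up quoted in the findings above follows directly from the two reported collection times. The short calculation below reproduces it; the per-trial averages are derived here from the reported totals and are not figures stated in the paper.

```python
AUTODEX_HOURS = 10.3   # reported time for 500 trials with AutoDex
TELEOP_HOURS = 49.4    # reported time for 500 trials with teleoperation
TRIALS = 500


def seconds_per_trial(total_hours: float, trials: int) -> float:
    """Average wall-clock seconds per trial, including all idle time."""
    return total_hours * 3600.0 / trials


speedup = TELEOP_HOURS / AUTODEX_HOURS
autodex_s = seconds_per_trial(AUTODEX_HOURS, TRIALS)
teleop_s = seconds_per_trial(TELEOP_HOURS, TRIALS)

print(f"speed-up: {speedup:.1f}x")                 # speed-up: 4.8x
print(f"AutoDex: {autodex_s:.1f} s per trial")     # AutoDex: 74.2 s per trial
print(f"Teleop:  {teleop_s:.1f} s per trial")      # Teleop:  355.7 s per trial
```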
Why this matters
- Better training fuel for robot hands: Multi-finger grasping is hard because many small physical details affect success. AutoDex provides the kind of real, labeled data that learning methods need to get robust.
- Scales up without burning people out: Since the whole loop runs by itself, labs or companies can collect many more trials in the same time, across more objects and scenes.
- More reliable robots in the real world: By combining computer-generated ideas with real-world testing, AutoDex closes the gap between “works in simulation” and “works in your kitchen,” helping robots handle everyday objects more confidently.
- Practical reuse: The resulting database can be used like a library of proven grasps. A robot in a new environment can fetch a successful grasp for a known object, check it fits the new scene, and execute it—saving time and avoiding lots of trial and error.
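The retrieval idea above (fetch validated grasps for a known object, then keep only those that fit the new scene) can be sketched as a simple filter. This is an illustrative sketch under assumptions: the `GraspRecord` fields and the two feasibility callbacks are hypothetical stand-ins, and a real system would back the callbacks with an IK solver and a collision checker.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class GraspRecord:
    """One physically labeled trial (hypothetical schema)."""
    object_id: str
    grasp_id: str
    hand: str        # e.g. "allegro" or "inspire"
    success: bool    # lift-and-hold label from the real robot


def retrieve_feasible(
    db: Iterable[GraspRecord],
    object_id: str,
    hand: str,
    is_reachable: Callable[[GraspRecord], bool],
    is_collision_free: Callable[[GraspRecord], bool],
) -> List[GraspRecord]:
    """Keep grasps that succeeded on real hardware for this object and hand,
    and that pass both scene-specific feasibility checks."""
    return [
        g for g in db
        if g.object_id == object_id
        and g.hand == hand
        and g.success
        and is_reachable(g)
        and is_collision_free(g)
    ]


# Hypothetical database and stubbed feasibility checks.
db = [
    GraspRecord("mug", "g1", "allegro", True),
    GraspRecord("mug", "g2", "allegro", True),
    GraspRecord("mug", "g3", "allegro", False),
    GraspRecord("mug", "g4", "inspire", True),
]
candidates = retrieve_feasible(
    db, "mug", "allegro",
    is_reachable=lambda g: g.grasp_id != "g2",   # pretend g2 is out of reach
    is_collision_free=lambda g: True,
)
print([g.grasp_id for g in candidates])
```

Filtering on the recorded label first keeps the more expensive scene checks off grasps that already failed on hardware.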
In short, AutoDex is a fully automated, camera-rich, safe, and scalable way to test and label real robot hand grasps. It collects better data faster than human teleoperation and turns those results into a grasp library that helps robots succeed more often in the real world.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise, actionable list of what the paper leaves unresolved or insufficiently explored.
- Dataset scale is inconsistently reported: the abstract states 3,593 grasp trials, but the count appears as a bare “,” in multiple places in the text, which hinders reproducibility and benchmarking.
- Label accuracy is not quantified: no analysis of false positives/negatives in the lift-and-hold success criterion (5 cm lift, 3 s hold) or cross-checking with manual verification.
- Failure-mode taxonomy is absent: labels are binary (success/failure) without categorizing reasons (e.g., slip, insufficient friction, finger placement error, kinematic reach issues), limiting diagnostic value for learning.
- Pose-tracking reliability during heavy occlusion is not measured: while dense cameras are used, there is no quantitative assessment of tracking failures under extreme hand–object occlusion or fast motions.
- Camera-density generalization is unclear: the system depends on a 20-camera rig; systematic evaluation of performance with fewer cameras (e.g., 4–8) in the main workcell is limited to ADD-S vs. k and does not report grasp validation success rates or label error rates at lower k.
- Illumination and background robustness are not addressed: data are collected in a controlled LED-lit cell; performance under varied lighting, backgrounds, or outdoor conditions remains unknown.
- Object types are constrained: articulated, soft/deformable, flexible, transparent, reflective, or liquid-containing objects are not explicitly tested; methods to handle these remain open.
- Scene diversity is limited: evaluation focuses on tabletop, wall, and clutter; tight, dynamic, or highly constrained environments (drawers, cabinets, pockets, tool racks) are not studied.
- Upstream generator coverage and bias are unquantified: database contents depend on BODex; missing grasps (e.g., requiring dynamic finger motion, finger-rolling, in-hand manipulation) are acknowledged but not measured.
- Cross-generator comparison is missing: how AutoDex outcomes vary with different synthesis methods (optimization vs. learned generative models) is not evaluated.
- Dynamic/contact-rich strategies are out of scope: finger-rolling regrasps, compliant or tactile-based adjustments, functional grasps (tool use, pouring), and bimanual coordination are not supported.
- Closed-loop control is absent: execution appears open-loop with pre-planned motions; the benefits of integrating online visual/tactile corrections during approach and grasp are unexplored.
- Tactile sensing is not used: no fingertip tactile/force sensing to detect slip or contact quality; potential improvements from tactile integration are not investigated.
- Arm-level collision monitoring may miss local contacts: residual torque monitoring focuses on shoulder joints (J1, J2); detection coverage for lateral/hand-level collisions and small contacts is unreported.
- Collision monitor precision/recall is unmeasured: conservative aborts are reported, but missed collisions (false negatives), near-miss rates, and threshold sensitivity are not analyzed.
- Reset strategy selection is heuristic: choosing next stable pose with remaining candidates is not optimized; scheduling to maximize throughput or coverage remains an open planning problem.
- Reset robustness for large/heavy or fragile objects is unclear: success rates across mass, size, fragility, and tall/thin geometries are not reported; recovery from dropped or ejected objects relies on human intervention in baselines.
- Placement height relaxation (virtual pillars) lacks formal safety guarantees: potential unintended contacts or pose perturbations post-release are not rigorously analyzed; trade-offs of release height h are not quantified beyond feasibility.
- Perception pipeline’s dependence on RGB-only cues is a risk: transparent/reflective objects and textureless surfaces challenge silhouette refinement; the benefits of adding depth or NIR are not evaluated.
- Object modeling for unknown items is not detailed: the paper assumes object models; procedures to create models on-the-fly (scanning/reconstruction, symmetries) and their impact on grasp validation are not provided.
- Cross-hand transfer is not studied: how grasps validated with Allegro transfer to Inspire (and vice versa) is not analyzed; joint configuration mapping and differences in contact mechanics remain open.
- Cross-robot/environment transfer is limited: in-the-wild execution uses four cameras with depth estimates; robustness to calibration drift, different kinematics, and workspace geometries is not quantified.
- Database retrieval does not consider grasp diversity or ranking: no policies to select among multiple successful grasps (by stability margin, approach direction, contact region coverage, or task context).
- Throughput scaling via parallelization is unaddressed: multi-workcell operation, automated object swapping, and maintenance overhead for sustained multi-day runs are not discussed.
- Generalization to unseen objects is not covered: retrieval assumes known objects with validated grasps; strategies for novel-object adaptation (shape similarity, category-level grasps, on-robot rapid validation) are open.
- Data annotations are limited: absence of contact point/patch, friction estimates, force distribution, or compliance metadata restricts usefulness for physics-aware learning.
- Evaluation breadth is narrow: success is reported over 20 objects/515 trials; statistical confidence intervals, per-object breakdowns, and long-tail behavior analyses are missing.
- Robustness to disturbances is not tested: performance under external perturbations (vibration, pushes, varying surface friction, wet/dirty surfaces) is unmeasured.
- Labeling latency vs. online control trade-offs are not explored: success labeling is post hoc; potential performance gains from online tracking-driven termination or correction are unquantified.
- Calibration stability over time is not assessed: sub-millimeter pose consistency is reported per session; drift across long runs, auto-recalibration, and self-check routines are not evaluated.
- Safety under unexpected human/object intrusion is not characterized: automatic safeguards for unmodeled obstacles or human presence in the workcell are not detailed beyond residual torque monitoring.
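One of the gaps listed above is the absence of confidence intervals on reported success rates. As an illustration of what such reporting could look like, the sketch below computes a Wilson score interval for a binomial success rate. The trial counts in the example are hypothetical: the text gives the 76% rate but not the number of trials behind it.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical counts chosen only to match a 76% rate.
low, high = wilson_interval(successes=76, trials=100)
print(f"76/100 successes: 95% interval [{low:.3f}, {high:.3f}]")
```

With only about a hundred trials the interval spans roughly sixteen percentage points, which is why per-condition trial counts matter when comparing success rates.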
Practical Applications
Immediate Applications
Below are concrete use cases that can be deployed now in controlled environments (e.g., benchtop workcells) using the paper’s system and dataset, along with sector links, enabling tools/workflows, and key assumptions.
- [Robotics/Manufacturing] High-throughput validation of dexterous grasp candidates for irregular products
- What: Use AutoDex to automatically execute and label thousands of multi-finger grasp trials on SKU-specific parts, tools, or consumer items to curate a “known-good” grasp library per item.
- Tools/Workflow: 20-camera workcell; BODex candidate generation; AutoDex execution + lift-and-hold labeling; residual-torque safety; reset planner; export to a grasp database.
- Value: 4.8× faster than teleoperation; improves real-world success of retrieved grasps from 34% to 76%.
- Assumptions/Dependencies: Access to calibrated multi-camera rig and a 6-DoF arm with a dexterous hand (e.g., Allegro/Inspire); mesh models or sufficient visual features; tabletop scenes.
- [E-commerce/Warehousing] Rapid SKU onboarding for dexterous pick-and-place
- What: For new SKUs (irregular packaging, soft containers), collect validated grasps and deploy a retrieval-based executor in a 4-camera production cell.
- Tools/Workflow: AutoDex database creation; “in-the-wild” retrieval pipeline (4-camera pose estimation, obstacle checks, motion planning).
- Value: Faster SKU ramp-up; higher first-pass pick success without training policies.
- Assumptions/Dependencies: Static or semi-static scenes; known object poses/stable states; candidate generator coverage of relevant approach directions.
- [Quality Assurance/Gripper R&D] Benchmarking grippers and finger materials under real contact physics
- What: Compare finger-pad materials, hand kinematics, or controller variants by running controlled AutoDex batches across the same object set.
- Tools/Workflow: Swap hand/finger assemblies; reuse the automated loop and labeling; analyze success vs. material/weight categories.
- Value: Real contact outcomes (slip/compliance effects) that simulations miss; reproducible comparisons.
- Assumptions/Dependencies: Consistent calibration; identical execution and reset protocols across trials.
- [Academia] Curriculum-ready dataset and replicable workcell for dexterous manipulation research
- What: Use the released code and multi-view, physically labeled dataset for training, benchmarking, and ablation studies (e.g., sim-to-real, pose tracking, planning).
- Tools/Workflow: Public AutoDex dataset (success/failure labels, 20-view RGB, robot states); baseline retrieval executor; research on perception density vs. reliability.
- Value: Reduces barrier to entry; enables robust evaluation beyond simulation-only metrics.
- Assumptions/Dependencies: Dataset license and object library access; compute/storage for multi-view video; lab safety policies.
- [Software/Sim-to-Real] Sanity-filtering and calibration of simulated grasp generators
- What: Use AutoDex as a physical-validation backend for simulation pipelines, pruning candidates that are geometrically feasible but fail under real dynamics.
- Tools/Workflow: Batch candidate generation → AutoDex validation → update generator priors/weights; closed-loop “data engine.”
- Value: Empirically tightens sim-to-real gap; improves generator precision on real tasks.
- Assumptions/Dependencies: Integration adapters between generator and AutoDex; consistent object models across sim/real.
- [Robotics Integrators] “Data-collection-as-a-service” offering for clients needing dexterous grasps
- What: Build and operate an AutoDex workcell; deliver object-specific validated grasp libraries, execution configs, and safety envelopes for client deployments.
- Tools/Workflow: Turnkey workcell; remote operation; standardized data exports (grasps, trajectories, labels).
- Value: Outsources complex data collection; shortens time-to-deploy for dexterous applications.
- Assumptions/Dependencies: Service viability depends on throughput, uptime, and part logistics; NDA/IP handling for client objects.
- [Safety/Operations Policy in Labs] Practical safety monitoring for unattended robot data collection
- What: Adopt the learned residual-torque collision detector for downward, contact-prone motions; mandate upward-only recovery trajectories on abort.
- Tools/Workflow: Train MLP nominal-torque model; enable during approach/placement; thresholds and sustained-window checks; logging.
- Value: Prevents damage in unattended runs; compatible with diverse end-effectors where OEM collision settings are unreliable.
- Assumptions/Dependencies: Per-assembly calibration and training data; periodic re-baselining for drift.
- [Education/Makerspaces] Reduced-camera (8–12 cameras) variant for coursework and demos
- What: Deploy a reduced-camera rig (8–12 cameras) to approach the robustness of the 20-camera setup while cutting cost and complexity.
- Tools/Workflow: Same pose-estimation and silhouette refinement pipeline; monitor ADD-S error vs. camera count; selective occlusion-aware placements.
- Value: Accessible teaching and prototyping environment; still benefits from multi-view redundancy.
- Assumptions/Dependencies: Occlusion-sensitive objects may still require more views; careful calibration remains essential.
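The safety-monitoring workflow listed above (a nominal-torque model, thresholds, and sustained-window checks) can be sketched as follows. This is a minimal illustration, not the paper's monitor: the learned MLP is replaced by an arbitrary `predict_nominal` callable, and the threshold and window length are hypothetical values.

```python
from typing import Callable, List, Sequence


class ResidualTorqueMonitor:
    """Flags a collision when the torque residual stays above a threshold
    for `window` consecutive control steps."""

    def __init__(
        self,
        predict_nominal: Callable[[Sequence[float]], Sequence[float]],
        threshold_nm: float,
        window: int,
    ) -> None:
        self.predict_nominal = predict_nominal  # stand-in for the learned model
        self.threshold_nm = threshold_nm        # hypothetical threshold (N·m)
        self.window = window                    # hypothetical sustained-step count
        self._count = 0

    def update(self, joint_state: Sequence[float], measured_torque: Sequence[float]) -> bool:
        """Feed one control step; return True if motion should abort."""
        predicted = self.predict_nominal(joint_state)
        residual = max(abs(m - p) for m, p in zip(measured_torque, predicted))
        # A single spike is ignored; only a sustained residual triggers an abort.
        self._count = self._count + 1 if residual > self.threshold_nm else 0
        return self._count >= self.window


def zero_model(joint_state: Sequence[float]) -> List[float]:
    """Toy nominal model: predicts zero free-space torque on every joint."""
    return [0.0] * len(joint_state)


monitor = ResidualTorqueMonitor(zero_model, threshold_nm=2.0, window=3)
q = [0.0] * 6
quiet = [0.1] * 6
bump = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
flags = [monitor.update(q, tau) for tau in (quiet, bump, quiet, bump, bump, bump)]
print(flags)
```

The sustained-window check trades a few control steps of detection latency for fewer false aborts caused by sensor noise.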
Long-Term Applications
The following use cases require further research, scaling, or engineering beyond the current workcell constraints (e.g., fewer cameras, mobile platforms, broader tasks).
- [Home/Service Robotics] Household manipulation with personalized, validated grasp libraries
- What: In-home robots continuously collect and validate grasps for user-owned objects, updating a private, personalized library for reliable daily assistance.
- Tools/Workflow: On-device AutoDex-like loop with sparse cameras (or RGB-D + tactile), periodic offline validation, retrieval-first execution; continual learning.
- Dependencies/Assumptions: Robust pose estimation with minimal sensors; safe on-device data collection near people; handling deformable/transparent items; privacy.
- [Healthcare/Assistive Robotics] Reliable ADL (Activities of Daily Living) grasps for assistive hands
- What: Build validated libraries for utensils, medication bottles, grooming items; customize for user-specific handovers, orientations, and safety constraints.
- Tools/Workflow: Task-conditioned grasp validation (beyond lift-and-hold); human-in-the-loop preference constraints; tactile safety monitors.
- Dependencies/Assumptions: Clinical safety certification; compliant and fail-safe hardware; person-aware perception and control.
- [Mobile Manipulation/Field Service] On-site validation for maintenance, inspection, and logistics
- What: Robots gather task-specific grasp data in situ (warehouses, factories, retail), gradually replacing lab-collected libraries with environment-specific validation.
- Tools/Workflow: Self-calibrating perception on sparse/moving sensors; environmental reconstruction; distributed grasp-data syncing; RaaS fleets.
- Dependencies/Assumptions: Robust tracking under variable lighting/occlusion; dynamic obstacle handling; policy and insurance frameworks for unattended operation.
- [Advanced Dexterity] Functional grasps and non-prehensile maneuvers (tool use, handovers, finger-gaiting)
- What: Extend the loop to validate task success beyond lift-and-hold (e.g., pouring, turning knobs, inserting connectors), including dynamic regrasps.
- Tools/Workflow: Task-specific success metrics and sensors (force/torque, tactile), richer action spaces, multi-stage planning and labeling.
- Dependencies/Assumptions: High-fidelity sensing of contact events; generalized reset strategies; expanded candidate generators for dynamic actions.
- [Bimanual/Collaborative Manipulation] Coordinated grasp libraries and resets for two arms/hands
- What: Validate cooperative grasps (stabilize with one hand, act with the other) and robust inter-arm transfer and placement resets.
- Tools/Workflow: Dual-arm planning/monitoring; multi-object pose tracking; synchronized residual-torque safety; shared grasp databases.
- Dependencies/Assumptions: Increased calibration complexity; collision-free coordination; safe failure modes.
- [Perception-Light Systems] Camera-minimal (2–4 cameras) or vision+tactile systems achieving 20-camera reliability
- What: Replace dense multi-view with learned priors, tactile servoing, or active perception to eliminate catastrophic pose failures.
- Tools/Workflow: Tactile localization, contact-rich SLAM, object-specific priors; uncertainty-aware planning with active viewpoint control.
- Dependencies/Assumptions: Reliable tactile hardware; robust fusion of sparse vision and touch; online uncertainty estimates.
- [Standardization/Policy] Reporting and safety standards for physically validated manipulation datasets
- What: Institutionalize best practices: publish physical outcome labels, success criteria, reset procedures, safety triggers, and pose-calibration metrics with datasets.
- Tools/Workflow: Community benchmarks; data cards detailing collection conditions (camera count, ADD-S, labeling rules); conformity tests.
- Dependencies/Assumptions: Community buy-in; publisher and funding-agency requirements; interoperability with ROS/industrial formats.
- [Products and Services] Commercial offerings built on AutoDex-style capabilities
- What:
- “AutoDex Kit”: packaged multi-camera workcell with software stack (perception, safety, reset).
- “Validated Grasp Library” subscriptions for common object categories (kitchenware, tools, retail items).
- “Reset Planner” and “Residual-Torque Monitor” as ROS 2 packages or OEM firmware plugins.
- Cloud “Grasp Validation Service” with logistics for object shipment and data return.
- Dependencies/Assumptions: Hardware vendor partnerships; support for diverse hands/arms; SLAs for throughput and failure recovery; data/IP governance.
- [AI Training Pipelines] Large-scale real-robot data engines for dexterous policy learning
- What: Use physically labeled grasp outcomes (success/failure trajectories) to train generalist manipulation policies, leveraging both imitation and offline RL.
- Tools/Workflow: Continuous AutoDex-style data harvesting; balanced sampling of successes/failures; policy evaluation with retrieval baselines.
- Dependencies/Assumptions: Scalable storage/compute; data quality control; coverage across objects, materials, and scenes; alignment between lift-and-hold and end-task objectives.
- [Circular Economy/Recycling] Sorting and disassembly of irregular, mixed-material items
- What: Validate robust grasps for variable, damaged, or composite objects to improve throughput in sorting lines or disassembly stations.
- Tools/Workflow: Online object library growth; retrieval with obstacle-aware planning; frequent reset between stable poses to expose new grasp affordances.
- Dependencies/Assumptions: Handling of dirt/damage-induced appearance change; occlusion-heavy scenes; unknown or approximate object models.
These applications extend directly from the paper’s contributions: a fully automated real-world validation loop (perception → execution → labeling → reset), a physically labeled multi-view dataset, a robust safety monitor, and a retrieval-based deployment path. Feasibility hinges on matching sensing density and calibration quality to task difficulty, ensuring candidate generators cover relevant contact modes, and adopting safety and reset strategies that tolerate unattended operation.
Glossary
- 6D pose: A six-degree-of-freedom representation of an object’s position and orientation in 3D space (3D translation + 3D rotation). "estimates the object's initial 6D pose with dense 20-camera perception"
- ADD-S: A 6D pose error metric suited to symmetric objects: the average distance from each model point under one pose to its closest model point under the other pose (lower is better). "Mean ADD-S decreases from 14.3 mm at to 0.5 mm at "
- Allegro Hand: A 16-DoF anthropomorphic robotic hand used for dexterous manipulation. "we use either a 16-DoF Allegro Hand or a 6-DoF Inspire Hand"
- BODex: An optimization-based dexterous grasp synthesis method used to generate grasp candidates. "we use BODex~\cite{chen2024bodex} as the candidate generator"
- bundle adjustment: A nonlinear optimization that jointly refines camera parameters and 3D structure for multi-view calibration. "Per-session extrinsics are recovered by global bundle adjustment (COLMAP)"
- ChArUco board: A calibration target combining chessboard corners and ArUco markers for accurate camera intrinsic calibration. "Camera intrinsics are calibrated before mounting using a ChArUco board"
- COLMAP: A structure-from-motion tool used for recovering camera poses and calibration in multi-view setups. "per-session extrinsics are recovered with COLMAP and hand-eye calibration"
- cuRobo: A GPU-accelerated motion planning library for robots, used here to pre-screen safe trajectories. "All static and dynamic training trajectories are pre-screened with cuRobo~\cite{sundaralingam2023curobo}"
- Depth Anything 3: A foundation model for estimating depth from images, used to reconstruct obstacle geometry. "reconstruct surrounding geometry from depth estimates using Depth Anything 3~\cite{dav3}"
- DoF: Degrees of Freedom; the number of independent joint variables of a robot or hand. "a 6-DoF xArm"
- domain randomization: A sim-to-real technique that randomizes simulation parameters to improve real-world robustness. "Domain randomization~\cite{tobin2017dr,openai2019rubikscube} can improve robustness"
- end-effector: The robot’s tool at the tip of its kinematic chain (here, the robotic hand). "planning a trajectory back to a predefined home configuration while monotonically increasing the end-effector height"
- FoundPose: An RGB-only foundation-feature 6D pose estimator used for initial object pose hypotheses. "FoundPose~\cite{ornek2024foundpose}, an RGB-only foundation-feature pose estimator"
- GoTrack: A multi-view object pose tracker used to obtain 6D pose trajectories from recorded streams. "we run multi-view GoTrack~\cite{nguyen2025gotrack} on the recorded stream"
- hand–eye calibration: The calibration that aligns robot (hand) and camera coordinate frames. "and hand-eye calibration, yielding sub-millimeter multi-view pose self-consistency"
- IK (Inverse Kinematics): Computing joint angles that achieve a desired end-effector pose. "IK solvability"
- IoU: Intersection-over-Union, a mask overlap metric used to score pose hypotheses. "scored by the mean IoU between its rendered silhouette and the observed masks"
- lift-and-hold: A physical success criterion requiring lifting an object and holding it for a specified duration. "labels lift-and-hold success or failure"
- MLP: Multilayer Perceptron; a feed-forward neural network used here to predict nominal torques. "the monitor predicts the nominal free-space torque with an MLP and computes the residual:"
- MuJoCo: A physics engine used to simulate and pre-filter grasp candidates. "we screen each candidate in MuJoCo~\cite{todorov2012mujoco}"
- residual-torque monitor: A learned model that detects unexpected contacts by comparing predicted and measured joint torques. "We therefore use a learned residual-torque monitor trained on collision-free motions of the deployed arm--hand assembly."
- SAM3: Segment Anything Model 3; a segmentation model used to predict object masks from RGB images. "we predict object masks with SAM3~\cite{sam3}"
- servo mode: A control mode where joints continuously track commanded positions/velocities along a trajectory. "executes the screened trajectories in servo mode"
- silhouette optimization: Refining object pose by aligning the projected model silhouette to observed masks. "then refined by silhouette optimization"
- Sim-to-Real: The transfer of methods or models developed in simulation to real-world hardware and conditions. "Dexterous Grasping, Autonomous Data Collection, Sim-to-Real"
- stable pose: A static equilibrium orientation of an object on a support surface. "Let denote the set of stable tabletop poses of the object."
- teleoperation: Human control of a robot to perform tasks, often via a remote interface. "Human teleoperation~\cite{liu2024realdex,wang2024dexcap,qin2023anyteleop} produces real contact outcomes"
- xArm: A 6-DoF industrial robotic arm used as the manipulator in the workcell. "a 6-DoF xArm"
- hand–object occlusion: Visual occlusion of the object by the robot hand during manipulation, complicating perception. "tracks the object during execution despite severe hand--object occlusion"
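To make the ADD-S entry above concrete, the sketch below computes the metric in plain Python for a small point set. It follows the common closest-point formulation (for each model point under the ground-truth pose, the distance to the nearest model point under the estimated pose, averaged over points). The function names and the toy square model are illustrative assumptions, and production code would typically use a k-d tree rather than this brute-force search.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def transform(points: Sequence[Point], rotation: Matrix, translation: Sequence[float]) -> List[Point]:
    """Apply x -> R x + t to every point."""
    return [
        tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3))
        for p in points
    ]


def add_s(model: Sequence[Point], rot_est: Matrix, t_est: Sequence[float],
          rot_gt: Matrix, t_gt: Sequence[float]) -> float:
    """Average closest-point distance between the model under the
    ground-truth pose and the model under the estimated pose."""
    est = transform(model, rot_est, t_est)
    gt = transform(model, rot_gt, t_gt)
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
ZERO = [0.0, 0.0, 0.0]

# Toy model: corners of a square, which is symmetric under 90-degree turns about z.
square = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]

rotated_error = add_s(square, ROT_Z_90, ZERO, IDENTITY, ZERO)
shifted_error = add_s(square, IDENTITY, [0.0, 0.0, 0.01], IDENTITY, ZERO)
print(rotated_error, shifted_error)
```

The rotated case illustrates why the metric suits symmetric objects: a 90-degree turn maps the square onto itself, so the error is zero even though the rotation differs, while a 1 cm offset along z is counted in full.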

