VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity
Abstract: Dexterous manipulation depends on contact events that are fast, local, and often visually occluded. Piezoelectric microphones offer a compact and high-bandwidth way to sense these interactions, but the resulting vibro-acoustic signals are difficult to simulate faithfully enough for end-to-end sim-to-real policy learning on dexterous robot hands. We propose VibeAct, a framework that bridges real vibrotactile sensing and simulation-based reinforcement learning through a shared physical representation of contact and slip. In the real world, we embed piezoelectric microphones into a dexterous robot hand and collect vibro-acoustic data through teleoperation, then replay the recordings in a calibrated digital clone to automatically label per-finger contact and slip. A tactile estimator learns to predict contact and slip from real microphone waveforms, while manipulation policies are trained in simulation on the same representation computed directly from simulated contacts. This decoupling lets policies exploit rapid tactile feedback without simulating raw audio. Across five contact-rich tasks spanning regrasping, in-hand reorientation, and insertion, VibeAct consistently outperforms a proprioception-and-point-cloud baseline in simulation, with the largest gains on tasks requiring sustained reactive control, where the continuous slip-magnitude channel proves the most informative observation. The learned policies transfer to a physical dexterous hand-arm platform, improving success rates on deployed tasks. Project videos and additional details are at https://vibeact.github.io/.
Explain it Like I'm 14
VibeAct: Explaining the Paper in Simple Terms
What is this paper about?
This paper is about teaching a robot hand to use “touch” in a smart way so it can handle objects better—especially when it has to feel what’s happening, like when something starts to slip. Instead of using fancy cameras in the fingertips, the robot listens to tiny vibrations with small microphones inside its fingers and turns those sounds into simple, useful signals that help it react quickly.
What questions are the researchers trying to answer?
The researchers asked:
- Can a robot learn to react to touch (like feeling contact and slip) using cheap microphones that “hear” vibrations?
- How can we train the robot safely and quickly in a simulator without needing to perfectly simulate real-world sounds?
- Is there a simple “touch summary” (contact and slip information) that works both in real life and in simulation?
- Does giving the robot this touch summary help it do tricky tasks like rotating objects in its hand or inserting a peg into a hole?
How did they do it? (Methods explained with everyday ideas)
Think of this as teaching by “listen, translate, and practice”:
- Listen: The robot’s fingertips have small piezoelectric microphones (tiny sensors that turn vibrations into electrical signals). They don’t sit on the surface; they’re inside the finger, like a stethoscope listening through the bone. When the robot’s fingers touch or slide on an object, those interactions make vibrations the microphones can “hear.”
- Translate to a simple touch language: Raw audio is messy and hard to simulate. So the team converts vibration sounds into a small, simple set of signals for each finger:
- Contact onset: “Did I just touch something right now?” (a quick ping)
- Slip presence: “Am I slipping or not?” (yes/no)
- Slip magnitude: “If I’m slipping, how much?” (a number that grows as sliding gets stronger)
This is like turning a whole soundtrack into a few clear indicators: “touched,” “slipping,” and “how slippery.”
- Digital clone for labeling: They had a person teleoperate (remotely control) the robot in the real world while recording the microphone audio and the robot’s movements. Then they replayed those movements in a physics simulator (a “digital clone” of the robot and objects). The simulator can tell exactly when and where fingers touch and slide—so it automatically creates correct labels for contact and slip. No manual labeling needed.
- Train a “tactile estimator”: This is a machine learning model that takes the real microphone audio and predicts the simple touch signals (contact onset, slip yes/no, slip size). It’s like a translator from sound to touch.
- Practice in simulation: They trained the robot’s decision-making program (a “policy”) using reinforcement learning in the simulator. Think of it as the robot practicing in a physics-based video game, learning by trial and error to get better scores (success). The robot’s policy gets three kinds of info:
- Proprioception: its own joint positions (where its fingers are)
- A point cloud: a 3D picture of the scene made of lots of dots from a depth camera
- The simple tactile signals (contact/slip) from the simulator
- In the real world, they replace the simulator’s touch signals with the tactile estimator’s predictions from the microphones.
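The "simple touch language" above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-finger layout is inferred from the 12-D vector mentioned later in this summary, the 5 mm/s threshold comes from this summary, and `SLIP_CLIP`, `FingerContacts`, and `tactile_vector` are hypothetical names and values introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

SLIP_THRESHOLD = 0.005  # m/s; the 5 mm/s slip threshold quoted in this summary
SLIP_CLIP = 0.05        # m/s; hypothetical upper bound, NOT taken from the paper


@dataclass
class FingerContacts:
    """Tangential speeds (m/s) of all active contacts on one fingertip."""
    tangential_speeds: Sequence[float]

    @property
    def in_contact(self) -> bool:
        return len(self.tangential_speeds) > 0


def tactile_vector(current: List[FingerContacts],
                   was_in_contact: List[bool]) -> List[float]:
    """Return [onset, slip_presence, slip_magnitude] per finger, flattened."""
    z: List[float] = []
    for finger, prev in zip(current, was_in_contact):
        # Contact onset: a one-step pulse on the step where contact begins.
        onset = 1.0 if finger.in_contact and not prev else 0.0
        # Aggregate multiple contacts by their maximum tangential speed.
        speed = max(finger.tangential_speeds, default=0.0)
        slipping = speed > SLIP_THRESHOLD
        # Slip magnitude: zero below the threshold, clipped above SLIP_CLIP.
        magnitude = min(speed, SLIP_CLIP) if slipping else 0.0
        z.extend([onset, 1.0 if slipping else 0.0, magnitude])
    return z
```

With four fingers this yields a 12-element vector. In simulation the inputs would come from the physics engine's contact data; on hardware the same vector would instead be predicted by the tactile estimator from microphone audio.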
What did they find, and why does it matter?
Main results:
- The simple touch signals—especially the continuous “how much slip” number—helped the robot succeed much more on contact-heavy tasks.
- In five tasks (like rotating a cube in-hand, inserting a peg in a hole, climbing along an object with finger steps), the robot did better with touch signals than with just its own joint positions and a 3D camera.
- Tasks that need steady, reactive control (like keeping a grip while rotating or aligning a peg) improved the most. The “slip magnitude” channel was the most helpful because it tells the robot not just that sliding is happening, but how strongly—so it can adjust its grip in real time.
- The policies trained in simulation worked on the real robot too. When deployed on hardware, success rates improved compared to not using the touch signals.
Why this matters:
- Robots often can’t see important contact details (fingers block the camera, or events happen too fast). Listening to vibrations gives fast, hidden information.
- Microphones are cheap, small, and fast. This approach lets robots get useful touch feedback without bulky, complex fingertip cameras.
- By using a simple shared “touch language” (contact/slip) that exists both in simulation and in real sensors, the robot can practice safely in simulation and then act well in the real world.
What’s the bigger impact?
- Safer, more reliable robot hands: Robots that can feel slipping can adjust their grip before dropping things—useful for homes, factories, and labs.
- Practical training: Because the robot learns control in a simulator using a simple touch representation, we avoid trying to simulate realistic audio (which is very hard) and avoid collecting tons of risky real-world trial-and-error.
- A general idea for sensors: Using a compact, physically meaningful “intermediate representation” (like contact/slip) can bridge messy real sensors and clean simulators. This idea could apply to other sensing types too.
Notes on limitations (in simple terms):
- The touch summary is simple on purpose. It doesn’t tell the robot exactly where on the finger contact happens or what the surface feels like—just contact and slip. More details could help but are harder to match between the real world and simulation.
- The system depends on how the microphones are installed; moving or changing the hardware may require retraining.
- Creating labels with the “digital clone” needs accurate tracking of objects during data collection, which can be harder in messy, unstructured environments.
Overall takeaway: VibeAct shows that listening to vibrations and converting them into a simple touch language (contact and slip) can make robot hands more reactive and skilled. By training in simulation with that same language and then using a real-world “translator” from audio to touch, the robot gets the best of both worlds—fast learning and real-world success.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper introduces a compelling sim-to-real bridge via a compact contact-and-slip representation, but several aspects remain uncertain or unexplored. The following concrete gaps can guide future research:
- Generalization across hardware and configurations:
- How robust is the tactile estimator to changes in microphone placement, fingertip geometry/materials, adhesive layers, amplifier chain, and different robot hands? Develop protocols for zero-shot or few-shot adaptation when the hardware configuration changes.
- What is the impact of partial sensor failures (e.g., one microphone per finger fails) and how can the estimator be made fault-tolerant?
- Sensitivity to environmental and actuation noise:
- The estimator faces structure-borne vibrations from motors and ambient noise; quantify robustness across different robots, motion speeds, and environmental acoustic/vibration conditions, and evaluate noise-robust training (e.g., source separation, adversarial augmentation).
- Digital-clone labeling fidelity:
- Contact/slip labels depend on precise replay and pose tracking; quantify how mocap/pose-tracking errors, kinematic calibration errors, and timing drift translate into label noise and downstream estimator degradation.
- Assess sensitivity of labels to simulator modeling choices (contact solver, friction model/parameters, restitution), and explore uncertainty-aware labeling or ensemble physics to mitigate model bias.
- Representation design and sufficiency:
- The chosen representation omits contact location, contact normal, slip direction, and force information. Evaluate whether adding low-dimensional extensions (e.g., slip direction unit vector, rough contact sector on fingertip) yields significant control gains without requiring full audio simulation.
- Current aggregation uses max tangential speed across multiple contacts on a fingertip; investigate alternative aggregations (e.g., weighted by normal force or spatial bins) that preserve multi-contact structure.
- Slip magnitude is clipped at a fixed upper bound (value not given here) and thresholded at 5 mm/s; perform sensitivity analyses on these hyperparameters and explore learning them adaptively or per material.
- Temporal properties and latency:
- The estimator operates on 200 ms windows; measure end-to-end sensing-to-action latency and its effect on control performance, especially for fast transients. Explore causal models with shorter windows or multi-rate fusion to reduce delay.
- Contact onset is modeled as a one-step pulse; study whether temporal encodings (e.g., time-since-contact, contact duration) improve policy stability and performance.
- Policy architecture and use of tactile history:
- The policy treats the tactile observation as a flat vector with MLP encoding; evaluate recurrent/attention-based architectures that exploit the temporal structure of tactile events and multi-finger correlations.
- Contact-onset channels sometimes hurt performance in isolation; investigate event-conditioned control modules or hybrid event–state policies that use onset sparsity more effectively.
- Cross-finger coupling and spatial reasoning:
- The estimator uses independent per-finger subnetworks; test architectures that model cross-finger correlations (e.g., shared temporal backbones or graph neural nets) and evaluate benefits for tasks needing coordinated slip management.
- Data scale and diversity:
- Training uses ~7 hours of teleoperated data; quantify estimator and policy performance as a function of data scale and object/material diversity, and develop scalable collection strategies (self-supervised data mining, active data gathering).
- Generalization to novel objects, surface textures, and coatings (e.g., rough, compliant, or lubricated surfaces) is not characterized; design benchmarks and protocols for out-of-distribution materials.
- Task and domain breadth:
- Extend evaluation beyond rigid objects to deformable or compliant objects, and to dynamic/impact-heavy tasks (e.g., tool use), where vibration content and contact models differ.
- Assess transfer to different hands, more fingers, or whole-arm manipulation where additional contact sites (palms, links) matter.
- Real-world transfer and adaptation:
- Hardware deployment shows moderate gains with sizable residual failure rates; perform systematic failure analysis (misclassification vs. control vs. perception) and investigate real-world fine-tuning (e.g., online RL, residual learning, or policy adaptation with tactile feedback).
- Explore sim-to-real adaptation for the estimator (e.g., domain adversarial training, test-time adaptation) and policies (e.g., dynamics identification, tactile-domain randomization).
- Comparative baselines and upper bounds:
- Provide head-to-head comparisons with alternative tactile modalities (vision-based tactile, magnetic skins, F/T sensors) under matched tasks to contextualize benefits/costs of vibrotactile sensing.
- Establish an upper bound by training policies with privileged tactile signals (e.g., contact location/forces from sim) to quantify the performance gap attributable to the compact representation.
- Directional and richer slip cues:
- Presently only slip magnitude is provided; evaluate if estimating slip direction (tangential vector) or stick–slip oscillation features improves alignment/rotation tasks.
- Investigate combining passive vibro-sensing with active acoustic probing (e.g., micro-taps) for contact localization or material inference without heavy audio simulation.
- Calibration and drift:
- Assess long-term stability and drift of estimator predictions due to sensor aging, temperature, or mechanical wear, and develop online calibration or self-check procedures.
- Multi-rate sensor fusion and control frequency:
- Specify and study the coupling between high-rate audio and lower-rate control loops; design multi-rate observers/controllers that optimally fuse asynchronous tactile, proprioceptive, and visual inputs.
- Simulator–reality discrepancy in friction and slip:
- Examine how differences in real vs. simulated friction/adhesion affect the semantics of “slip” used for labels and control; consider learning a calibrated slip translator or probabilistic slip estimator to handle ambiguous micro-slip regimes.
- Broader safety and efficiency:
- Quantify computational budget and latency of the estimator for embedded deployment; evaluate lightweight models or on-sensor processing.
- Explore safety mechanisms during real-world exploration that exploit tactile cues (e.g., slip-avoidance reflexes) to safely gather additional data.
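To make the aggregation question raised in the list above concrete, here is a minimal sketch comparing the max-tangential-speed rule with one possible alternative, a normal-force-weighted mean. The function names are hypothetical, and the force-weighted variant is only an illustration of the suggested direction, not something the paper evaluates.

```python
from typing import Sequence


def max_aggregate(speeds: Sequence[float]) -> float:
    """Current rule described above: largest tangential speed on the fingertip."""
    return max(speeds, default=0.0)


def force_weighted_aggregate(speeds: Sequence[float],
                             normal_forces: Sequence[float]) -> float:
    """Hypothetical alternative: mean tangential speed weighted by normal force."""
    if len(speeds) != len(normal_forces):
        raise ValueError("speeds and normal_forces must have the same length")
    total = sum(normal_forces)
    if total <= 0.0:
        return 0.0  # no load-bearing contact, so report no slip
    return sum(s * f for s, f in zip(speeds, normal_forces)) / total
```

For two contacts sliding at 0.01 m/s and 0.04 m/s, the max rule reports the faster one regardless of load, while the weighted rule leans toward whichever contact carries more normal force.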
Practical Applications
Immediate Applications
Below are concrete, near-term uses that can be deployed with modest engineering, leveraging the paper’s estimator, representation, and sim-to-real workflow.
- Robust peg-in-hole, press-fit, and threading on existing lines (Manufacturing, Robotics)
- What: Use slip magnitude and onset to guide alignment and force modulation for insertions and nut/bolt operations; reduce jamming and damage.
- Potential tools/products/workflows: Slip-aware controller module; VibeAct Tactile Estimator + policy running in a ROS node; MuJoCo-based task tuning with domain randomization; retrofittable fingertip microphone kit.
- Assumptions/dependencies: Stable microphone mounting with good structure-borne coupling; on-device inference fast enough for the control loop, including handling of the ~100–200 ms audio window; adequate calibration of the digital clone for task tuning; compliance with industrial EMC/noise environments.
- Slip-aware pick-and-place and regrasping for kitting and packaging (Logistics, Manufacturing, Robotics)
- What: Detect early slip to adjust grip and perform in-hand reorientation before placing, reducing drops and rework.
- Potential tools/products/workflows: “SlipGuard” middleware between vision grasp planner and low-level gripper/hand controller; alarms to slow arm speed upon rising slip magnitude.
- Assumptions/dependencies: Point cloud + proprioception available; hand or parallel gripper can modulate grip quickly; estimator is trained on representative SKUs and materials.
- In-hand reorientation for bin picking and singulation (Logistics, Robotics)
- What: Turn, roll, or “walk” objects in hand using graded slip feedback to find stable poses (e.g., label-up orientation).
- Potential tools/products/workflows: Library of regrasp primitives parameterized by slip magnitude thresholds; integration with warehouse picking cells.
- Assumptions/dependencies: Adequate finger compliance/DOF; estimator robustness to diverse object textures; task rewards tuned in simulation that map onto the goals of the real picking cell.
- Teleoperation assistance with tactile HUD (Robotics, Remote handling, R&D)
- What: Provide operators real-time indicators of per-finger slip and contact events to avoid drops or overforce in delicate tasks.
- Potential tools/products/workflows: UI overlay showing per-finger slip bars; haptic buzzers mirroring slip onset; simple gating to dampen aggressive teleop commands when slip spikes.
- Assumptions/dependencies: Low-latency streaming of estimator outputs; mapping of slip to intuitive operator cues; environmental audio isolation or estimator gating to suppress airborne sounds.
- QA and process monitoring for contact-rich stations (Manufacturing, Quality)
- What: Log slip signatures and contact onsets as process analytics to detect tool wear, misalignment, or drift.
- Potential tools/products/workflows: “SlipTrace” dashboard aggregating slip magnitude histograms per SKU; SPC limits on abnormal slip spikes; alerts for re-calibration.
- Assumptions/dependencies: Consistent fixturing; versioned estimator/config; privacy controls for any audio capture (structure-borne focus).
- Low-cost tactile retrofit for research and pilot cells (Academia, Startups, Robotics)
- What: Add high-bandwidth tactile sensing to existing hands without changing finger geometry.
- Potential tools/products/workflows: Open-source reference design for fingertip microphone mounts; pre-trained estimators; MuJoCo environments with the contact-and-slip observation API.
- Assumptions/dependencies: Access to teleop or scripted interactions for fine-tuning; calibration of robot–object frames for digital-clone labeling.
- Curriculum and lab modules for tactile RL and sim-to-real (Education, Academia)
- What: Teach tactile sensing, digital-clone labeling, and policy training using the paper’s representation.
- Potential tools/products/workflows: Course labs: collect audio, auto-label via replay, train estimator, train PPO policy in sim, deploy on a classroom hand/arm.
- Assumptions/dependencies: Affordable microphones and audio interface; MuJoCo/ROS toolchains; prepared datasets for classes without hardware.
- Safety-aware force and speed scaling based on slip (Robotics, HRC)
- What: When persistent slip is detected, automatically reduce speed/force to protect parts and tooling.
- Potential tools/products/workflows: Safety wrapper that scales joint velocity or grip force as a function of slip magnitude; watchdog for “no-contact then sudden slip” anomalies.
- Assumptions/dependencies: Certified safety strategy still required; thorough task hazard analysis; verified estimator false-positive/negative rates.
- Tooling for sim-to-real tactile pipelines (Software, Robotics)
- What: Standardize the 12-D contact-and-slip observation channel across simulators and controllers.
- Potential tools/products/workflows: MuJoCo/Isaac plugins that emit the tactile observation z_t; ROS messages/types; evaluation harness for ablating onset vs slip presence vs magnitude.
- Assumptions/dependencies: Simulator provides tangential velocities and contact events; consistent thresholds (e.g., 5 mm/s slip) across stacks.
- Pilot deployments in service/home robots for reliable object handling (Consumer Robotics)
- What: Improve dish/can handling, shelving, and container insertion with slip-based correction on mobile manipulators.
- Potential tools/products/workflows: “Slip-aware grasp” mode in home robots; integration with vision grasping stacks.
- Assumptions/dependencies: Household acoustic noise robustness; compact, sealed fingertip microphone assemblies; productized estimator running on edge compute.
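The safety-wrapper idea in the list above (scaling speed as a function of slip magnitude) could look roughly like the following. This is a sketch under assumptions: the ramp shape and the values of `slip_start`, `slip_full`, and `min_scale` are hypothetical and would need tuning and safety validation on a real system.

```python
from typing import List, Sequence


def speed_scale(slip_magnitude: float,
                slip_start: float = 0.005,  # m/s; ramp begins (hypothetical)
                slip_full: float = 0.03,    # m/s; ramp ends (hypothetical)
                min_scale: float = 0.2) -> float:
    """Map slip magnitude to a velocity scale factor in [min_scale, 1.0]."""
    if slip_full <= slip_start:
        raise ValueError("slip_full must exceed slip_start")
    if slip_magnitude <= slip_start:
        return 1.0
    if slip_magnitude >= slip_full:
        return min_scale
    # Linear ramp from 1.0 down to min_scale between the two slip levels.
    frac = (slip_magnitude - slip_start) / (slip_full - slip_start)
    return 1.0 - frac * (1.0 - min_scale)


def scale_joint_velocities(cmd: Sequence[float],
                           per_finger_slip: Sequence[float]) -> List[float]:
    """Scale a joint-velocity command by the worst slip across all fingers."""
    scale = speed_scale(max(per_finger_slip, default=0.0))
    return [v * scale for v in cmd]
```

Using the worst-case finger keeps the wrapper conservative; a per-finger grip-force adjustment would be a natural companion but is outside this sketch.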
Long-Term Applications
These require further research, scaling, validation, or domain adaptation beyond the current results.
- High-precision electronics and small-part assembly (Manufacturing)
- What: Press-fit connectors, flex-cable insertions, snap fits using micro-slip cues for micron-level alignment.
- Potential tools/products/workflows: Micro-actuated fingertips with high-bandwidth control driven by slip magnitude; multi-modal fusion with force/vision.
- Assumptions/dependencies: Lower-latency sensing (<50 ms effective); estimator calibrated to very light contacts; clean-room compatible sensors.
- Surgical and micro-manipulation slip sensing (Healthcare, Medical Robotics)
- What: Detect micro-slip in tool–tissue interactions to prevent damage and improve suturing/needle handling.
- Potential tools/products/workflows: Sterilizable acoustic transducers integrated in instruments; surgeon feedback via haptics; training in digital twins.
- Assumptions/dependencies: Biocompatibility, sterilization, regulatory approval; validated models of tissue-induced vibrations; extremely low-latency control.
- Prosthetic hands with slip-aware autonomous grip stabilization (Healthcare, Assistive Tech)
- What: Automatically adjust grip when objects start slipping; provide vibro-haptic feedback to users.
- Potential tools/products/workflows: Embedded estimator on low-power MCUs; user-adjustable slip thresholds and feedback patterns.
- Assumptions/dependencies: Efficient on-device inference; robust coupling in soft sockets; individual calibration for users and sockets.
- Deformable object manipulation (cloth/cables/food) guided by slip magnitude (Robotics, Food/Pharma)
- What: Use slip cues to regulate tension and shear during folding, wiring, or handling delicate items.
- Potential tools/products/workflows: Policies that fuse point clouds with tactile slip for deformable state regulation.
- Assumptions/dependencies: New simulators for deformables with reliable slip labeling; richer tactile representations beyond current 12-D vector.
- Autonomous tool use requiring sustained friction control (Robotics, Maintenance/Energy)
- What: Screwdriving, sanding, wiping, valve turning with slip-aware pressure modulation.
- Potential tools/products/workflows: Task libraries with friction setpoint controllers using slip magnitude as feedback.
- Assumptions/dependencies: Robustness to tool-induced vibrations; generalization across tool geometries and materials.
- Standardized tactile representation API and benchmarks (Standards, Academia, Industry consortia)
- What: Cross-platform standard for contact/slip channels, datasets, and evaluation suites.
- Potential tools/products/workflows: Open benchmarks spanning in-hand, insertion, and gaiting tasks; certification tests for tactile estimators.
- Assumptions/dependencies: Community consensus on thresholds/units; shared datasets with synchronized audio and ground truth.
- End-to-end simulation of vibro-acoustics for policy learning (Software, Simulation)
- What: Train on synthetic audio with differentiable or high-fidelity acoustics replacing the estimator.
- Potential tools/products/workflows: Differentiable contact acoustics modules; domain randomization of materials and mountings.
- Assumptions/dependencies: Accurate structural/acoustic models; tractable sim speeds; validated transfer to real microphones.
- Self-calibrating, hardware-agnostic tactile estimators (Robotics, Software)
- What: Estimators that adapt online to new hands, materials, and sensor placements without digital-clone replay.
- Potential tools/products/workflows: Meta-learning or unsupervised domain adaptation on structure-borne audio; auto-tuning slip thresholds.
- Assumptions/dependencies: Sufficient unlabeled interaction data; stable objective functions for online adaptation.
- Privacy- and safety-oriented governance for embedded microphones in robots (Policy, Compliance)
- What: Guidelines ensuring structure-borne focus, on-device filtering, and retention policies to mitigate audio privacy risks.
- Potential tools/products/workflows: Certification checklists; hardware filters that attenuate airborne components; audit logs of estimator outputs instead of raw audio.
- Assumptions/dependencies: Clear regulatory frameworks; demonstrable technical evidence that the embedded microphones cannot serve as general-purpose audio recorders.
- Cross-modal foundation models with vibro-acoustics (Academia, Software)
- What: Joint representations across vision, force, and structure-borne audio for generalist manipulation.
- Potential tools/products/workflows: Pretrained backbones fine-tuned to the contact-and-slip head; data curation pipelines leveraging digital-clone labels.
- Assumptions/dependencies: Large-scale datasets spanning hands, objects, materials; compute budgets; standardized sensors.
- Human–robot collaboration with slip-aware intent and safety cues (Robotics, HRC)
- What: Use slip/contact transients to infer human handoffs, shared grasp adjustments, or unsafe contact.
- Potential tools/products/workflows: HRC controllers that interpret contact onsets as intent signals; safety interlocks tied to unexpected slip patterns.
- Assumptions/dependencies: Reliable discrimination between human-induced and task-induced vibrations; certification for collaborative operation.
- Field maintenance and inspection robots operating under occlusion (Energy, Utilities, Infrastructure)
- What: Manipulate knobs, latches, and connectors in dark/cramped spaces where vision is compromised.
- Potential tools/products/workflows: Tactile-first controllers using slip magnitude to “feel” engagement; digital twins of infrastructure components for training.
- Assumptions/dependencies: Ruggedized, sealed fingertips; estimator robustness to environmental noise and temperature extremes.
Notes on general dependencies across applications:
- The compact tactile representation assumes consistent physical meaning of contact onset, slip presence, and slip magnitude across sim and real; significant hardware or material changes require recalibration or fine-tuning.
- The digital-clone labeling pipeline depends on accurate pose tracking and simulator contact models; errors propagate to estimator supervision.
- Real-time control requires managing the estimator’s windowing latency (e.g., 200 ms windows) and ensuring compute feasibility at the robot edge.
- The representation intentionally omits contact location/forces; tasks needing spatial force distribution may require richer sensing or model extensions.
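The windowing-latency point above can be illustrated with a small rolling buffer. The sketch assumes a 16 kHz sample rate purely for illustration (the sample rate is not given in this summary), and `AudioWindow` is a hypothetical helper, not part of any released code.

```python
from collections import deque
from typing import Iterable, List


class AudioWindow:
    """Rolling buffer holding the most recent `window_s` seconds of samples."""

    def __init__(self, sample_rate_hz: int, window_s: float = 0.2):
        # A 200 ms window at the given rate; older samples fall off the front.
        self.size = int(round(sample_rate_hz * window_s))
        self._buf: deque = deque(maxlen=self.size)

    def push(self, samples: Iterable[float]) -> None:
        """Append new samples, discarding the oldest once the window is full."""
        self._buf.extend(samples)

    def ready(self) -> bool:
        """True once a full window is available for the estimator."""
        return len(self._buf) == self.size

    def window(self) -> List[float]:
        """Return the current window, oldest sample first."""
        return list(self._buf)
```

At the assumed 16 kHz, a 200 ms window holds 3200 samples per microphone. The estimator can be re-run every time a small chunk arrives, but each prediction still summarizes up to 200 ms of past audio, which is the latency the note above asks implementers to manage.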
Glossary
- Ablation study: An experimental analysis where components of a model or system are systematically removed or varied to assess impact. "Ablation studies of the VibeAct tactile estimator."
- Actor and critic heads: The paired output modules in an actor–critic reinforcement learning architecture, where the actor outputs actions and the critic estimates value. "and passed to symmetric actor and critic heads."
- Attention pooling: A neural network mechanism that weights and aggregates features across time or space based on learned attention scores. "Temporal convolutions and attention pooling produce per-microphone embeddings,"
- Binary cross-entropy: A loss function for binary classification measuring the difference between predicted probabilities and true labels. "class-weighted binary cross-entropy losses"
- Contact dynamics: The physical interactions and forces during contact between bodies, modeled in simulation for control and labeling. "computed directly from contact dynamics."
- Contact onset: The instant when contact between surfaces first occurs, often modeled as a brief event. "contact onset is a sparse transient requiring precise temporal alignment,"
- Contact solver: The component of a physics engine that computes contact forces and constraints between colliding bodies. "the simulator's contact solver"
- Dexterous manipulation: Skilled multi-fingered control of objects involving precise, contact-rich interactions. "Dexterous manipulation depends on contact events that are fast, local, and often visually occluded."
- Digital clone: A calibrated simulation replica of the real robot and environment used to replay trajectories and generate labels. "a calibrated MuJoCo digital-clone environment."
- Domain gap: A discrepancy between data distributions or dynamics across settings (e.g., fixed-object vs. in-hand), affecting transfer. "This suggests a large domain gap between fixed-object and in-hand slip."
- Domain randomization: Training-time variation of simulation parameters to improve policy robustness and transfer to reality. "per-episode domain randomization"
- Finger-gaiting: A manipulation strategy where fingers sequentially reposition to move or reorient an object. "finger-gaiting along larger objects."
- Huber loss: A robust regression loss that is quadratic for small errors and linear for large errors, reducing sensitivity to outliers. "Slip magnitude is supervised with a Huber loss"
- LEAP Hand: A specific low-cost, anthropomorphic robotic hand used as the dexterous end-effector in experiments. "an xArm7 and a LEAP hand."
- Log-mel spectrograms: Time–frequency audio representations using mel-scaled frequency bins and logarithmic amplitude. "multi-channel log-mel spectrograms"
- Microphone-gating layer: A learnable module that suppresses noisy sensor channels before feature extraction. "A learnable microphone-gating layer first suppresses noisy channels,"
- Mocap system: A motion capture setup that tracks object poses with cameras calibrated to the robot frame. "or track objects using a mocap system whose cameras are calibrated to the robot base."
- MuJoCo: A physics engine for model-based control and simulation of articulated systems. "in a calibrated MuJoCo digital-clone environment."
- Peg in Hole: A canonical insertion task requiring precise alignment and force control during contact-rich motion. "Peg in Hole starts from a pregrasped cylinder and requires sideways insertion,"
- Piezoelectric microphones: Sensors that convert mechanical vibrations into electrical signals, here used to capture tactile vibrations. "Piezoelectric microphones offer a compact and high-bandwidth way to sense these interactions,"
- Point cloud: A set of 3D points representing scene geometry, often from depth sensors, used as an observation. "a fixed-camera point cloud,"
- PointNet-style: Referring to a neural network architecture for processing unordered point sets with permutation invariance. "A PointNet-style branch"
- PPO policies: Policies trained with Proximal Policy Optimization, a stable on-policy reinforcement learning algorithm. "we train PPO policies"
- Proprioception: Internal sensing of a robot’s joint states and configurations used as part of the observation. "the policy observes proprioception"
- Sensor transfer functions: The frequency-dependent mappings from physical vibrations to measured signals imposed by sensor and electronics. "sensor transfer functions."
- Sim-to-real policy learning: Training policies in simulation with the goal of transferring them to real-world deployment. "sim-to-real policy learning"
- Slip magnitude: A continuous measure of the severity of relative motion (slip) at a contact interface. "the continuous slip-magnitude channel proves the most informative observation."
- Stick-slip motion: Alternating sticking and sliding behavior during frictional contact that generates characteristic vibrations. "stick-slip motion"
- Structure-borne vibrations: Vibrations that propagate through solid structures (e.g., fingers) from contact events. "structure-borne vibrations"
- Tactile estimator: A learned model mapping raw vibro-acoustic signals to a compact tactile representation (e.g., contact and slip). "A tactile estimator learns to predict contact and slip from real microphone waveforms,"
- Tangential relative velocity: The speed of motion parallel to the contact surface between two bodies, used to detect slip. "slip presence is a binary threshold on tangential relative velocity"
- Teleoperation: Human-operated control of a robot to collect demonstrations or data. "During data collection, we teleoperate the hand to interact with objects"
- YCB object: An item from the YCB benchmark object set commonly used for manipulation research. "a held YCB object"
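For readers who want the two loss functions from the glossary in concrete form, here is a minimal per-sample sketch in plain Python. The formulas are the standard definitions; `delta` and `pos_weight` are illustrative hyperparameters, and how the paper weights or combines the losses is not specified in this summary.

```python
import math


def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def weighted_bce(prob: float, label: int,
                 pos_weight: float = 1.0, eps: float = 1e-7) -> float:
    """Binary cross-entropy with an extra weight on the positive class."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(pos_weight * label * math.log(p)
             + (1 - label) * math.log(1.0 - p))
```

A positive-class weight above 1 is one common way to handle rare events such as contact onsets, which occupy only a small fraction of time steps.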