LeHome Challenge 2026: Robotics Deformable Object Benchmark
- LeHome Challenge 2026 is a robotics competition focused on advancing deformable object manipulation through realistic household simulation and reproducible evaluation.
- The challenge employs a high-fidelity simulation environment with PBD, FEM, and Eulerian solvers to replicate household scenes and diverse object interactions.
- The winning solution integrates a vision–language–action policy with distributed reinforcement learning to achieve significant improvements in sim-to-real transfer success.
The LeHome Challenge 2026 is a robotics competition established to benchmark and advance algorithms for deformable object manipulation in realistic, simulation-to-real household settings. Built on the LeHome simulation environment, the Challenge emphasizes integrated evaluation of manipulation strategies across a spectrum of object classes and tasks, with a particular focus on deployability on low-cost robots. It provides a standardized testbed supporting high-fidelity physics, multiple robot embodiments, and extensible protocols for reproducibility and fair comparison (Li et al., 24 Apr 2026). Major advances in the 2026 edition center on new policy architectures, distributed RL pipelines, and engineering optimizations for robust sim-to-real transfer (Larchenko, 25 Jun 2026).
1. Simulation Platform and Task Suite
The Challenge is underpinned by LeHome, an integrated system comprising the LeHome Assets library, LeHome Engine, and LeHome Benchmark. The Assets library encodes a wide array of household scenes and object models, including rigid, articulated, and six distinct deformable classes: liquids, gaseous fluids, granular objects, linear objects (cables, ropes), thin shells (cloth, posters, bags), and volumetric objects (e.g., patties, sausages). The Engine utilizes NVIDIA Isaac Sim 4.5.0 (Omniverse) and incorporates three primary solver types: Position-Based Dynamics (PBD) for liquids, granular materials, and thin shells (wrinkling), Finite Element Method (FEM) for rods, elastic shells, and volumetric solids, and Eulerian methods for gaseous phenomena.
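The solver assignment described above can be read as a lookup from material category to solver. The following is a minimal sketch of that idea, not the Engine's actual API: the `Solver` enum, the category strings, and `select_solver` are hypothetical names introduced here for illustration.

```python
from enum import Enum


class Solver(Enum):
    PBD = "position-based dynamics"
    FEM = "finite element method"
    EULERIAN = "eulerian"


# Assignments follow the text above; the category keys are hypothetical labels,
# not identifiers taken from the LeHome Engine.
SOLVER_BY_CATEGORY = {
    "liquid": Solver.PBD,
    "granular": Solver.PBD,
    "thin_shell_wrinkling": Solver.PBD,
    "rod": Solver.FEM,
    "elastic_shell": Solver.FEM,
    "volumetric": Solver.FEM,
    "gas": Solver.EULERIAN,
}


def select_solver(category: str) -> Solver:
    """Return the solver family for a material category, or fail loudly."""
    try:
        return SOLVER_BY_CATEGORY[category]
    except KeyError:
        raise ValueError(f"unknown deformable category: {category!r}") from None
```

Note that thin shells appear under both PBD (wrinkling) and FEM (elastic response) in the description, so the sketch keeps them as two separate keys.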
The LeHome Benchmark defines six canonical manipulation tasks capturing the diversity of deformable interaction:
| Task | Success Criterion | Measurement |
|---|---|---|
| Fold Garment | Bounding-box overlap > 90% | Boolean, 100 randomized runs |
| Fling Garment | Max. surface variation < 5 mm | Height-field variance |
| Assemble Burger | Stacked order within 5 cm, stable > 2 s | Footprint, stability |
| Cut Sausage | Each piece ≥ 40% of original length | Geometric length test |
| Pour Coffee | ≥ 90% transferred, < 5% spilled | PBD particle count |
| Wipe Surface | Residual water < 10% of initial | PBD water particle count |
All tasks are formalized via Action Graphs defining state progression and event-driven mesh-state transitions (Li et al., 24 Apr 2026).
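The particle-count and length thresholds in the table lend themselves to simple boolean detectors. The sketch below is an illustration under stated assumptions rather than the benchmark's own detector code: the function names and arguments are hypothetical, and the requirement of at least two pieces in the sausage test is an added assumption.

```python
from typing import Sequence


def pour_coffee_success(n_total: int, n_in_target: int, n_spilled: int) -> bool:
    """>= 90% of particles transferred and < 5% spilled (thresholds from the table)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_in_target / n_total >= 0.90 and n_spilled / n_total < 0.05


def wipe_surface_success(n_initial: int, n_residual: int) -> bool:
    """Residual water particles below 10% of the initial count."""
    if n_initial <= 0:
        raise ValueError("n_initial must be positive")
    return n_residual / n_initial < 0.10


def cut_sausage_success(original_length: float, piece_lengths: Sequence[float]) -> bool:
    """Every piece is at least 40% of the original length.

    Requiring two or more pieces is an assumption of this sketch.
    """
    if len(piece_lengths) < 2:
        return False
    return all(p >= 0.40 * original_length for p in piece_lengths)
```

Keeping each detector a pure function of counts or lengths makes it easy to replay logged episodes against the same thresholds.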
2. Physics and Embodiment
LeHome’s physics stack supports the contact, collision, and elastic phenomena needed for high-fidelity deformable-object simulation. Rigid objects use PhysX. PBD handles deformables through geometric and physical constraints (e.g., volume preservation, wrinkle formation), while FEM solvers provide shell, rod, and volumetric response using formulations such as Kirchhoff–Love (thin shells) and Euler–Bernoulli (beams).
Supported robot embodiments span industrial arms (UR series, Franka Emika Panda) and the low-cost LeRobot family (single-arm, bimanual, and mobile-bimanual configurations). Control is fully integrated, with teleoperation and motion planners exposed via ROS and custom plugins. The system enforces realistic limits on payload, workspace, torque, and sensor resolution, so that performance metrics are meaningful for both cutting-edge and resource-constrained systems (Li et al., 24 Apr 2026).
3. Challenge Protocols and Scoring
Benchmark scenarios are built from YAML-based scene composition, randomized initializations, and predefined goal and success detectors, which allows evaluation episodes to be specified precisely and reproduced (fixed random seeds; Dockerized configuration).
Each trial yields a binary success outcome, and outcomes are aggregated into overall success rates. Additional metrics are available, such as completion time, energy consumption (evaluated as Σ|τ·ω|, with τ the joint torques and ω the joint velocities), and movement smoothness. Difficulty rankings are assigned by object complexity, planning horizon, and required sensory modalities (e.g., vision only vs. vision+language).
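As a rough illustration of these metrics, the sketch below aggregates binary outcomes into a success rate and evaluates the energy term as a sum of |τ·ω| over timesteps. Taking the dot product across joints at each timestep and the optional `dt` scaling are assumptions of this sketch; the function names are hypothetical.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of trials that ended in success."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(1 for o in outcomes if o) / len(outcomes)


def energy_proxy(
    torques: Sequence[Sequence[float]],
    velocities: Sequence[Sequence[float]],
    dt: float = 1.0,
) -> float:
    """Sum over timesteps of |tau . omega| * dt.

    torques[t][j] and velocities[t][j] are the torque and velocity of joint j
    at timestep t. With dt=1.0 this is the plain sum of |tau . omega|.
    """
    if len(torques) != len(velocities):
        raise ValueError("torques and velocities must have the same length")
    total = 0.0
    for tau, omega in zip(torques, velocities):
        power = sum(t * w for t, w in zip(tau, omega))  # dot product across joints
        total += abs(power) * dt
    return total
```

If per-joint absolute values are intended instead, the `abs` call would move inside the inner sum; the source notation does not settle this.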
The reproducibility framework includes public containerized environments and open physics parameters. Extensibility recommendations include standardized import of new assets (URDF/OBJ/GLTF), mechanical categorization to auto-select solver pipelines, and plug-in Action Graph nodes for complex transitions such as tearing or tool use (Li et al., 24 Apr 2026).
4. Winning Solution Architecture
The prizewinning solution in 2026 adopts a vision–language–action (VLA) policy whose network simultaneously predicts actions and key outcome measures—success probability (Ŝ_t), completion (Ĉ_t), and future task-relevant values—enabling unified, low-complexity pipeline design (Larchenko, 25 Jun 2026). The architecture consists of:
- A SigLIP-So400m/14 backbone encoding multi-view images.
- A Gemma-2B prefix transformer that consumes tokenized visual and state data, the garment type, and a RECAP-style advantage token.
- Hierarchical attention masking, which ensures that the auxiliary prediction heads (success, completion, progress) operate with pixel-only context.
- A flow-matching action expert (Gemma-300M) that generates denoised 12-DOF joint-delta chunks (trained with a flow-matching loss).
- Auxiliary heads for garment-type, keypoint distance, and prospective (‘future-prediction’) signals.
Inference employs best-of-N candidate selection and classifier-free guidance: for each chunk, several candidate action chunks are generated and scored, and the argmax is selected.
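The two inference-time mechanisms can be sketched generically. The code below uses the standard classifier-free guidance combination (unconditional prediction plus a scaled difference) and a plain best-of-N loop. It is a sketch under assumptions, not the winning solution's code: `sample_fn` and `score_fn` are hypothetical stand-ins for the policy's chunk sampler and whichever predicted score is used for ranking.

```python
import random
from typing import Callable, List, Sequence, Tuple


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    scale == 1.0 recovers the conditional prediction unchanged.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def best_of_n(
    sample_fn: Callable[[random.Random], List[float]],
    score_fn: Callable[[List[float]], float],
    n: int,
    rng: random.Random,
) -> Tuple[List[float], float]:
    """Draw n candidate action chunks and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_fn(rng) for _ in range(n)]
    scores = [score_fn(c) for c in candidates]
    best = max(range(n), key=scores.__getitem__)  # argmax over candidate scores
    return candidates[best], scores[best]
```

In a flow-matching policy, `cfg_combine` would typically be applied to the predicted velocity at each denoising step, and `best_of_n` to the finished chunks.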
5. Reinforcement Learning Pipeline and Optimization Strategies
An asynchronous distributed RL pipeline uses the Hugging Face Hub for state management. Advantage-Weighted Regression (AWR) drives prioritized sampling through advantage-dependent per-frame sampling probabilities, combined with debiased linear-head updates. Advantages are blended from success, completion, and segment baselines and then normalized. Reward design uses binary and checkpoint-shaped signals, with GAE estimates for both success and completion progression. Policy precision is further increased with segment-based boosts and off-policy corrections.
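For orientation, the sketch below shows textbook Generalized Advantage Estimation and the usual exponential AWR weighting turned into per-frame sampling probabilities. It does not reproduce the solution's blending of success, completion, and segment baselines. The temperature `beta`, the weight cap `max_weight`, and the default `gamma` and `lam` values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized Advantage Estimation.

    values must hold len(rewards) + 1 entries; the last one is the bootstrap
    value of the state after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def awr_sampling_probs(
    advantages: Sequence[float], beta: float = 1.0, max_weight: float = 20.0
) -> List[float]:
    """Per-frame sampling probabilities proportional to min(exp(A / beta), max_weight)."""
    if not advantages:
        raise ValueError("advantages must be non-empty")
    log_cap = math.log(max_weight)
    # Clipping the exponent is equivalent to clipping the weight and avoids overflow.
    weights = [math.exp(min(a / beta, log_cap)) for a in advantages]
    total = sum(weights)
    return [w / total for w in weights]
```

Frames with higher advantage are drawn more often, while the cap keeps a few very high-advantage frames from dominating each batch.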
Inference-time hyperparameters are tuned by Thompson sampling, run separately per garment type, over the number of executed steps, playback stretch, anchor length, inpainting onset, guidance scale, noise temperature, and candidate pool size. The optimizer is a multi-armed bandit with a factorized Beta posterior, which permits both rapid adaptation and stabilization as the policy evolves. Per-garment optimal settings are frozen for final evaluation (Larchenko, 25 Jun 2026).
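A factorized Beta-posterior bandit of this kind can be sketched as one independent Beta-Bernoulli arm set per hyperparameter. The class below is an illustrative sketch, not the author's implementation: the class name, the uniform Beta(1, 1) priors, and the example parameter names are assumptions.

```python
import random
from typing import Dict, Hashable, List, Optional


class FactorizedBetaBandit:
    """Thompson sampling with an independent Beta posterior per (parameter, value)."""

    def __init__(self, grid: Dict[str, List[Hashable]], seed: Optional[int] = None):
        self.rng = random.Random(seed)
        # posterior[name][value] = [alpha, beta], starting from a uniform Beta(1, 1).
        self.posterior = {
            name: {value: [1.0, 1.0] for value in values}
            for name, values in grid.items()
        }

    def sample(self) -> Dict[str, Hashable]:
        """Draw one posterior sample per arm and pick the best value per parameter."""
        return {
            name: max(arms, key=lambda v: self.rng.betavariate(*arms[v]))
            for name, arms in self.posterior.items()
        }

    def update(self, config: Dict[str, Hashable], success: bool) -> None:
        """Credit the binary episode outcome to every value used in the config."""
        for name, value in config.items():
            self.posterior[name][value][0 if success else 1] += 1.0

    def best(self) -> Dict[str, Hashable]:
        """Value with the highest posterior mean per parameter, for freezing."""
        return {
            name: max(arms, key=lambda v: arms[v][0] / (arms[v][0] + arms[v][1]))
            for name, arms in self.posterior.items()
        }
```

One bandit per garment type, for example `bandits = {"short_pants": FactorizedBetaBandit(grid)}`, matches the per-garment tuning described above. Calling `best()` at the end corresponds to freezing the settings for final evaluation. The factorization ignores interactions between parameters, which is the price of keeping the number of arms small.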
6. Sim-to-Real Transfer and Empirical Results
Sim-to-real transfer is structured on environment alignment and data diversification:
- Camera overlay utilities and careful calibration ensure consistent perception between sim and physical robot.
- Aggressive per-camera and episode augmentation—randomized crops, zoom, transforms, color/gamma jitter, noise, pattern swaps, base jitter—enrich the training regime.
- Fine-tuning employs a multi-bucket batching strategy combining organizer BC data, home teleop, real-robot DAgger, and sim success-replays, each with calibrated sampling and motion speed rescaling.
- Only the action expert, garment-type, and completion heads are updated during real-robot adaptation; sim-only auxiliaries are frozen or dropped.
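The multi-bucket batching idea can be illustrated with a weighted draw over data sources. This is a minimal sketch assuming each bucket is a list of episodes and each has a scalar sampling weight. The function name and the bucket names and weights in the usage note are hypothetical, and motion speed rescaling is not modeled.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_mixed_batch(
    buckets: Dict[str, Sequence[object]],
    weights: Dict[str, float],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[str, object]]:
    """Draw a batch by first picking a bucket by weight, then an item uniformly."""
    names = [name for name in buckets if len(buckets[name]) > 0]
    if not names:
        raise ValueError("all buckets are empty")
    probs = [weights[name] for name in names]
    chosen = rng.choices(names, weights=probs, k=batch_size)
    return [(name, rng.choice(buckets[name])) for name in chosen]
```

A call might look like `sample_mixed_batch({"organizer_bc": bc, "home_teleop": tele, "real_dagger": dag, "sim_replay": sim}, {"organizer_bc": 0.3, "home_teleop": 0.2, "real_dagger": 0.3, "sim_replay": 0.2}, 64, random.Random(0))`, where the weights are placeholders rather than the calibrated values used in the solution.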
Empirical evaluation shows an overall online success rate of 79.63% in simulation, with per-type rates up to 93.5% (short pants) and a margin of +6.1 percentage points over the next-best team. The real-robot finals yielded a score of 865/1080 (second best), and sim+real co-training improved average task success from ~15% (real only) to ~50% (mixed) (Li et al., 24 Apr 2026; Larchenko, 25 Jun 2026).
7. Impact, Extensibility, and Future Directions
The LeHome Challenge 2026 has established new standards for deformable object manipulation in robotics. Its modular, multi-solver architecture, granular extensibility, and robust fairness protocols lay the groundwork for scalable, reproducible research in household robotics (Li et al., 24 Apr 2026).
The 2026 winning pipeline demonstrates the impact of combined VLA architectures, flow-matching, asynchronous distributed RL, and inference-time adaptation. Notably, success in sim-to-real transfer was contingent on tight camera alignment, aggressive augmentation, and DAgger-weighted human-in-the-loop corrections. A plausible implication is that further improvement will require unified pipelines capable of continual RL+DAgger learning across both domains, potentially achieving >90% success in future benchmarks (Larchenko, 25 Jun 2026).