YOR: Your Own Mobile Manipulator for Generalizable Robotics

Published 11 Feb 2026 in cs.RO and cs.LG | (2602.11150v1)

Abstract: Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, and two arms with grippers to achieve whole-body mobility and manipulation. Our design emphasizes modularity, ease of assembly using off-the-shelf components, and affordability, with a bill-of-materials cost under 10,000 USD. We demonstrate YOR's capability by completing tasks that require coordinated whole-body control, bimanual manipulation, and autonomous navigation. Overall, YOR offers competitive functionality for mobile manipulation research at a fraction of the cost of existing platforms. Project website: https://www.yourownrobot.ai/

Summary

  • The paper introduces YOR as a low-cost, modular mobile manipulator that integrates an omnidirectional base, vertical lift, and bimanual arms.
  • It employs advanced teleoperation and data-driven policy learning, achieving high success rates in pick–carry–place and navigation tasks.
  • The platform's open-source design and modularity promote scalable research in indoor robotics with precise whole-body manipulation.

YOR: An Open-Source, Low-Cost, Bimanual Mobile Manipulation Platform

Motivation and Contribution

Mobile manipulation has emerged as a key focus in robotics, driven by large-scale data-driven policy learning and the need for affordable, robust platforms deployable in human environments. Current commercial mobile manipulators are cost-prohibitive, lack the necessary dexterity, or are difficult to extend. "YOR: Your Own Mobile Manipulator for Generalizable Robotics" (2602.11150) addresses these barriers with a fully open-source, modular, low-cost robot (USD 9,250 BOM) that integrates an omnidirectional base, vertical lift, and dual compliant arms for bimanual whole-body manipulation (Figure 1).

Figure 1: YOR is an open-source mobile manipulator combining an omnidirectional base, a lift, and two arms, balancing dexterity and affordability.

The core contributions are: (1) a practical form factor that balances affordability, workspace, and dexterity without the mechanical or control complexity of legged or humanoid designs, (2) a fully open-source hardware and software release, including CAD files and a BOM built from off-the-shelf components, and (3) an empirical evaluation of whole-body teleoperation, policy learning, and autonomous navigation in unstructured indoor environments.

Design Principles and Hardware Architecture

YOR’s design targets the requirements of scalable research: cost, controllability, robustness, and extensibility for learning-based mobile manipulation.

The omnidirectional base uses a four-module swerve drive for decoupled translation and rotation, which gives it better maneuverability in cluttered scenes than traditional non-holonomic or large-footprint designs, as the comparison with other platforms in Figure 2 highlights.

Figure 2: Comparative analysis of YOR against Tidybot++, XLeRobot, Mobile-ALOHA, and RB-Y1 manipulators, emphasizing workspace, dexterity, and price trade-offs.

YOR’s lift, repurposed from commercial standing-desk actuators, raises the shoulder from 0.6 m to 1.24 m (a stroke of about 63.5 cm), extending end-effector reach from the floor to overhead and enlarging the vertical workspace for domestic manipulation. The dual 6-DoF PiPER arms (US$2,500 per arm) support compliant bimanual tasks and carry custom angular-jaw grippers with integrated consumer-grade sensing.

The system architecture strictly enforces modularity (compute, power, arm, and sensing upgrades are decoupled), with heavy components densely arranged at the base to maximize stability during manipulation (Figures 3 and 4).

Figure 3: Cost breakdown showing modular subsystem composition; arms comprise the majority of the subtotal, maintaining overall BOM under US$10K.

Figure 4: Low center-of-mass achieved by centralized dense packing in the base, surrounded by swerve-drive wheel modules.

Whole-Body Teleoperation and Data Collection

YOR's whole-body control stack supports both joint-space and operational-space control of the arms, either directly or through teleoperation. Intuitive teleoperation is provided via offboard pose tracking (Meta Quest controllers), which maps hand movements directly to end-effector poses, with vertical lift motion decoupled for ergonomic demonstration (Figure 5).

Figure 5: YOR executing household tasks under teleoperation: loading a dishwasher, watering plants, and manipulating objects at different elevations.

This system supports rapid data collection for learning-from-demonstration workflows, crucial for scaling robot skill acquisition outside tightly controlled lab settings.
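For the joint-space mode of the control stack described in this section, the paper's text mentions a joint stiffness controller combined with feedforward gravity compensation. The following is a minimal sketch of that idea, not the YOR implementation: the function name `stiffness_torque`, the gain values, and the assumption that a gravity torque per joint is already available are all illustrative.

```python
def stiffness_torque(q, qd, q_des, kp, kd, tau_gravity):
    """Per-joint spring-damper torque around q_des plus a feedforward gravity torque.

    All arguments are equal-length lists (one entry per joint).
    tau_gravity is assumed to come from some external model of the arm.
    """
    return [
        p * (target - pos) - d * vel + g
        for pos, vel, target, p, d, g in zip(q, qd, q_des, kp, kd, tau_gravity)
    ]


# Hypothetical example: one joint, 0.1 rad from its target, at rest,
# holding a 2.0 N*m gravity load.
tau = stiffness_torque([0.0], [0.0], [0.1], kp=[10.0], kd=[1.0], tau_gravity=[2.0])
```

Because the gravity term carries the static load, the stiffness gain `kp` can stay low, which is what makes the resulting motion compliant under contact.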

Imitation Policy Learning on Bimanual Mobile Tasks

YOR’s capabilities for policy learning are validated on a challenging pick–carry–place task requiring bimanual grasping, vertical positioning, obstacle-aware navigation, and complex coordination. Policies are trained with VQ-BeT, a transformer-based behavior-cloning approach, fusing multi-view wrist and head images with robot proprioception. The training set consists of 100 expert demonstrations collected under teleoperation and filtered to discard trajectories in which the robot lost odometry tracking.

In deployment, the policy completes the pickup and height-adjustment stages in 10/10 trials and the navigation and object-drop stages in 9/10, for a 90% overall success rate; the only observed failure mode is attributed to odometry drift caused by head-camera occlusion (Figure 6).

Figure 6: Policy-driven trajectory showing bimanual grasp, obstacle avoidance, and recycling-bin drop-off.

Autonomous Navigation, SLAM, and Locomanipulation

Localization, mapping, and collision avoidance are handled via on-board visual-inertial SLAM (ZED 2i) and dense voxel maps, with classic A*-based planning and pure pursuit tracking. Floor segmentation and collision inflation provide robustness in cluttered, dynamic indoor environments.
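The planning step described above (collision inflation followed by A* search) can be sketched on a small 2D occupancy grid. This is an illustrative sketch only: the grid encoding (1 = occupied), 4-connected unit-cost moves, the Manhattan heuristic, and the names `inflate` and `astar` are assumptions, not details taken from the YOR software.

```python
import heapq


def inflate(grid, radius):
    """Mark every cell within `radius` cells of an obstacle as occupied."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                for dr in range(-radius, radius + 1):
                    for dc in range(-radius, radius + 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            out[rr][cc] = 1
    return out


def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: admissible for 4-connected, unit-cost moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > g_cost[cur]:
            continue  # stale heap entry; a cheaper route to cur was found later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            ng = g + 1
            if ng < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = ng
                came_from[nxt] = cur
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None
```

Inflating obstacles before planning lets the search treat the robot as a point, at the cost of closing off gaps narrower than the inflated footprint.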

Mapping combines point clouds and IMU data into a world-referenced voxel map, enabling dynamic path replanning and semantic navigation primitives (Figure 7).

Figure 7: Pipeline integrating ZED point cloud and visual-inertial odometry for global mapping.
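To make the mapping pipeline more concrete, here is a minimal sketch of three steps the paper's text describes: binning world-frame points into voxels, finding the floor from a height histogram, and projecting the remaining voxels to 2D cells. The 5 cm voxel size, the `min_count` threshold used to approximate "the lowest height mode", and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

VOXEL = 0.05  # voxel edge length in metres (assumed value)


def voxelize(points, voxel=VOXEL):
    """Map world-frame (x, y, z) points to a set of integer voxel indices."""
    return {
        (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        for x, y, z in points
    }


def floor_level(voxels, min_count=3):
    """Lowest height bin with at least `min_count` voxels.

    A simplified stand-in for picking the lowest mode of the height histogram;
    sparse bins below it are treated as noise.
    """
    hist = Counter(k for _, _, k in voxels)
    for k in sorted(hist):
        if hist[k] >= min_count:
            return k
    return min(hist)


def occupied_cells(voxels, floor_k, max_k):
    """Project voxels above the floor and up to height index max_k onto 2D cells."""
    return {(i, j) for i, j, k in voxels if floor_k < k <= max_k}
```

The resulting set of occupied 2D cells is the kind of input a grid planner would consume after inflation.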

YOR achieves repeatable, stable closed-loop base–manipulator behavior: when repeatedly marking a fixed point during mobile loops, positional error stays within 12 mm over 10 laps, and in “chicken head” style demos, where the arm compensates for base motion, end-effector deviation stays below 16 mm (Figures 8 and 9).

Figure 8: Real-time kinematic compensation locks end-effector world-frame pose during base translation/rotation.
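The compensation in Figure 8 amounts to holding the end-effector pose fixed in the world frame and recomputing its target in the base frame as the base moves, i.e. T_BE(t) = T_WB(t)^-1 · T_WE. The sketch below shows this in the plane (SE(2), poses as `(x, y, theta)`) to keep it short; the real system works with SE(3) poses, and the function names here are hypothetical.

```python
import math


def inverse(pose):
    """Inverse of a planar rigid transform (x, y, theta)."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), -th)


def compose(a, b):
    """Composition a * b of two planar rigid transforms."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)


def ee_target_in_base(T_WB, T_WE):
    """End-effector target in the base frame that keeps T_WE fixed in the world."""
    return compose(inverse(T_WB), T_WE)


# Hypothetical example: the base has rotated 90 degrees at the world origin,
# while the end-effector should stay at (0, 1) with zero world heading.
target = ee_target_in_base((0.0, 0.0, math.pi / 2), (0.0, 1.0, 0.0))
```

Re-evaluating `ee_target_in_base` each control cycle with the latest base pose estimate is what keeps the gripper stationary in the world while the base translates or rotates.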

Figure 9: Consistent positioning demonstrated by accumulated marks after repeated closed-loop trajectories; all marks fall within a 12 mm radius of the starting point, below the roughly 50 mm map accuracy reported for the SLAM pipeline.

Dynamic replanning enables YOR to avoid unforeseen obstacles, responding to moving humans with less than one second of latency during autonomous navigation (Figure 10).

Figure 10: Online replanning in the presence of moving obstacles; internal voxel representation and physical world trajectory updated within 1 second.

Implications and Future Directions

The YOR platform removes resource and deployment barriers for mobile manipulation research, particularly in home environments. The results suggest that affordable, modular, open systems can deliver sufficiently precise, dexterous, and reliable performance to support data-driven bimanual manipulator research entirely outside proprietary or high-cost ecosystems.

A notable claim is that YOR’s modular, swerve/lift/bimanual form factor outperforms all existing platforms in the <$10K category on workspace, dexterity, and controllability, with empirical results supporting its autonomy, task repeatability, and ease of controller integration.

Theoretically, YOR enables large-scale, real-world data collection for mobile manipulator skill learning, and is well-positioned for zero-shot transfer, whole-body RL, and perceptive robot planning research—areas that have been bottlenecked by hardware inaccessibility.

Practically, the generalizability of YOR’s form factor, its plug-and-play design, and entirely open resource base will facilitate diversity in experimental paradigms and rapid reproduction of results across the global robotics community.

Conclusion

YOR (2602.11150) demonstrates that mobile manipulation platforms can simultaneously deliver dexterity, vertical reach, bimanual manipulation, omnidirectional motion, and compliance at a low cost and with open extensibility. The platform’s performance in teleoperation, policy learning, and SLAM-based autonomous navigation establishes a new baseline for affordable bimanual mobile manipulation, with the hardware, firmware, and software all released as open-source infrastructure.

Future extensions—including 7-DoF arms and semantic navigation—would further increase the system’s capability envelope. YOR is poised to support the next wave of large-scale embodied learning and generalist robotics experimentation outside costly, proprietary silos.

Explain it Like I'm 14

Overview

This paper introduces YOR, a low-cost, open-source robot designed to move around and use two arms to do everyday tasks. Think of YOR like a helpful mobile assistant: it can drive in any direction, raise and lower its “shoulders” like an elevator, and use two “hands” to pick up, carry, and place things. The main goal is to give researchers and hobbyists an affordable, easy-to-build robot they can use to study and improve mobile manipulation—the skill of moving and handling objects while navigating real spaces.

What Questions Did the Researchers Ask?

The paper focuses on simple, practical questions:

  • Can we build a capable mobile robot for under $10,000 using parts you can buy off the shelf?
  • What is the best shape and design (form factor) for a robot that moves smoothly, reaches high and low places, and uses two arms safely around people?
  • Will this robot be easy to control, learn from demonstrations, and navigate new environments on its own?

How Did They Build and Test the Robot?

Building the robot

They designed YOR with three key parts, each chosen for simplicity, safety, and cost:

  • An omnidirectional base: four special wheel modules turn and drive so the robot can move sideways, forward, backward, and rotate without awkward “three-point turns.” It’s small (about 43 × 34.5 cm), so it fits in tight home spaces.
  • A vertical lift: like a mini elevator, it raises and lowers the arms across a big height range (about 63.5 cm of “up-down” motion), letting YOR reach the floor or shelves.
  • Two compliant arms with grippers: “compliant” means the arms are gentle and flexible, like springy joints, so they’re safer around people and better for tasks where the arms bump into things.

They use a stereo camera (ZED 2i) as the robot’s “eyes,” and the whole setup costs under $10,000 (most of that goes to the arms).

Teaching and controlling the robot

To teach YOR, they used VR-style controllers (Meta Quest) for teleoperation. Imagine you move your handheld controller in the air, and the robot’s hand follows that motion. Buttons control the base (drive) and the lift (up/down).

They also trained YOR using imitation learning: they recorded human-controlled demonstrations of a “recycling” task—pick up a big box with two hands, drive around an obstacle, and drop it into a bin. The robot learned patterns from camera video (on its “wrists” and “head”) plus its own position data to copy the behavior.

Mapping and navigation

YOR makes a map of rooms while it moves (a technique called SLAM—Simultaneous Localization and Mapping). In simple terms, it builds a 3D picture of the world and figures out where it is inside that picture, even as it moves. Using the map, it plans paths around furniture and people, and keeps updating its plan if something new appears.

They used classic planning algorithms (like A*) to find safe routes and a tracking method (Pure Pursuit) to follow them smoothly.
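As a rough illustration of the tracking step, the sketch below picks a "look-ahead" point on the planned path and drives toward it. It is a simplified version for a base that can move in any direction: the proportional gain, the function names, and the choice to command velocities in the map frame are assumptions, and the 0.25 m/s cap mirrors the software speed limit mentioned elsewhere on this page.

```python
import math


def lookahead_point(path, pos, lookahead):
    """First waypoint at or beyond `lookahead` metres, searching forward from the nearest one."""
    nearest = min(range(len(path)), key=lambda i: math.dist(path[i], pos))
    for p in path[nearest:]:
        if math.dist(p, pos) >= lookahead:
            return p
    return path[-1]  # near the end of the path: aim at the goal itself


def velocity_command(pos, target, gain=1.0, v_max=0.25):
    """Proportional velocity toward the target, capped at v_max (m/s)."""
    vx = gain * (target[0] - pos[0])
    vy = gain * (target[1] - pos[1])
    speed = math.hypot(vx, vy)
    if speed > v_max:
        vx, vy = vx * v_max / speed, vy * v_max / speed
    return vx, vy


# Hypothetical straight path along the x-axis, robot slightly past the start.
path = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
goal_point = lookahead_point(path, (0.1, 0.0), lookahead=0.6)
cmd = velocity_command((0.1, 0.0), goal_point)
```

Chasing a point a little way ahead, rather than the nearest waypoint, is what smooths out corners in the planned path.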

What Did They Find?

  • The robot can do whole-body tasks: move its base, adjust height, and use both arms together to handle real objects, like opening/closing a dishwasher, watering plants, or picking up baskets.
  • It learned a bimanual “recycling” task: in 10 test runs, it successfully picked up the box 10/10 times, lifted it 10/10 times, navigated around obstacles 9/10 times, and completed the whole task 9/10 times. The main issue was drift in its position tracking when the camera view was blocked.
  • Accurate repeatability: when it drove loops and marked the same spot on a sheet of paper each time, most marks landed within about 12 mm of the starting point (roughly the radius of a large coin), showing solid mapping and control.
  • Dynamic obstacle avoidance: when a person walked in front of it, YOR updated its map and re-planned a new path within about a second to avoid collisions.

Why this matters: These results show you don’t need a super expensive robot to do useful mobile manipulation research with two arms and real movement in homes or labs.

Why It Matters

YOR’s big impact is accessibility. Because it’s open-source, affordable, and built from common parts, more schools, labs, and makers can build one and try new ideas. That means:

  • Faster progress in robot learning, especially for tasks involving moving and handling things in real homes.
  • More diverse data and experiments, which helps robots become safer, smarter, and more reliable.
  • A practical base for future upgrades, like better arms (with 7 joints for even smoother motion) and smarter navigation that understands objects and rooms semantically (for example, “go to the kitchen table” without manual mapping).

In short, YOR is a strong step toward everyday helper robots by making high-quality mobile manipulation research possible on a modest budget.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, consolidated list of what remains missing, uncertain, or unexplored in the paper, articulated to guide concrete follow-up research.

  • Cost and accessibility claims need replication: provide a full, reproducible BOM with supplier variability, lead times, and sensitivity analysis to regional availability and price fluctuations.
  • Mechanical reliability over time is unquantified: measure MTBF, wear patterns (swerve azimuth gears, lift rails), and failure rates under typical usage (e.g., 500–5,000 hours) and contact-rich tasks.
  • Stability margins are not characterized under worst-case conditions: quantify static/dynamic stability with arms fully extended, lift at maximum height, carrying varying payloads, and during abrupt base accelerations/turns.
  • Payload capacity and manipulation envelope are under-specified: report maximum payload at different lift heights and arm configurations, end-effector force limits, and deformation under load.
  • Surface and terrain robustness is unclear: evaluate locomotion on carpets, rugs, thresholds, ramps, low-friction surfaces, and tight spaces; report traction limits, wheel slip, and caster scrub effects.
  • Battery, runtime, and power management are not described: specify battery chemistry, capacity, expected runtime under diverse workloads, charging strategy, thermal performance, and power-fail behavior.
  • Safety mechanisms lack formal validation: document e-stop hardware, force/velocity limiting regimes, collision detection, safe-stop behaviors, and compliance with relevant standards (e.g., ISO 13482).
  • Odometry and localization dependence on ZED is brittle: add and compare wheel odometry/IMU fusion; quantify drift vs lighting, texture, motion blur, occlusion, and feature-poor environments.
  • SLAM scalability and robustness are limited: benchmark mapping accuracy and failure modes across multi-room environments, low light, reflective/transparent surfaces, and heavy dynamic clutter.
  • Loop-closure reliability is not evaluated: measure false positives/negatives, map corruption rates, and re-localization across sessions; introduce map lifecycle (save/load) tests and persistence.
  • Sensor occlusion failure mode is only noted, not mitigated: implement and compare redundancy (wrist camera fusion, upward-facing camera, fiducials, depth-only fallback) to reduce occlusion-induced drift.
  • Semantic perception is absent: integrate semantic mapping/object detection and quantify benefits to planning and bimanual manipulation; evaluate zero-shot task execution with semantics.
  • Navigation planning is classical and 2D: assess limitations in tight 3D spaces (under tables, shelves), overhangs, and vertical elements; compare against 3D planners and kinodynamic planning for base+lift.
  • Dynamic obstacle handling is demo-only: quantify replan latency distributions, near-collision rates, path optimality under moving obstacles, and human-aware navigation (comfort and safety metrics).
  • Whole-body loco-manipulation optimization is not addressed: develop planners/controllers that jointly optimize base, lift, and arms under constraints (reachability, stability, collision) and benchmark against task success/efficiency.
  • Bimanual coordination is minimally evaluated: measure synchronization errors, role assignment (leader–follower), and performance in tasks requiring force closure, handovers, or complex object reorientation.
  • Arm compliance vs accuracy trade-offs are not quantified: sweep $K_p$, $K_d$, gravity compensation accuracy, and trajectory jerk limits to measure tracking accuracy, contact forces, and task success.
  • Singularities and redundancy handling are deferred: analyze frequency and impact of singularities with 6-DoF arms; test 7-DoF arms and quantify improvements in reachability, dexterity, and safety.
  • Gripper design and sensing lack rigorous evaluation: characterize grasp success across object classes (soft, slippery, irregular), slippage under motion, and effects of iPhone sensor latency/battery/temperature.
  • Calibration procedures are not detailed: provide repeatable extrinsic calibration for head/wrist cameras, arm–lift–base frames, and assess drift and re-calibration frequency in typical operation.
  • Teleoperation ergonomics and efficacy are untested: conduct user studies comparing the proposed Quest controller scheme vs alternatives (GELLO, OpenTeach, Mobile-ALOHA), measuring comfort, learning curve, and demo quality.
  • Data efficiency and generalization of learned policies are unclear: go beyond a single “recycling” task to quantify performance across tasks, environments, object variations, and with out-of-distribution conditions.
  • Comparative baselines are missing: benchmark YOR against TidyBot++, XLeRobot, Mobile-ALOHA, RB-Y1, etc., on common tasks and metrics (success rate, time-to-completion, safety events, cost-normalized performance).
  • Remote inference introduces unquantified latency/reliability risks: measure end-to-end delays, jitter, packet loss, and task impact under varying network conditions; evaluate onboard vs offboard inference trade-offs.
  • Software stack robustness and modularity need stress tests: assess process isolation, failure recovery, watchdogs, and real-time guarantees across the RPC/ZMQ layers under high load and sensor dropouts.
  • Task-space control under moving base (“chicken head”) needs broader metrics: evaluate end-effector stability during varied base motions (curvilinear, stop–go), different velocities, and contact interactions.
  • Mapping and planning rates may be insufficient: quantify the impact of 5–10 Hz mapping/planning on fast motions; explore higher-rate pipelines or event-driven updates to reduce lag-induced collisions.
  • Footprint vs reach trade-off is unquantified: analyze workspace coverage, manipulability maps, and accessibility in real homes (countertops, sinks, cabinets), including the effect of shoulder tilt angles.
  • Cost of arms dominates BOM: examine alternative arms and grippers at lower cost, and quantify the impact on dexterity, payload, and controllability; provide upgrade pathways with measured benefits.
  • Open-source readiness is not demonstrated: release CAD, firmware, calibration tools, and assembly guides; conduct third-party replication studies to validate build time, cost, and performance reproducibility.
  • Environmental impact and sustainability are unaddressed: report energy use per task, recyclability of components, and maintenance footprint to inform large-scale deployments.
  • Security and privacy considerations are omitted: specify data handling, on-robot logging, network hardening, and policies for cameras in home environments.
  • Ethical deployment in homes lacks guidelines: propose protocols for safe human–robot interaction, consent, and human factors in shared spaces, and test with diverse user populations.
  • Scalability to multi-robot or fleet settings is unexplored: evaluate interference, shared maps, task allocation, and infrastructure demands for multi-YOR operation.
  • Benchmark suite is missing: define standardized, multi-task, home-like benchmarks (with manipulability constraints, dynamic obstacles, semantics) for fair evaluation and progress tracking across platforms.

Glossary

  • A*: A graph search algorithm that finds an optimal path by combining path cost and a heuristic estimate to the goal. "A* \citep{4082128} provides optimal graph-based path planning"
  • azimuth: The steering angle of a wheel module around the vertical axis. "a NEO 550 coupled to an UltraPlanetary gearbox for azimuth (steering) control"
  • bimanual manipulation: Coordinated use of two robot arms to perform a task. "For bimanual manipulation, YOR is equipped with two PiPER arms fitted with custom grippers."
  • closed-loop control: A control method that uses feedback from sensors to continuously correct actions toward a desired state. "This transform is used for closed-loop control on pose and for map-fusion in a consistent world-frame."
  • compliant actuation: Actuation designed to yield or adapt under contact forces, improving safety and adaptability. "The arms' compliant actuation ensures safe interaction during contact-rich tasks and reduces the need for overly cautious teleoperation."
  • cost map: A grid representation that assigns traversal costs to locations for navigation and planning. "The live point cloud is down-sampled, transformed to the world frame and similarly projected to a local 2D cost map."
  • coupling matrix: A matrix that maps chassis motion to individual wheel module velocities in swerve kinematics. "The kinematics mapping from the chassis velocity twist $\mathbf{v}_b = [v_x, v_y, \omega]^T$ to the individual wheel velocity vectors is governed by the coupling matrix $C \in \mathbb{R}^{8 \times 3}$"
  • end-effector (EE): The tool or gripper at the end of a robot arm that interacts with the environment. "The pose of the Quest controllers are calibrated and retargeted to the end-effector (EE) pose of the arms."
  • gravity compensation: Feedforward torques applied to counteract gravity so lower gains can be used for compliant control. "The feedforward torque gravity compensation allows us to set low stiffness gains, resulting in compliant movement."
  • H-bridge: An electronic circuit that allows a DC motor to be driven in both directions. "The Pico reads real-time position data from the lift's quadrature encoder and drives the DC motor via a BTS7960 high-current H-bridge"
  • histogram-based floor detection: A method that identifies the floor by analyzing height distributions in point cloud data. "we employ histogram based floor detection to find the lowest height mode in the voxel data"
  • inverse kinematics: Computing joint configurations that achieve a desired end-effector pose; for the swerve base, computing wheel velocities that achieve a desired chassis twist. "Based on our chassis dimensions of half-width $W=0.152$~m and half-length $L=0.106$~m, the inverse kinematics relation is derived as:"
  • joint stiffness controller: A controller that makes joints behave like spring-damper systems around target positions. "Therefore, we implement a joint stiffness controller."
  • kinematic redundancy: Having more degrees of freedom than strictly necessary for a task, allowing multiple equivalent solutions. "optimizing whole-body motion to handle environmental constraints and resolve kinematic redundancy."
  • loop closure: A SLAM event where the system recognizes a previously visited place to correct accumulated drift. "Odometry comes from the ZED-SDK~\citep{zed_sdk}, along with loop closure signals, which are used to compute the final pose of the camera in the world frame."
  • non-holonomic constraints: Motion constraints that restrict instantaneous movement directions, typical of wheeled robots. "without the non-holonomic constraints of differential drive systems~\citep{Siegwart2011}."
  • odometry: Estimation of a robot’s change in position over time from onboard sensors. "We collect expert demonstrations with teleoperation at 30Hz and discard any trajectories where the robot loses odometry tracking"
  • omnidirectional swerve drive: A drive system where each wheel can steer and drive independently, enabling motion in any planar direction. "The mobility system of YOR is built upon an omnidirectional swerve drive architecture, enabling decoupled translational and rotational control."
  • passively stable: A property where the system maintains balance without active control due to its geometry and mass distribution. "YOR is also passively stable, which simplifies control during dynamic motion."
  • PID controller: A feedback controller using proportional, integral, and derivative terms to track a reference. "A PID controller is used to track a look-ahead point along this path at 50 Hz with feedback from the estimated base pose."
  • proprioception: Internal sensing of a robot’s state, such as joint positions or poses. "For proprioception, we record end-effector poses in the top-of-lift reference frame, lift height, and base odometry from ZED."
  • Pure Pursuit: A geometric path-tracking algorithm that follows a look-ahead point along a planned path. "The resulting waypoint sequence is sent to the base controller which uses Pure Pursuit~\citep{coulter1992implementation} algorithm to track them."
  • quadrature encoder: A sensor producing two out-of-phase signals to determine position and direction of rotation. "The Pico reads real-time position data from the lift's quadrature encoder"
  • quasi-static stability: Stability maintained under slow motions where dynamic effects (inertia) are negligible. "ensuring the system maintains quasi-static stability even when the manipulators are fully extended or handling payloads."
  • scrub radius: The offset affecting steering effort due to friction when rotating a wheel in place. "This size is a deliberate design choice that minimizes the wheel's scrub radius, significantly reducing the static steering torque requirements compared to larger casters"
  • SE(3): The mathematical group of 3D rigid body poses (3D rotations and translations). "Onboard the robot, we use visual--inertial SLAM~\citep{qin2018vins} to estimate the robot pose as a time-varying rigid transform $T_{\text{WB}}(t)\in \text{SE}(3)$."
  • shortest-turn optimization: A swerve-control technique that flips wheel direction to minimize steering rotation. "we implement a ``shortest-turn'' optimization in the low-level controller."
  • SLAM (Simultaneous Localization and Mapping): Building a map of an environment while estimating the robot’s pose within it. "Onboard the robot, we use visual--inertial SLAM~\citep{qin2018vins} to estimate the robot pose"
  • stereo depth camera: A camera with two lenses that estimates depth from disparity between image pairs. "Our perception stack is based on the ZED 2i stereo depth camera."
  • stroke length: The total linear extension range of a telescopic actuator. "A telescopic vertical lift with a stroke length of $63.5$ cm extends YOR's reach from floor level to overhead"
  • support polygon: The convex area on the ground enclosed by contact points that determines static stability. "maximizing the area of the support polygon to improve stability."
  • task-space control: Controlling the robot in Cartesian coordinates of the end-effector rather than joint angles. "We provide both joint control and task-space control for the arms."
  • teleoperation: Remote operation of a robot by a human using input devices. "we design an intuitive whole-body teleoperation system that uses Meta Quest 3/3S controllers to control all components of YOR."
  • twist: A vector combining linear and angular velocities describing rigid body motion. "The kinematics mapping from the chassis velocity twist $\mathbf{v}_b = [v_x, v_y, \omega]^T$"
  • voxel inflation: Expanding occupied cells in a voxel map to account for robot size and safety margins. "We derive the global map after sufficient voxel inflation to account for robot size and additional voxel inflation to add higher cost near obstacles."
  • voxel map: A 3D grid of volumetric pixels used to represent occupancy in space. "RGB and stereo depth observations are converted to point clouds and integrated into a voxel map in the world frame"
  • waypoint sequence: An ordered list of target positions along a planned path for the controller to follow. "The resulting waypoint sequence is sent to the base controller"
  • whole-body control: Coordinated control of base, arms, and other joints to achieve complex tasks. "We validate YOR's capabilities through integration tests demonstrating whole-body control, bimanual manipulation, and autonomous navigation."
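The coupling matrix, twist, and shortest-turn entries above can be tied together in a short sketch of swerve inverse kinematics. The half-width and half-length values are the ones quoted in the glossary; the module ordering, the axis convention (x forward along the length, y left along the width), and every function name are assumptions for illustration rather than the paper's derivation.

```python
import math

W, L = 0.152, 0.106  # half-width and half-length in metres (quoted values)
# Assumed module positions (x, y) relative to the chassis centre.
MODULES = [(L, W), (L, -W), (-L, W), (-L, -W)]


def coupling_matrix(modules=MODULES):
    """8x3 matrix C: per module, rows [1, 0, -y] and [0, 1, x]."""
    C = []
    for x, y in modules:
        C.append([1.0, 0.0, -y])
        C.append([0.0, 1.0, x])
    return C


def wheel_states(vx, vy, omega, modules=MODULES):
    """(speed, steering angle) per module for the chassis twist [vx, vy, omega]."""
    C = coupling_matrix(modules)
    v = [row[0] * vx + row[1] * vy + row[2] * omega for row in C]
    return [
        (math.hypot(v[2 * i], v[2 * i + 1]), math.atan2(v[2 * i + 1], v[2 * i]))
        for i in range(len(modules))
    ]


def shortest_turn(speed, target_angle, current_angle):
    """Reverse the drive direction when that keeps the steering change within 90 degrees."""
    delta = math.atan2(
        math.sin(target_angle - current_angle),
        math.cos(target_angle - current_angle),
    )
    if abs(delta) > math.pi / 2:
        return -speed, current_angle + delta - math.copysign(math.pi, delta)
    return speed, current_angle + delta
```

With four modules, each contributing an x and a y velocity component, the stacked matrix has the 8×3 shape the paper states. The shortest-turn step exploits the fact that a wheel steered 180 degrees away and driven in reverse produces the same ground velocity.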

Practical Applications

Immediate Applications

Below is a concise set of practical applications that can be deployed now, derived directly from the paper’s hardware, software, and workflow contributions.

  • Robotics R&D (Academia) — low-cost, open-source bimanual mobile manipulator for whole-body research
    • Potential tools/products/workflows: YOR build kit (CAD, BOM), swerve-base control stack, compliant arm controllers (Ruckig + Mink), teleoperation with Meta Quest controllers, SLAM + navigation stack (ZED 2i, A*, Pure Pursuit), remote inference via commlink/ZMQ, imitation learning pipeline (e.g., VQ-BeT)
    • Assumptions/Dependencies: Availability of off-the-shelf parts (PiPER arms, REV MAXSwerve modules, telescopic lift); operator training; safe, structured indoor spaces; sufficient on-board and remote compute; stable Wi‑Fi/Ethernet; stereo-depth and VINS performance in the given environment
  • Education (Universities, Makerspaces, Vocational Programs) — hands-on mobile manipulation curriculum
    • Potential tools/products/workflows: Course modules on swerve kinematics, whole-body control, compliant manipulation, SLAM; student projects replicating dishwasher loading, watering plants, object pickup; capstone competitions; modular upgrades/swaps of subsystems
    • Assumptions/Dependencies: Instructor expertise; budget for <$10k BOM per unit; adherence to safety protocols; access to basic fabrication tools; consistent indoor testing areas
  • Service Robotics Prototyping (Startups/SMBs) — pilot deployments for household/office tasks
    • Use cases: Teleoperated or semi-autonomous object pickup, recycling/waste sorting (as demonstrated), light tidying, small-item deliveries within an office, loading/closing appliances with bimanual control and vertical reach
    • Potential tools/products/workflows: Teleop workflows with VR controllers; imitation learning pipelines to bootstrap policies from demonstrations; dynamic obstacle avoidance; remote monitoring dashboards
    • Assumptions/Dependencies: Tasks suited to light payloads and compliant grippers; limited speed (software cap ~0.25 m/s); environment amenable to small-footprint robots; safety and liability coverage
  • Assistive Telepresence (Home Care, Accessibility) — remote manipulation for daily living support
    • Use cases: Picking items from floor and shelves, loading dishwashers, watering plants, basic organization for people with limited mobility, remote caregiver assistance
    • Potential tools/products/workflows: Intuitive bimanual teleoperation; lift control via controller buttons; wrist-mounted smartphone cameras for better end-effector observation; caregiver scheduling and session logging
    • Assumptions/Dependencies: Reliable network; caregiver/operator training; clear line-of-sight in cluttered homes; compliance and safety practices; non-medical use (no clinical certification in current form)
  • Indoor Mapping & Navigation (Facilities, Labs) — agile 2D cost maps and dynamic obstacle avoidance
    • Use cases: Quick mapping of small facilities/labs; route validation with dynamic obstacle avoidance; baseline SLAM for indoor robotics workflows
    • Potential tools/products/workflows: ZED 2i stereo depth with VIO, loop closure integration; voxel-to-2D projection; weighted A* with Pure Pursuit tracking; state estimation quality gating
    • Assumptions/Dependencies: Textured/feature-rich environments; acceptable odometry drift bounds (paper reports ~50 mm map accuracy, ~16 mm end-effector compensation error in demo); adequate lighting; periodic recalibration
  • Data Collection for Vision-Language-Action (VLA) and Imitation Learning (Research, Industry) — standardized bimanual, whole-body dataset creation
    • Use cases: Curate multi-modal datasets (wrist/head RGB, end-effector poses, lift/base state) for training policies like VQ-BeT, ACT-style architectures
    • Potential tools/products/workflows: Time-synchronized recording (30 Hz), sensor fusion via ZMQ; dataset schemas that generalize across tasks; filters for odometry quality
    • Assumptions/Dependencies: Storage and labeling infrastructure; consistent sensor mounts (smartphone wrists, ZED head); robust odometry (occlusion can cause drift and failed trajectories)
  • Benchmarks & Competitions (Community, Academia) — replicable tasks for whole-body loco-manipulation
    • Use cases: “Recycling task” benchmark (pick, carry, navigate, place); tally-mark repeatability tests; whole-body coordination trials
    • Potential tools/products/workflows: Public task protocols; success-rate metrics (paper shows 9/10 overall in proof-of-concept); shared leaderboards; hardware reference builds
    • Assumptions/Dependencies: Agreement on standardized environments and metrics; maintenance and spare parts; operator safety procedures
  • Office Logistics (SMBs) — light-duty item transport and recycling workflows
    • Use cases: Collecting recyclables and carrying small boxes, transporting office supplies, inter-desk deliveries
    • Potential tools/products/workflows: Predefined routes with cost maps; dynamic replanning around people; bimanual grasp for bulky but light items
    • Assumptions/Dependencies: Payload limits of PiPER arms; human-aware speed limits; narrow doorways and elevators; battery runtime and charging schedules
  • Open-Source Tooling & Kits (Robotics Ecosystem) — productization of the YOR stack
    • Use cases: Commercialization of build kits; bundled teleop and navigation software; community forks and add-ons (e.g., different grippers, alternative cameras)
    • Potential tools/products/workflows: Maintained repositories for CAD/BOM/code; packaged installers; module marketplaces for shoulder plates, grippers, sensor pods
    • Assumptions/Dependencies: Licensing clarity; supplier sourcing; documentation and community support; QA and versioning
  • Public Engagement Pilots (Libraries, Schools, Municipal Programs) — robotics literacy and policy-informing demos
    • Use cases: Demonstration days; hands-on robotics for STEM; feedback collection to inform indoor robot safety guidelines
    • Potential tools/products/workflows: Structured demos of teleop and autonomous navigation; incident logging; operator certification basics
    • Assumptions/Dependencies: Institutional risk management; staff availability; clear safety perimeters and signage
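
The mapping and navigation items above mention voxel-to-2D projection and weighted A* planning. The following is a minimal sketch of how those two steps could fit together, not the paper's implementation. The function names `project_voxels` and `weighted_astar`, the 4-connected grid, the unit step cost, the Manhattan heuristic, and the weight of 1.5 are all illustrative assumptions.

```python
import heapq
import math


def project_voxels(occupied_voxels, width, height, z_min, z_max):
    """Flatten occupied (x, y, z) voxel indices into a 2D occupancy grid.

    Only voxels whose z index lies in [z_min, z_max] (an assumed band
    covering the robot's body height) mark a cell as blocked.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y, z in occupied_voxels:
        if z_min <= z <= z_max and 0 <= x < width and 0 <= y < height:
            grid[y][x] = 1
    return grid


def weighted_astar(grid, start, goal, weight=1.5):
    """Weighted A* on a 4-connected grid. Cells are (x, y); 1 means blocked.

    Inflating the heuristic by `weight` > 1 trades path optimality for
    fewer node expansions. Returns a list of cells or None if unreachable.
    """
    height, width = len(grid), len(grid[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * heuristic(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            if grid[ny][nx] or nxt in closed:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(
                    open_heap, (new_g + weight * heuristic(nxt), new_g, nxt)
                )
    return None


if __name__ == "__main__":
    # A wall at x = 2 for y = 0..3, plus one high voxel that is ignored.
    voxels = [(2, y, 1) for y in range(4)] + [(2, 4, 10)]
    cost_map = project_voxels(voxels, width=5, height=5, z_min=0, z_max=3)
    print(weighted_astar(cost_map, start=(0, 0), goal=(4, 0)))
```

A path tracker such as Pure Pursuit would then follow the returned waypoints. In a real stack the grid would carry graded costs (for example, inflated around obstacles) rather than the binary occupancy assumed here.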

Long-Term Applications

These applications require further research, scaling, engineering hardening, or regulatory pathways before broad deployment.

  • Autonomous Home Assistant (Consumer Robotics) — generalist mobile manipulation in unstructured homes
    • Potential tools/products/workflows: Integration with state-of-the-art semantic navigation, VLA models, whole-body controllers; autonomous task scheduling; docking/charging; failure recovery
    • Assumptions/Dependencies: Robust perception under occlusion; semantic memory/dynamic mapping (e.g., accuracy substantially better than the ~50 mm reported, sustained under varied conditions); 7-DoF arms to avoid singularities; diverse datasets across households; long-term reliability; safety certification
  • Assistive Healthcare & Eldercare (Healthcare) — ADL support and monitored autonomy
    • Use cases: Object fetching, tidying, simple meal preparation assistance, routine reminders via physical actions
    • Potential tools/products/workflows: Voice/gesture interfaces; clinical-grade safety sensors; standardized assistive task libraries
    • Assumptions/Dependencies: Medical/assistive device certification; caregiver oversight; hygiene and infection control; robust fail-safes; liability and insurance frameworks
  • Hospitality & Retail Services (Commercial Service Robotics) — clearing tables, restocking shelves, room service
    • Potential tools/products/workflows: Specialized end-effectors (e.g., suction, multi-finger hands), higher-payload arms, task programming interfaces; fleet management
    • Assumptions/Dependencies: Hardware upgrades for payload/reach; reliability in high-traffic spaces; integration with POS/ERP systems; staff training; compliance with local regulations
  • Micro-Fulfillment & In-Store Logistics (Retail/Logistics) — picking and packing in constrained environments
    • Potential tools/products/workflows: Advanced grasping pipelines; inventory-aware navigation; collaborative workflows with staff and AMRs
    • Assumptions/Dependencies: Better grippers and perception; higher velocity limits with safe operation; semantic mapping; ROI analysis versus fixed automation
  • Teleoperation Marketplaces (Platform Economy) — remote operators perform tasks for distributed customers
    • Potential tools/products/workflows: Operator training/certification portals; scheduling, billing, and SLAs; shared dashboards with latency and safety monitoring
    • Assumptions/Dependencies: Reliable low-latency networks; standardized teleop UI/ergonomics; insurance coverage; labor regulation compliance; data privacy/security
  • Open Standards & Safety Policy (Policy, Standards Bodies) — reference platform for indoor robot governance
    • Potential tools/products/workflows: Safety checklists (compliance, speed caps, force limits), telemetry standards, data governance for teleop recordings, ethical guidelines for in-home robots
    • Assumptions/Dependencies: Multi-stakeholder consensus; pilot programs to validate frameworks; certification pathways; harmonized regional regulations
  • Large-Scale Academic Consortia (Research Infrastructure) — fleet-based data collection and benchmarking
    • Potential tools/products/workflows: Shared task libraries and evaluation suites; cross-institution datasets; reproducibility protocols; open leaderboards
    • Assumptions/Dependencies: Funding and coordination; unified APIs and schemas; data governance; hardware interoperability; maintenance pipelines
  • Smart-Home Integration (Consumer IoT) — task orchestration via voice assistants and home automation
    • Potential tools/products/workflows: APIs for semantic tasks (“fetch mug,” “load dishwasher”), room-specific maps, device interoperability (lighting, appliances)
    • Assumptions/Dependencies: IoT standards; cybersecurity; user experience acceptance; robust failover if perception fails; docking/charging management
  • Energy-Efficient Mobile Manipulation (Energy/Engineering) — optimizing compliance, speed caps, and duty cycles
    • Potential tools/products/workflows: Low-power controllers; motion planning to minimize energy; smart charging; fleet energy dashboards
    • Assumptions/Dependencies: Battery advancements; scheduling integration; accurate energy models across varied tasks; facility power constraints
  • Commercial Product Line Evolution (Robotics Manufacturing) — “YOR Pro” and specialized variants
    • Potential tools/products/workflows: Upgraded 7-DoF arms, improved grippers, redundant safety sensors, hardened components, service contracts, spares ecosystem
    • Assumptions/Dependencies: Market demand; manufacturing and QA; supply chain resilience; long-term support; pricing models that sustain quality and service
