
Mobile Manipulators (MoMas) Overview

Updated 6 August 2025
  • Mobile manipulators (MoMas) are integrated robotic systems combining a mobile base and articulated arms to execute tasks over extended workspaces.
  • Planning strategies, both hierarchical and holistic, enable efficient motion control and real-time responsiveness while adhering to kinematic constraints.
  • Learning-based methods, including imitation, inverse dynamics, and causal policy optimization, enhance robustness and adaptability in dynamic, contact-rich tasks.

Mobile manipulators (“MoMas”) are integrated robotic systems that combine a mobile platform with one or more articulated manipulator arms, leveraging the synergistic benefits of mobility and dexterity. These systems are designed to perform complex tasks across extended workspaces—ranging from industrial assembly, service robotics, and construction to high-speed dynamic operations—where neither a fixed arm nor a mobile base alone is sufficient. The research landscape spans motion and task planning, control architectures, learning-based policy optimization, system co-design, and benchmarking for real-world deployment.

1. Fundamental Architectures and Kinematic Models

Mobile manipulators typically integrate a nonholonomic or holonomic ground vehicle (e.g., differential drive, omnidirectional, or legged base) with a serial or parallel robotic arm, yielding high-dimensional hybrid systems. Kinematic modeling approaches may treat the aggregate system as a unified chain—by embedding both the base SE(2) or SE(3) configuration and the manipulator joint states into a composite state vector—or decouple the motion planning between base and arm. Recent work has advanced unified representations: for example, by expressing both the base and manipulator within the Lie group SE(3) and mapping via exponentials and logarithms to vector spaces for optimization (Smith et al., 20 Oct 2024). This enables direct computation of forward and inverse kinematics constraints for the full system, supporting smooth whole-body planning that respects joint position, velocity, and acceleration limits, as well as platform-specific nonholonomic constraints (Karunakaran et al., 2023).
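A minimal sketch of this exp/log mapping, assuming a planar base pose embedded in SE(3) and generic matrix exponential/logarithm routines; the helper functions, dimensions, and composite state layout are illustrative assumptions rather than the formulation of the cited work.

```python
# Hedged sketch: map a whole-body configuration between SE(3) and a vector
# space via matrix log/exp, so an optimizer can update it as a flat vector.
import numpy as np
from scipy.linalg import expm, logm

def hat(xi):
    """Map a 6-vector twist [v, w] to its 4x4 se(3) matrix form."""
    v, w = xi[:3], xi[3:]
    W = np.array([[0, -w[2], w[1]],
                  [w[2], 0, -w[0]],
                  [-w[1], w[0], 0]])
    X = np.zeros((4, 4))
    X[:3, :3] = W
    X[:3, 3] = v
    return X

def vee(X):
    """Inverse of hat: recover the 6-vector twist from a 4x4 se(3) matrix."""
    return np.array([X[0, 3], X[1, 3], X[2, 3], X[2, 1], X[0, 2], X[1, 0]])

def base_pose_SE3(x, y, yaw):
    """Embed a planar base pose (SE(2)) into SE(3) as a homogeneous transform."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:2, 3] = [x, y]
    return T

# Composite state: base twist coordinates plus manipulator joint angles.
T_base = base_pose_SE3(1.0, 0.5, np.pi / 4)
xi_base = vee(np.real(logm(T_base)))     # SE(3) -> R^6
q_arm = np.zeros(6)                      # example 6-DoF arm
x = np.concatenate([xi_base, q_arm])     # flat vector an optimizer can update

# After an optimization step, map the base portion back onto the manifold.
T_base_updated = expm(hat(x[:6]))
```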

Motion planning frameworks extend these representations to accommodate dynamic manipulator models compounded with mobile base kinematics, allowing for integrated operational-space formulations and force-interactive tasks—addressing the need for dynamic coupling-aware models as in extended-UDE control (Gao et al., 30 Mar 2024). Model abstraction is also crucial for cooperative manipulation (e.g., dual-arm or multi-robot systems), which introduces closed-chain constraints and redundancy that must be efficiently managed via hierarchical decomposition (Zhang et al., 2022).
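To make the closed-chain coupling concrete, consider two MoMas rigidly grasping a common object; the notation below is an illustrative assumption rather than the formulation of the cited work. Both robots' kinematic chains must agree on the object pose, which closes a loop between them:

```latex
% Closed-chain constraint for two mobile manipulators rigidly grasping one object.
% T_{base,i}, T_{arm,i} are homogeneous transforms; G_i is robot i's fixed grasp offset.
T_{\mathrm{obj}}
  = T_{\mathrm{base},i}(q_{b,i})\, T_{\mathrm{arm},i}(q_{a,i})\, G_i,
  \qquad i = 1, 2
\quad\Longrightarrow\quad
T_{\mathrm{base},1} T_{\mathrm{arm},1} G_1
  = T_{\mathrm{base},2} T_{\mathrm{arm},2} G_2 .
```

Redundancy resolution must then distribute motion between each base and arm while keeping this loop constraint satisfied.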

2. Hierarchical and Holistic Planning and Control Strategies

Task and motion planning for MoMas is dominated by either hierarchical decoupling or holistic architectures.

Hierarchical Decoupling: Classic pipelines decompose planning into:

  • High-level task allocation or workspace goals,
  • Base placement determination (potentially via inverse reachability or object/scene-aware optimization (Shao et al., 29 Mar 2024)),
  • Manipulator motion for local manipulation.

For example, bi-level motion optimization for high-speed tasks employs high-level sequential quadratic programming (SQP) for whole-body target selection (respecting non-convex constraints such as goal pose, end-effector alignment, and collision avoidance), coupled with low-level quadratic programming (QP) for time-parameterized, double-integrator trajectory generation (Dong et al., 2020).
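As a hedged illustration of the low-level stage, the sketch below solves a small double-integrator trajectory QP toward a whole-body target using cvxpy; the dimensions, horizon, limits, and objective weights are assumptions and do not reproduce the cited formulation.

```python
# Hedged sketch: low-level QP driving a double-integrator joint model to a
# whole-body target q_star chosen by a higher-level (SQP) stage.
import cvxpy as cp
import numpy as np

n, N, dt = 9, 30, 0.05          # 9 DoF (3 base + 6 arm), 30 steps of 50 ms (assumed)
q_star = 0.3 * np.ones(n)       # target configuration from the high-level stage
a_max = 2.0                     # symmetric acceleration limit (assumed)

q = cp.Variable((N + 1, n))     # joint/base positions
v = cp.Variable((N + 1, n))     # velocities
a = cp.Variable((N, n))         # accelerations (decision inputs)

constraints = [q[0] == 0, v[0] == 0,        # start at rest
               q[N] == q_star, v[N] == 0,   # reach the target at rest
               cp.abs(a) <= a_max]
for k in range(N):              # discrete double-integrator dynamics
    constraints += [q[k + 1] == q[k] + dt * v[k] + 0.5 * dt**2 * a[k],
                    v[k + 1] == v[k] + dt * a[k]]

# Minimum-effort objective keeps the time-parameterized trajectory smooth.
prob = cp.Problem(cp.Minimize(cp.sum_squares(a)), constraints)
prob.solve()
print("low-level QP status:", prob.status)
```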

Holistic (Unified) Control: Recent advances emphasize treating the base and manipulator as a single, high-DoF system. In one approach, the resolved-rate control law is posed as a quadratic program that simultaneously solves for arm and base velocity commands: the end-effector spatial-velocity constraint is relaxed with slack variables, a soft cost encourages high manipulability, and joint limits and platform maneuverability are enforced as constraints (Haviland et al., 2021). These approaches yield smoother, faster, and more responsive whole-body motions—especially for dynamic, sensor-based tasks such as visual grasping, closed-loop pick-and-place, and on-the-go human-robot interaction (He et al., 2022).
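A schematic form of such a holistic quadratic program at one control step is given below; the precise costs and constraints differ between formulations, so this is a simplified, assumed statement rather than the cited controller.

```latex
% Schematic holistic QP for one control step (assumed, simplified form).
% x = [\dot q_{base}; \dot q_{arm}; \delta] stacks base/arm velocities and slack.
\begin{aligned}
\min_{x}\quad & \tfrac{1}{2}\, x^{\top} Q\, x + c^{\top} x
  && \text{(velocity and slack penalties; manipulability gradient enters } c\text{)}\\
\text{s.t.}\quad & J(q)\,\dot q + \delta = \nu_{\mathrm{des}}
  && \text{(end-effector spatial velocity, relaxed by slack } \delta\text{)}\\
& \dot q^{-} \le \dot q \le \dot q^{+}, \qquad A_{\mathrm{lim}}\,\dot q \le b_{\mathrm{lim}}
  && \text{(velocity bounds and joint-limit avoidance)}
\end{aligned}
```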

The difference is summarized below:

| Strategy | Modularity | Responsiveness | Smoothness/Continuity |
|---|---|---|---|
| Hierarchical | Decoupled/pluggable modules | Requires replanning | Possible discontinuities |
| Unified/Holistic | Integrated, task-driven | Reactive, real-time | Superior motion continuity |

3. Learning, Imitation, and Causal Policy Optimization

Learning-based methods have emerged as a strong paradigm for end-to-end mobile manipulation, both for policy optimization and for robust perception and localization. The learning landscape includes:

  • Inverse Dynamics Learning: Neural networks trained as add-on modules compensate for modeling errors by mapping joint/reference histories to corrective offsets for high-speed trajectory tracking, enabling aggressive, accurate manipulation in the presence of unmodeled disturbances (Dong et al., 2020); a minimal sketch of this add-on structure follows this list.
  • Causal Policy Gradients: Mobile manipulator tasks are naturally multi-objective (e.g., navigation, obstacle avoidance, precise grasping). Causal MoMa introduces a data-driven mechanism for discovering and exploiting the causal structure between action channels (base, arm, head DoFs) and composite rewards, enabling variance-reduced, unbiased policy gradient estimation. The approach learns, for each action, which part of the reward it can influence, automatically partitioning the action space and improving convergence and sim-to-real generalization (Hu et al., 2023); a factored-gradient sketch also appears after this list.
  • Visual-Force Imitation Learning: Hybrid learning strategies use large, vision-based expert datasets and deep retrieval networks to match current observations to expert demonstrations, inheriting both pose corrections and wrench (force/torque) targets. This approach, coupled with admittance whole-body control, yields robust, low-force, and high-success contact-rich household manipulation (Yang et al., 2023).
  • Teleoperation-Driven Data Collection: Modular teleoperation platforms enable acquisition of whole-body demonstration datasets via VR, RGB-D pose estimation, or standard joysticks, greatly enhancing the capacity for behavioral cloning and imitation policy training (Dass et al., 12 Mar 2024, Honerkamp et al., 23 Sep 2024).
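A minimal PyTorch sketch of the add-on inverse-dynamics idea from the first bullet above: a small network maps a short history of reference and measured joint states to a corrective offset that is added to the nominal command. The architecture, window length, and signal names are assumptions, not the cited network.

```python
# Hedged sketch: residual inverse-dynamics compensation for trajectory tracking.
import torch
import torch.nn as nn

N_JOINTS, HISTORY = 9, 5                     # 9-DoF whole body, 5-step history (assumed)

class ResidualInverseDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        in_dim = 2 * N_JOINTS * HISTORY      # reference + measured joint histories
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_JOINTS),        # per-joint corrective offset
        )

    def forward(self, ref_hist, meas_hist):
        x = torch.cat([ref_hist.flatten(1), meas_hist.flatten(1)], dim=-1)
        return self.net(x)

model = ResidualInverseDynamics()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One supervised step on logged tracking data: regress the command correction
# that would have cancelled the observed tracking error (placeholder tensors).
ref_hist = torch.randn(32, HISTORY, N_JOINTS)
meas_hist = torch.randn(32, HISTORY, N_JOINTS)
target_offset = torch.randn(32, N_JOINTS)

loss = nn.functional.mse_loss(model(ref_hist, meas_hist), target_offset)
opt.zero_grad()
loss.backward()
opt.step()

# At run time the predicted offset augments the nominal controller command:
# u = u_nominal + model(ref_hist, meas_hist)
```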
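The factored gradient behind the causal policy-gradient bullet can be sketched as follows: each action channel's advantage aggregates only the reward terms it is learned (here, simply assumed) to influence, which removes variance from irrelevant terms without biasing the gradient. The mask, shapes, and critic outputs are illustrative placeholders.

```python
# Hedged sketch: per-action-channel advantages masked by a causal structure.
import torch

n_actions, n_rewards, batch = 3, 4, 32           # e.g., base, arm, head channels (assumed)
# causal_mask[i, j] = 1 if action channel i influences reward term j (assumed, not learned here).
causal_mask = torch.tensor([[1., 1., 0., 0.],
                            [0., 0., 1., 1.],
                            [0., 1., 1., 0.]])

per_term_advantages = torch.randn(batch, n_rewards)               # from per-term critics
log_probs = torch.randn(batch, n_actions, requires_grad=True)     # log pi(a_i | s) per channel

# Each channel's advantage sums only its causally relevant reward terms.
per_dim_advantage = per_term_advantages @ causal_mask.T           # (batch, n_actions)
policy_loss = -(log_probs * per_dim_advantage.detach()).sum(dim=1).mean()
policy_loss.backward()
```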

4. Task-Driven Design, Base Placement, and Benchmarking

Physical embodiment and task-driven configuration of mobile manipulators are critical for practical deployment:

  • Co-Design/Cross-Optimization: Simultaneously optimizing hardware parameters (such as arm mounting position, offsets, and pitch/yaw) and a multi-task control policy (e.g., via RL) yields kinematic arrangements better suited to real-world household tasks than traditional tabletop mounting. Bayesian Optimization with HyperBand (BOHB) is applied in an outer loop over expensive RL-trained inner-loop policy evaluations, with results indicating substantial improvements in generalization and motion range (Schneider et al., 21 Dec 2024).
  • Base Position Optimization: Frameworks that select task-relevant objects (through graph-based neural inference of object importance) and apply inverse-reachability and object-kinematics-aware constraints, coupled with potential-field and open-TSP optimization, have been shown to find collision-free, task-feasible base placements reliably and efficiently (Shao et al., 29 Mar 2024); a toy base-placement scoring sketch follows this list.
  • Benchmarking and Dataset Generation: MoMa-Kitchen provides a dataset with 127,000+ samples for the "last-mile" problem—determining affordance-grounded positions on the floor from which manipulation is possible in dense, cluttered kitchens. The methodology uses full-environment simulation, a semantically guided sampling pipeline, and Gaussian-interpolated affordance labeling per robot model, producing a robust basis for both navigation and manipulation policy learning (Zhang et al., 14 Mar 2025). Novel baseline models (e.g., NavAff) leverage visual alignment, semantic/geometry fusion, and robot-specific conditioning to enable strong transfer across hardware and scene variations.
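A toy sketch of the base-placement idea from the second bullet above: candidate base poses are scored by a stand-in reachability term and an obstacle-clearance penalty, and the best candidate is kept. The scoring functions, weights, and sampling are placeholders, not the cited pipeline.

```python
# Hedged toy sketch: choose a base position by scoring sampled candidates.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([2.0, 1.0])                   # object position on the floor plane
obstacles = np.array([[1.5, 1.2], [2.5, 0.4]])  # obstacle footprints (points here)

def reachability_score(base_xy, target_xy, preferred_dist=0.7, sigma=0.25):
    """Stand-in for an inverse-reachability lookup: prefer a nominal stand-off distance."""
    d = np.linalg.norm(target_xy - base_xy)
    return np.exp(-((d - preferred_dist) ** 2) / (2 * sigma ** 2))

def clearance_penalty(base_xy, obstacles_xy, margin=0.4):
    """Potential-field-like penalty that grows as the base approaches any obstacle."""
    d = np.linalg.norm(obstacles_xy - base_xy, axis=1)
    return float(np.sum(np.maximum(0.0, margin - d) ** 2))

# Sample candidate base positions around the target and keep the best-scoring one.
candidates = target + rng.uniform(-1.2, 1.2, size=(500, 2))
scores = [reachability_score(c, target) - 10.0 * clearance_penalty(c, obstacles)
          for c in candidates]
best = candidates[int(np.argmax(scores))]
print("selected base position:", best)
```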

5. State-of-the-Art Applications and Empirical Performance

Recent experimental and benchmarking results demonstrate the capabilities and limitations of state-of-the-art mobile manipulators:

  • High-Speed Dynamic Manipulation: Integrated bi-level optimization and neural inverse dynamics modules achieve an 85.33% success rate in aggressive ball catching with sub-15 mm tracking RMSE, exceeding prior work (Dong et al., 2020).
  • Cooperative Transport: Semi-coupled, hierarchical planners enable multiple MoMas to reliably transport rigid objects across obstacle-rich environments while maintaining kinematic feasibility and redundancy constraint metrics (Zhang et al., 2022).
  • Contact-Rich Manipulation: Visual-force imitation combined with admittance control reduces task forces and variance, yielding 73.3% mean success across household tasks and outperforming classical behavior cloning (Yang et al., 2023).
  • Inspection and Navigation: Compact, stable hardware modifications, precise low-footprint sensing, and stable multi-level control streamline inspection tasks in real-world, cluttered, low-clearance areas (Pearson et al., 2023).
  • Active Perceptive Manipulation: Receding-horizon planning with information gain and grasp reachability improves mobile grasping in cluttered scenarios, with experimentally validated success rates of 92–95% in simulation and around 80% in real-world transfer (Jauhri et al., 2023).

6. Human-in-the-Loop Autonomy and Interaction

Semi-autonomous and variable autonomy regimes are highly relevant in uncertain or dynamic environments:

  • Supervised Autonomy/Teleoperation: Semi-autonomous architectures combine automated planning (via perception, mapping, AI-driven planners) and human oversight for safety, leveraging GUIs and real-time visualization for operator approval and intervention. Preliminary user studies show rapid learning curves, robust performance, and adaptability to new users (Al-Hussaini et al., 2020).
  • Variable Autonomy and Cognitive Load: Reviews highlight the challenge of integrating whole-body autonomy, VR/AR interfaces, and intent recognition in high-risk, uncertainty-rich environments. Reducing cognitive burden and mitigating network latency (e.g., by switching to onboard autonomy if a delay threshold is exceeded; a minimal sketch follows this list) are emerging areas of emphasis, with potential solutions involving LLMs for intent inference and control mediation (Contreras et al., 20 Aug 2024).
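A minimal sketch of the latency-triggered switching idea from the bullet above: if the measured delay of operator commands exceeds a threshold, control falls back to an onboard autonomous behavior and returns to teleoperation when the link recovers. The threshold, timing source, and mode names are assumptions.

```python
# Hedged sketch: switch between teleoperation and onboard autonomy based on command delay.
import time

DELAY_THRESHOLD_S = 0.25          # assumed acceptable round-trip delay

class AutonomySwitch:
    def __init__(self):
        self.mode = "teleop"

    def on_operator_command(self, cmd_timestamp_s):
        delay = time.time() - cmd_timestamp_s
        if delay > DELAY_THRESHOLD_S:
            self.mode = "autonomous"      # degrade gracefully under high latency
        else:
            self.mode = "teleop"          # hand control back when the link recovers
        return self.mode

switch = AutonomySwitch()
print(switch.on_operator_command(time.time() - 0.40))   # -> "autonomous"
print(switch.on_operator_command(time.time() - 0.05))   # -> "teleop"
```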

7. Implications and Outlook

Research to date demonstrates that mobile manipulators, when equipped with advanced planning, unified control, and learning capabilities—augmented by informed physical design—can exceed traditional limitations in speed, precision, and adaptability. The convergence of model-based and data-driven approaches, benchmarked with large-scale, affordance-grounded datasets, provides a rigorous basis for further advances in embodied AI and robotics. Practical deployment now hinges on generalization across hardware, robust safety and compliance without expensive sensing, and scalable systems for demonstration collection, learning, and system co-design. Open-source releases, modular platforms, and standardized benchmarks catalyze rapid progress across the field.
