Low-Level Learning-Based Control
- Low-Level Learning-Based Control is a data-driven approach that directly maps sensor inputs to actuator commands using techniques such as reinforcement learning.
- It employs diverse architectures including end-to-end deep RL, model-based learning, and hybrid hierarchical frameworks to improve robustness, safety, and data efficiency.
- Applications span aerial robotics, soft manipulation, and power grid management, addressing challenges like sensor noise, sim-to-real gaps, and computational limits.
Low-level learning-based control refers to the direct generation of actuator commands—such as motor voltages, joint torques, or PWM signals—from sensor data using learning-based methods, typically machine learning or reinforcement learning. This paradigm contrasts with traditional model-based low-level control: rather than relying on hand-derived models, it leverages data-driven approaches to capture system dynamics, adapt to uncertainties, and, in some cases, bypass manual system identification or parameter tuning. Recent work demonstrates that learning-based controllers can achieve robust, adaptive, and in certain cases formally certifiable low-level control in domains ranging from quadrotor flight and in-hand manipulation to soft robotics and distributed power grid management.
1. Architectures and Learning Paradigms
Low-level learning-based controllers span a range of system architectures, from end-to-end deep reinforcement learning (DRL) policies to hierarchical frameworks that combine learning and model-based modules.
Pure end-to-end approaches leverage neural networks trained via RL, mapping high-dimensional observation vectors (sensor data, proprioceptive feedback) to actuator commands. For example, feed-forward architectures with two hidden layers of 64–256 neurons and tanh or Swish activations are commonly employed, their outputs directly controlling quadrotor motors, tilting rotors, or manipulator joints (Molchanov et al., 2019, Dooraki et al., 2023, Barros et al., 2020, Lee et al., 2021, Yu et al., 27 Feb 2025). Advanced forms utilize ensemble networks for uncertainty quantification or dual value heads to separate extrinsic and intrinsic (curiosity) reward learning (Dooraki et al., 2023).
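As a concrete illustration, below is a minimal sketch of such a feed-forward policy in PyTorch; the layer widths follow the 64–256 range quoted above, but the observation layout, dimensions, and class name are illustrative rather than taken from any cited work:

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """Feed-forward policy: observation vector -> normalized actuator commands."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),   # two hidden layers of 64-256 units;
            nn.Linear(hidden, hidden), nn.Tanh(),    # nn.SiLU() is the Swish alternative
            nn.Linear(hidden, act_dim), nn.Tanh(),   # outputs in [-1, 1], rescaled to motor range
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Example: an assumed 18-D quadrotor state (position, velocity, attitude,
# angular rates) mapped to 4 motor commands.
policy = LowLevelPolicy(obs_dim=18, act_dim=4)
action = policy(torch.randn(1, 18))  # would be rescaled to PWM/thrust units
```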
Model-based reinforcement learning (MBRL) architectures—such as those in (Lambert et al., 2019)—train neural forward models to predict state evolution, which are then deployed within a model predictive control (MPC) framework for real-time action selection. Hybrid hierarchical frameworks combine these: a high-level RL module selects targets, regions, or grasp sequences, while a low-level model-based controller ensures dynamically feasible, safe execution—seen in applications including in-hand manipulation (Zarrin et al., 2022), multi-agent systems (Studt et al., 19 Sep 2025), and robust manipulation (Shahna et al., 4 Feb 2024).
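A minimal sketch of the MBRL-plus-MPC loop just described, using a random-shooting planner over a learned one-step dynamics model; the `model` and `cost_fn` callables are placeholders, and the horizon and sample counts are illustrative rather than those of the cited work:

```python
import numpy as np

def random_shooting_mpc(model, cost_fn, state, act_dim, horizon=10, n_samples=1000):
    """Sample random action sequences, roll them out through the learned
    dynamics model, and return the first action of the cheapest sequence."""
    # (n_samples, horizon, act_dim) candidate action sequences in [-1, 1]
    actions = np.random.uniform(-1.0, 1.0, size=(n_samples, horizon, act_dim))
    states = np.repeat(state[None, :], n_samples, axis=0)
    costs = np.zeros(n_samples)
    for t in range(horizon):
        states = model(states, actions[:, t, :])   # batched one-step prediction
        costs += cost_fn(states, actions[:, t, :])
    return actions[np.argmin(costs), 0, :]         # receding horizon: apply first action only
```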
Recent advances include incorporating symmetries via equivariant networks (Yu et al., 27 Feb 2025), leveraging curiosity-driven exploration (Dooraki et al., 2023), and modularization through multi-agent RL (Yu et al., 2023) or distributed GNN-based architectures (Fabrizio et al., 2 Sep 2025).
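Equivariant networks bake such symmetries into the layers themselves; a lightweight alternative is to augment training data with symmetric copies of each transition. The sketch below illustrates the idea under simplifying, hypothetical assumptions: a 90-degree yaw rotation of a rotationally symmetric quadrotor, with an assumed observation layout of [px, py, pz, vx, vy, vz, ...] and four cyclically permutable motor commands:

```python
import numpy as np

def rotate_z_90(obs: np.ndarray, act: np.ndarray):
    """Map one transition to its symmetric counterpart under a 90-degree
    yaw rotation (hypothetical state layout and motor ordering)."""
    R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation in the xy-plane
    obs_r = obs.copy()
    obs_r[0:2] = R @ obs[0:2]                # rotate planar position
    obs_r[3:5] = R @ obs[3:5]                # rotate planar velocity
    act_r = np.roll(act, 1)                  # cyclically permute the four motors
    return obs_r, act_r
```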
2. Learning Dynamics: Modeling, Data, and Training
A central challenge in low-level learning-based control is the acquisition and representation of system dynamics. Methods fall into several categories:
- Model-based learning: Neural networks, typically trained on time-series sensor-actuator data, model the forward dynamics s_{t+1} = f_θ(s_t, a_t), predicting state transitions (sometimes probabilistically, outputting means and covariances, e.g., (Lambert et al., 2019)). Data collection entails logging during both random exploration and on-policy control, often requiring only minutes of real-world data. Data efficiency is enhanced by input histories, normalization, and robust loss functions—such as those penalizing both mean prediction error and predictive uncertainty (log-determinant loss); a training-loss sketch follows this list.
- Direct policy learning: RL algorithms (PPO, SAC, DDPG, TD3) are used to optimize policies mapping observations to actuator commands (Molchanov et al., 2019, Barros et al., 2020, Yu et al., 27 Feb 2025). Training in realistic simulators with domain randomization (randomizing mass, dimensions, motor lag) improves generalization and reduces the sim-to-real transfer gap.
- Bayesian and probabilistic models: Gaussian Process regression is used for online estimation of disturbances and model mismatch in feedback linearization controllers, providing not only adaptive compensation but also quantification of uncertainty for probabilistic stability guarantees (Yang et al., 2022).
- Safety certification and constraint satisfaction: State-action control barrier functions (CBFs) are constructed via a learning-based approach rooted in Hamilton–Jacobi reachability, with quadratic parameterizations enabling convex optimization for real-time filtering of unsafe actions and explicit treatment of learning errors through constraint tightening (He et al., 2023).
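For the safety-certification item, a minimal sketch of the convex filtering step for control-affine dynamics dx/dt = f(x) + g(x)u, written with cvxpy; the barrier value h(x), its gradient, and the tightening margin eps would come from the learned certificate and are placeholders here:

```python
import cvxpy as cp
import numpy as np

def cbf_safety_filter(u_nom, f_x, g_x, h_x, grad_h, alpha=1.0, eps=0.0):
    """Project a nominal action onto the set satisfying the CBF condition
    grad_h . (f(x) + g(x) u) >= -alpha * h(x) + eps, where eps tightens
    the constraint to account for learning error."""
    u = cp.Variable(u_nom.shape[0])
    objective = cp.Minimize(cp.sum_squares(u - u_nom))  # minimal intervention
    constraints = [grad_h @ f_x + (grad_h @ g_x) @ u >= -alpha * h_x + eps]
    cp.Problem(objective, constraints).solve()
    return u.value
```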
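For the model-based learning item at the top of this list, a minimal PyTorch sketch of a probabilistic dynamics model trained with a Gaussian negative log-likelihood, whose log-variance term plays the role of the log-determinant penalty for a diagonal covariance; the architecture and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    """Predicts mean and diagonal log-variance of the next-state delta."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        self.mean = nn.Linear(hidden, obs_dim)
        self.log_var = nn.Linear(hidden, obs_dim)

    def loss(self, obs, act, next_obs):
        h = self.body(torch.cat([obs, act], dim=-1))
        mu, log_var = self.mean(h), self.log_var(h)
        target = next_obs - obs  # predict state deltas rather than raw states
        # Gaussian NLL: precision-weighted squared error plus log-variance
        return (((target - mu) ** 2) * torch.exp(-log_var) + log_var).sum(-1).mean()
```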
3. Hierarchical and Modular Control Structures
Many recent systems decouple long-horizon or tactical decision-making from short-horizon, high-frequency dynamic control:
- Hierarchical RL-MPC: High-level RL modules select discrete regions of interest (ROIs), abstract targets, or grasp sequences (e.g., for in-hand manipulation or multi-agent pursuit-evasion), while low-level MPC drives the system toward high-level goals subject to actuation, collision, and safety constraints (Studt et al., 19 Sep 2025, Zarrin et al., 2022, Kungurtsev et al., 6 Jul 2025).
- Decoupling and modularization: Quadrotor dynamics often admit a decomposition into translational (roll-pitch) and yaw subsystems; modular multi-agent RL and equivariant frameworks exploit these structures, assigning separate agents or equivariant network submodules to each part, improving data efficiency and convergence (Yu et al., 2023, Yu et al., 27 Feb 2025).
- Sim-to-real and adaptation layers: Controllers are typically first trained in simulation with exposure to randomized dynamics, noise, and lag, before being transferred and fine-tuned on hardware using symmetry-based data augmentation or domain adaptation (Lee et al., 2021, Molchanov et al., 2019).
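A minimal sketch of the randomized-dynamics training mentioned in the last item: a hypothetical episode-reset hook that perturbs nominal physical parameters so the policy never overfits a single simulated airframe. The parameter names, distributions, and ranges are illustrative:

```python
import numpy as np

def randomize_dynamics(nominal):
    """Draw perturbed physical parameters at each episode reset."""
    rng = np.random.default_rng()
    return {
        "mass":      nominal["mass"] * rng.uniform(0.8, 1.2),       # +/-20% mass
        "arm_len":   nominal["arm_len"] * rng.uniform(0.9, 1.1),    # geometry
        "motor_lag": nominal["motor_lag"] * rng.uniform(0.5, 2.0),  # motor time constant
        "obs_noise": rng.uniform(0.0, 0.02),                        # sensor noise std
    }

# Illustrative nominal values for a small quadrotor (SI units)
params = randomize_dynamics(
    {"mass": 0.033, "arm_len": 0.046, "motor_lag": 0.015, "obs_noise": 0.0})
```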
4. Performance, Robustness, and Safety Metrics
Benchmarking of low-level learning-based controllers includes:
- Task performance: Metrics such as hover duration and stability (quadrotors), RMSE in path following (aerial robots under wind disturbances), and trajectory tracking error (robotic arms, soft robots) are standard (Lambert et al., 2019, Yang et al., 2022, Shahna et al., 4 Feb 2024, Liang et al., 2023). In automotive contexts, lap time, average speed, and lateral acceleration are evaluated (Li et al., 5 Dec 2024).
- Robustness to disturbances and model changes: Qualitative and quantitative assessments involve recovery from disturbances (throws, collisions), adaptation to hardware failures, and real-time compensation for variable dynamics or external disturbances (e.g., wind, power grid fluctuations) (Lambert et al., 2019, Barros et al., 2020, Yang et al., 2022, Fabrizio et al., 2 Sep 2025).
- Sample efficiency and convergence: Performance curves report learning speed, number of required trajectories or samples, and convergence variance. Equivariant and modular architectures routinely yield significantly steeper learning curves than non-equivariant monolithic RL (Yu et al., 27 Feb 2025).
- Safety and stability: Barrier function satisfaction rates, CPU time for safety filtering, and avoidance of constraint violations are directly measured in systems with formal safety guarantees (He et al., 2023). MPC-based systems inherently incorporate dynamic feasibility and collision avoidance.
The table below summarizes salient metrics observed in representative studies:
| System/Domain | Core Metric | Notable Result |
|---|---|---|
| Quadrotor MBRL (Lambert et al., 2019) | Max. hover time | Up to 6 s of hover after 3 min of training data |
| Tilting multirotor (Lee et al., 2021) | Stable perching time | Robust perching across tilt angles after fine-tuning |
| Path-following UAV (Yang et al., 2022) | RMSE, wind resilience | RMSE ≈ 0.0162 m (LB-FBLC ≥ baseline) |
| Power grid (Fabrizio et al., 2 Sep 2025) | Survival time, compute | Real-time operation; survival time exceeding expert agents |
5. Challenges and Addressed Solutions
Low-level learning-based control faces unique challenges:
- Sensor and actuator noise: Addressed by data preprocessing, augmenting network inputs with histories (see the history-stacking sketch after this list), and training under more realistic (noisy, lagged) motor models (Lambert et al., 2019, Molchanov et al., 2019).
- Sim-to-real discrepancy: Mitigated with domain randomization, curriculum learning, data augmentation with system symmetries, and limited online finetuning (Lee et al., 2021, Molchanov et al., 2019).
- Computational constraints: The demands of high-frequency operation are met through parallelization (e.g., random-shooting MPC on GPU (Lambert et al., 2019)), low-dimensional model reductions for soft robots (Liang et al., 2023), and convexified safety filters (He et al., 2023).
- Sample efficiency: Leveraging symmetry (equivariant RL (Yu et al., 27 Feb 2025)), modularization (Yu et al., 2023), or curiosity-driven intrinsic rewards (Dooraki et al., 2023).
- Guaranteeing stability and safety: Integration of model-based control, formal safety certificates (state-action CBFs), and safely regularized RL policy updates (He et al., 2023, Yu et al., 27 Feb 2025, Studt et al., 19 Sep 2025).
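As an illustration of the input-history augmentation from the first item above, a minimal sketch of an observation wrapper that stacks the last k observation-action pairs, giving the policy implicit access to lag and noise statistics; a gym-style environment interface and the class name are assumptions:

```python
import numpy as np
from collections import deque

class HistoryWrapper:
    """Augments the current observation with the last k observation-action pairs."""
    def __init__(self, env, k: int = 3):
        self.env, self.k = env, k
        self.buf = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        zero_act = np.zeros(self.env.action_space.shape)
        self.buf.clear()
        for _ in range(self.k):                  # pad history with the initial observation
            self.buf.append(np.concatenate([obs, zero_act]))
        return np.concatenate(self.buf)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buf.append(np.concatenate([obs, action]))
        return np.concatenate(self.buf), reward, done, info
```

Note that the policy's input dimension grows to k times the combined observation-action size, which the network definitions above would need to reflect.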
6. Applications and Broader Impact
Low-level learning-based control has demonstrated efficacy and potential for broad applicability:
- Aerial robotics: Robust low-level motor control for nano- and centimeter-scale quadrotors, including aggressive flight, wall perching, and recovery from disturbances (Lambert et al., 2019, Molchanov et al., 2019, Lee et al., 2021, Barros et al., 2020, Yu et al., 27 Feb 2025).
- Soft robotics: Bilevel optimization with reduced-order MPC enables real-time closed-loop control in previously intractable, highly underactuated soft robotic systems (Liang et al., 2023).
- Robotic manipulation: Hybrid frameworks for in-hand manipulation combine learning-based high-level grasp selection with model-based low-level contact and force control, demonstrating robust online adaptation to task variations (Zarrin et al., 2022).
- Multi-agent systems: Hierarchical RL-MPC frameworks facilitate safe, coordinated multi-agent behavior in constraint-rich environments with strong generalization (Studt et al., 19 Sep 2025).
- Industrial robotics and power grids: In-context learning with LLMs orchestrates industrial robot actions in closed feedback loops (Zhu et al., 7 Feb 2024), and graph-based distributed RL realizes scalable, real-time grid management (Fabrizio et al., 2 Sep 2025).
- Autonomous vehicles: Low-dimensional residual models embedded in learning-based MPC yield improved real-world driving accuracy and efficiency (Li et al., 5 Dec 2024).
A plausible implication is that as these frameworks mature, learning-based low-level control will become increasingly central to resilient, adaptive robotics, especially in domains where physical modeling is arduous or in rapidly changing operational contexts.
7. Future Directions
Current research points toward several promising directions:
- Greater data efficiency: Expansion of symmetry-aware (equivariant) architectures and structured exploration (curiosity, intrinsic reward) to reduce sample complexity, especially for expensive hardware-in-the-loop learning (Yu et al., 27 Feb 2025, Dooraki et al., 2023).
- Enhanced safety integration: Direct incorporation of formal safety constraints within RL/MPC cost and reward functions, tighter coupling of barrier function learning and control (He et al., 2023, Studt et al., 19 Sep 2025).
- Autonomous model updating: Increasingly adaptive frameworks where residual models and physical priors are continually updated from online data, enabling persistent adaptation to wear, drift, or environmental changes (Yang et al., 2022, Li et al., 5 Dec 2024).
- Broader domain transfer: Advanced domain randomization and transfer learning strategies, further closing the sim-to-real gap for complex dynamics and cross-platform applications (Lee et al., 2021, Molchanov et al., 2019).
- Generalization across tasks and morphologies: Leveraging modular architectures and in-context learning with LLMs for zero-shot task generalization and open-world robotic operation (Zhu et al., 7 Feb 2024).
Ongoing research will continue to elevate the reliability, safety, and sample efficiency of low-level learning-based control, expanding its deployment in practical autonomous systems across both terrestrial and aerial platforms.