Quasi-Symmetric Reaction Wheel Unicycle

Updated 23 January 2026

The quasi-symmetric reaction wheel unicycle is a robotic system featuring a single driving wheel and an orthogonally mounted reaction wheel that enables decoupled pitch and roll control.
Its nonlinear dynamics are modeled with rigid-body formulations, and its state is estimated with techniques such as the Extended Kalman Filter; together these support robust control under nonholonomic constraints.
Experimental performance on platforms such as the Mini Wheelbot demonstrates fast disturbance rejection, reliable self-erection, and agile tracking control.

A quasi-symmetric balancing reaction wheel unicycle is a class of robotic system characterized by a single ground-contact driving wheel combined with an orthogonally mounted, actively controlled reaction wheel, enabling upright balancing and agile maneuvering in both the sagittal (pitch) and lateral (roll) planes. The term "quasi-symmetric" denotes that the mass and inertia properties about the robot’s sagittal plane are nearly—though not perfectly—mirrored, resulting in approximately symmetric dynamics under left and right roll motions despite practical hardware imbalances. The most prominent exemplars of this class include the Mini Wheelbot and related open-source platforms, which are widely used as testbeds for learning-based control research due to their challenging, underactuated, and nonholonomic dynamics (Hose et al., 16 Jan 2026, Hose et al., 7 Feb 2025, Geist et al., 2022).

1. Mechanical Structure and Quasi-Symmetry

The typical mechanical configuration consists of:

Primary driving wheel: Situated below the robot’s body and actuated about the body-frame y-axis, the driving wheel governs pitch stabilization and forward/reverse locomotion.
Reaction wheel: Mounted above the center of mass and actuated about the orthogonal x-axis, the reaction wheel supplies the necessary roll control torque for side-to-side balance.
Chassis: A rigid, compact body, frequently composed of aluminum (Mini Wheelbot) or high-strength 3D-printed polymers (Wheelbot), houses computation, power, and sensing subsystems.

"Quasi-symmetry" is realized through hardware and mass placement such that, about the sagittal plane, left/right inertia is nearly identical even if component asymmetries (e.g., power supply, ports) remain. This yields near-equivalence of roll inertias ( $J_{bx}$ ) for positive and negative roll angles, supporting decoupled control design under small-angle approximations (Hose et al., 16 Jan 2026, Hose et al., 7 Feb 2025, Geist et al., 2022).

Representative technical parameters (Mini Wheelbot; Hose et al., 16 Jan 2026):

Parameter	Value	Description
$M_b$ (Body mass)	1.25 kg	Aluminum body
$M_{rw}$ (Reaction wheel mass)	0.30 kg	Roll actuator
$h_b$ (COM height)	0.15 m	Above wheel axle
$J_{bx}$ (Roll inertia)	0.018 kg·m²	About COM, roll axis
$J_{rw}$ (RW inertia)	0.005 kg·m²	About spin axis
$J_{dw}$ (Driving wheel inertia)	0.010 kg·m²	About axle
$r_{dw}$ (Drive wheel radius)	0.065 m
$r_{rw}$ (RW radius)	0.045 m

The typical arrangement places the driving-wheel axle directly below the body’s center of mass and the reaction-wheel axle above it. Electronics and power components are distributed to minimize yaw–roll and roll–pitch cross-coupling in the inertia tensor.
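As a rough illustration of how the tabulated parameters set the time scale of the balancing problem, the sketch below treats the body in roll as a rigid inverted pendulum pivoting about the ground contact point. This is a simplifying assumption made here, not a model from the cited papers: it ignores the reaction-wheel and driving-wheel masses and uses a lever arm of $r_{dw} + h_b$. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class MiniWheelbotParams:
    """Values copied from the parameter table above."""
    m_body: float = 1.25   # M_b, kg
    m_rw: float = 0.30     # M_rw, kg
    h_com: float = 0.15    # h_b, m (COM height above the wheel axle)
    j_roll: float = 0.018  # J_bx, kg*m^2
    j_rw: float = 0.005    # J_rw, kg*m^2
    j_dw: float = 0.010    # J_dw, kg*m^2
    r_dw: float = 0.065    # r_dw, m
    r_rw: float = 0.045    # r_rw, m


def roll_divergence_rate(p: MiniWheelbotParams) -> float:
    """Unstable pole (rad/s) of a simple inverted pendulum in roll.

    Assumption: the body pivots about the ground contact point with
    lever arm r_dw + h_b; wheel masses are ignored.
    """
    lever = p.r_dw + p.h_com
    inertia_about_contact = p.j_roll + p.m_body * lever ** 2  # parallel-axis theorem
    return math.sqrt(p.m_body * G * lever / inertia_about_contact)


params = MiniWheelbotParams()
rate = roll_divergence_rate(params)
print(f"roll divergence rate ~ {rate:.2f} rad/s, time constant ~ {1.0 / rate:.3f} s")
```

Under these assumptions the uncontrolled roll angle grows on a time scale of a fraction of a second, which gives a sense of why high-rate sensing and control matter for this platform.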

2. Nonlinear Dynamics and System Modeling

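Governing equations derive from rigid-body Lagrangian or Newton–Euler formulations with five generalized coordinates: yaw ( $\psi$ ), roll ( $\phi$ ), pitch ( $\theta$ ), reaction wheel spin ( $q_{rw}$ ), and driving wheel spin ( $q_{dw}$ ), collectively $q = [\psi, \phi, \theta, q_{rw}, q_{dw}]^\top$. The control vector is $u = [\tau_{rw}, \tau_{dw}]^\top$ for reaction wheel and drive wheel torques.

Kinetic energy: $T = \tfrac{1}{2}\,\dot{q}^\top M(q)\,\dot{q}$

Potential energy (due to gravity, acting at the COM): $V = m\,g\,z_{\mathrm{COM}}(q)$, with $m$ the total mass and $z_{\mathrm{COM}}$ the COM height above the ground.

The explicit equations of motion in input-affine form: $M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = B\,u$ where $M(q)$ is the coordinates-dependent inertia matrix, $C(q,\dot{q})$ contains Coriolis/centripetal effects, $G(q)$ embodies gravitational contributions, and $B$ maps motor torques to the equations. In the small-angle regime ( $|\phi| \ll 1$, $|\theta| \ll 1$ ) and neglecting yaw, the roll and pitch dynamics decouple into independent second-order systems: $\ddot{\phi} = a_\phi\,\phi + b_\phi\,\tau_{rw}$

$\ddot{\theta} = a_\theta\,\theta + b_\theta\,\tau_{dw}$, where the constant coefficients $a_\phi$, $b_\phi$, $a_\theta$, $b_\theta$ are set by the masses, inertias, and geometry.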

This separation is foundational for state-feedback control design and canonical in learning-based identification and policy learning (Hose et al., 16 Jan 2026, Hose et al., 7 Feb 2025, Geist et al., 2022).
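For orientation, the step from the energies to the equations of motion can be written out with the standard Lagrangian construction. The notation is chosen here for illustration: $q$ denotes the generalized coordinates, $u$ the motor torques, $T$ and $V$ the kinetic and potential energies, and $A(q)\,\dot{q} = 0$ any rolling constraints that are kept explicit. The multiplier term $A(q)^\top \lambda$ drops out when the constraints are already absorbed into the coordinate choice.

$$
L(q,\dot{q}) = T(q,\dot{q}) - V(q), \qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = B\,u + A(q)^\top \lambda
$$

$$
T = \tfrac{1}{2}\,\dot{q}^\top M(q)\,\dot{q}
\;\;\Longrightarrow\;\;
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = B\,u + A(q)^\top \lambda,
\qquad G(q) = \frac{\partial V}{\partial q}
$$

Linearizing about the upright equilibrium ( $\sin\phi \approx \phi$, $\cos\phi \approx 1$, products of velocities dropped) removes the Coriolis term and leaves constant-coefficient equations, which is the step that makes the roll and pitch channels separable.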

Nonholonomic constraints due to rolling contact without slipping are handled implicitly in the coordinate choice: the rolling point is kinematically “pinned,” and body orientation and wheel spin are sufficient to capture surface dynamics (Hose et al., 7 Feb 2025).

3. State Estimation and Sensing

High-precision state estimation for all degrees of freedom—especially roll and pitch—is achieved using multiple IMUs and wheel encoders, often at 1 kHz. Two principal methods are implemented:

Extended Kalman Filter (EKF): Utilizes the nonlinear equations of motion and combines gyroscope, accelerometer, and encoder measurements to estimate the full state vector $x$ (generalized coordinates and their velocities). The prediction step follows $\hat{x}_{k+1|k} = f(\hat{x}_{k|k}, u_k)$, where $f$ is the discretized dynamics and $u_k$ the applied torques, with measurement update via nonlinear observation models. Sensor-noise covariances are tuned empirically; daily calibration limits the residual orientation error after alignment (Hose et al., 16 Jan 2026, Geist et al., 2022).
Multi-IMU Least-Squares Tilt Estimation: An alternative that bypasses explicit state filtering, leveraging multiple IMUs and kinematic offsets to solve for tilt via a least-squares estimate on the stacked, corrected accelerations. This yields roll and pitch directly from available measurements and encoder-inferred pivot acceleration (Geist et al., 2022).

The sensor suite typically includes four triaxial IMUs (each yielding gyroscope and accelerometer data), optical encoders for both wheels, and in some experimental platforms, Vicon ground-truth at 100 Hz (Hose et al., 16 Jan 2026).
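The predict/update structure of an EKF can be illustrated on a deliberately reduced problem. The sketch below estimates a single roll angle, using the gyroscope rate as the process input and one lateral accelerometer axis with the nonlinear model $a_y = g \sin\phi$ as the measurement. It is not the platform's estimator: the one-state model, the quasi-static accelerometer assumption (no pivot acceleration), the noise values, and all names are assumptions made for illustration.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2


class ScalarTiltEKF:
    """One-state EKF for roll angle (illustrative, not the robot's filter)."""

    def __init__(self, roll0=0.0, var0=0.1, process_var=1e-5, meas_var=0.0025):
        self.roll = roll0               # state estimate, rad
        self.var = var0                 # estimate variance, rad^2
        self.process_var = process_var  # added per prediction step (assumed)
        self.meas_var = meas_var        # accelerometer variance, (m/s^2)^2

    def predict(self, gyro_rate, dt):
        # Process model: integrate the measured roll rate over one step.
        self.roll += gyro_rate * dt
        self.var += self.process_var

    def update(self, accel_y):
        # Nonlinear measurement model h(roll) = g*sin(roll), Jacobian g*cos(roll).
        jac = G * math.cos(self.roll)
        innovation = accel_y - G * math.sin(self.roll)
        innovation_var = jac * self.var * jac + self.meas_var
        gain = self.var * jac / innovation_var
        self.roll += gain * innovation
        self.var = (1.0 - gain * jac) * self.var
        return self.roll


def run_demo(true_roll=0.1, steps=500, dt=1e-3, accel_noise_std=0.05, seed=0):
    """Hold the body at a fixed roll angle and filter noisy accelerometer data."""
    rng = random.Random(seed)
    ekf = ScalarTiltEKF(meas_var=accel_noise_std ** 2)
    for _ in range(steps):
        ekf.predict(gyro_rate=0.0, dt=dt)  # body held still: true rate is zero
        accel_y = G * math.sin(true_roll) + rng.gauss(0.0, accel_noise_std)
        ekf.update(accel_y)
    return ekf.roll


estimate = run_demo()
print(f"estimated roll after 0.5 s: {estimate:.3f} rad")
```

A full-state filter follows the same two steps with vector states, the nonlinear equations of motion as the process model, and matrix-valued Jacobians and covariances.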

4. Control Strategies: Classical and Learning-Based Approaches

Control approaches demonstrated on this platform span classical and learning-based strategies:

Linear Quadratic Regulator (LQR): Designed on the decoupled small-angle linearization, with state weights emphasizing rapid correction of roll/pitch excursions. The control law $u = -Kx$ uses a gain $K$ computed via the discrete-time algebraic Riccati equation, with hard saturation applied for actuator limits (Hose et al., 16 Jan 2026, Geist et al., 2022).
Approximate Nonlinear Model Predictive Control (AMPC): A nonlinear optimal control problem is solved offline (e.g., direct collocation, IPOPT solver), then distilled into a feed-forward neural network for deployment at 1 kHz. The controller tracks position and orientation references while satisfying state/input constraints and capturing full-body nonlinearities and precession effects (Hose et al., 16 Jan 2026, Hose et al., 7 Feb 2025).
Bayesian Optimization (BO) for State-Feedback Tuning: Gains for a static state-feedback controller are optimized directly on hardware, using GPyTorch surrogate models and expected improvement acquisition over an 8-D parameter space. Optimization largely converges within 60 episodes and improves task reward relative to manual or random search (Hose et al., 7 Feb 2025).
Reinforcement Learning (RL): Proximal Policy Optimization (PPO) is used to jointly minimize crash penalties, control effort, and trajectory errors; policies are first trained in simulation and then fine-tuned on the real robot, achieving high reliability (crash rate <5% over 100 laps) and sub-12 s median lap times in racing benchmarks (Hose et al., 16 Jan 2026).
Hybrid Finite-State Machines: Stand-up and recovery maneuvers employ mode logic—open-loop torque bursts for rapid self-erection or flipping, followed by seamless transition to feedback control when in a stabilizable domain (Hose et al., 7 Feb 2025, Hose et al., 16 Jan 2026).
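The LQR item above can be sketched for a single decoupled channel. The example assumes a generic linearized roll model $\ddot{\phi} = a\,\phi + b\,\tau$, discretized with forward Euler, and obtains the gain by iterating the discrete-time Riccati recursion to a fixed point. The coefficients, weights, sample time, and torque limit are hypothetical placeholders rather than identified Mini Wheelbot values, and the helper names are introduced only for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dlqr_single_input(A, B, Q, R, max_iters=5000, tol=1e-10):
    """Return the gain row K (u = -K x) from the Riccati fixed-point iteration."""
    At, Bt = transpose(A), transpose(B)
    n = len(A)
    P = [row[:] for row in Q]
    for _ in range(max_iters):
        BtP = matmul(Bt, P)                        # 1 x n
        s = R + matmul(BtP, B)[0][0]               # scalar R + B'PB
        K = [v / s for v in matmul(BtP, A)[0]]     # (R + B'PB)^-1 B'PA
        AtP = matmul(At, P)
        AtPA, AtPB = matmul(AtP, A), matmul(AtP, B)
        P_next = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * K[j]
                   for j in range(n)] for i in range(n)]
        change = max(abs(P_next[i][j] - P[i][j])
                     for i in range(n) for j in range(n))
        P = P_next
        if change < tol:
            break
    BtP = matmul(Bt, P)
    s = R + matmul(BtP, B)[0][0]
    return [v / s for v in matmul(BtP, A)[0]]


# Hypothetical roll channel: phi_ddot = A_ROLL * phi + B_ROLL * tau
DT, A_ROLL, B_ROLL = 0.01, 35.0, 13.0
A = [[1.0, DT], [A_ROLL * DT, 1.0]]   # forward-Euler discretization
B = [[0.0], [B_ROLL * DT]]
Q = [[100.0, 0.0], [0.0, 1.0]]        # weight roll angle more than roll rate
R = 1.0
K = dlqr_single_input(A, B, Q, R)


def saturate(u, limit):
    """Hard actuator saturation."""
    return max(-limit, min(limit, u))


def roll_torque(phi, phi_rate, limit=0.5):
    """State feedback u = -K x followed by saturation (limit is assumed)."""
    return saturate(-(K[0] * phi + K[1] * phi_rate), limit)


print("LQR gain:", K)
print("torque at phi = 0.02 rad:", roll_torque(0.02, 0.0))
```

In practice a library routine for the discrete algebraic Riccati equation would normally replace the hand-written iteration; the explicit loop is used here only to keep the example dependency-free.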

5. Experimental Performance and Data Resources

Extensive empirical evaluation covers balancing, disturbance rejection, self-erection, racing, and surface adaptation. Performance benchmarks for the Mini Wheelbot are as follows (Hose et al., 16 Jan 2026):

Disturbance rejection: Lateral impulse (2 N·s) yields ≤11° maximum roll deviation, 0.25 ± 0.05 s rise time, <0.5° steady-state error.
Self-erection: Stand-up from random initial orientations reliably within 0.42 ± 0.05 s, with peak reaction-wheel speeds ~280 rad/s.
Tracking control: Pseudo-random binary sequence (PRBS) excitation experiments yield $\phi_{\mathrm{RMS}} \approx 0.02$ rad and $\theta_{\mathrm{RMS}} \approx 0.03$ rad; AMPC driving produces <5% track deviation at 0.95 m/s average speed.
Learning benchmarks: Dynamics models trained on the provided dataset achieve prediction MSE = 0.005 rad² in roll ( $\phi$ ) on 1 s rollouts; a Transformer time-series classifier distinguishes surface types with 85–92% accuracy.

The open-source “wheelbot-dataset” (Hose et al., 16 Jan 2026) comprises 383 trajectories, 13 million state transitions, and synchronized onboard–Vicon measurements at up to 1 kHz.

6. Modeling Friction, Contact, and Nonholonomy

Rolling contact without slip is enforced through the choice of generalized coordinates, eliminating explicit slip states. Frictional effects, especially in yaw, are captured via smooth hyperbolic tangent friction models of the form $\tau_f = -\mu\,\tanh(\dot{\psi}/\varepsilon)$, with friction magnitude $\mu$ and smoothing scale $\varepsilon$. Contact switches (e.g., stand-up flips) are handled by inverting the contact frame and resetting the state estimate, avoiding discontinuities in the continuous ODE description (Hose et al., 7 Feb 2025).
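A smooth hyperbolic tangent friction model of the kind mentioned above can be sketched in a few lines. The function name and both parameter values are hypothetical; the point is only that the torque approaches a constant magnitude at high yaw rate while remaining continuous and differentiable through zero, unlike a sign-based Coulomb model.

```python
import math


def smooth_yaw_friction(yaw_rate, friction_torque=0.02, smoothing_rate=0.1):
    """Friction torque opposing yaw motion (N*m).

    friction_torque: asymptotic magnitude (assumed value).
    smoothing_rate: yaw rate (rad/s) over which the sign change is
    smoothed (assumed value).
    """
    return -friction_torque * math.tanh(yaw_rate / smoothing_rate)


for rate in (-1.0, -0.05, 0.0, 0.05, 1.0):
    print(f"yaw rate {rate:+.2f} rad/s -> torque {smooth_yaw_friction(rate):+.4f} N*m")
```

Because the model is smooth, it can be embedded directly in the ODE right-hand side used by gradient-based optimizers and filters without special handling at zero velocity.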

Discrete events including flipping, landing, and crash recovery, which pose challenges for hybrid-system modeling, are integrated in a modular fashion: mode transitions conditionally reinitialize estimators and controllers while logging experimental data for further analysis or learning (Hose et al., 7 Feb 2025).
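The mode logic described above can be sketched as a small supervisor that switches between an open-loop stand-up burst and feedback balancing and counts re-initializations on entry to the balancing mode. The mode names, tilt thresholds, and the counter standing in for estimator and controller resets are all hypothetical; the cited platforms may structure their state machines differently.

```python
from enum import Enum, auto


class Mode(Enum):
    FALLEN = auto()
    STAND_UP = auto()
    BALANCE = auto()


STABILIZABLE_TILT = 0.35  # rad, assumed bound of the stabilizable domain
FALLEN_TILT = 1.0         # rad, assumed threshold for declaring a crash


def next_mode(mode, roll, pitch):
    """Pure transition function on the largest tilt angle."""
    tilt = max(abs(roll), abs(pitch))
    if mode is Mode.FALLEN:
        return Mode.STAND_UP                  # always retry self-erection
    if mode is Mode.STAND_UP and tilt < STABILIZABLE_TILT:
        return Mode.BALANCE                   # hand over to feedback control
    if mode is Mode.BALANCE and tilt > FALLEN_TILT:
        return Mode.FALLEN
    return mode


class Supervisor:
    def __init__(self):
        self.mode = Mode.FALLEN
        self.reset_count = 0  # stand-in for estimator/controller re-initialization

    def step(self, roll, pitch, feedback_torque, burst_torque):
        """Advance the mode and return the torque command for this step."""
        new_mode = next_mode(self.mode, roll, pitch)
        if new_mode is Mode.BALANCE and self.mode is not Mode.BALANCE:
            self.reset_count += 1
        self.mode = new_mode
        if self.mode is Mode.STAND_UP:
            return burst_torque      # open-loop torque burst
        if self.mode is Mode.BALANCE:
            return feedback_torque   # closed-loop command
        return 0.0                   # fallen: motors idle


supervisor = Supervisor()
for roll in (1.5, 0.8, 0.2, 0.05, 1.2):
    torque = supervisor.step(roll, 0.0, feedback_torque=0.1, burst_torque=0.4)
    print(f"roll={roll:.2f} rad -> {supervisor.mode.name}, torque={torque}")
```

Keeping the transition function pure makes the mode logic easy to test and to replay against logged trajectories.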

7. Significance and Research Implications

Quasi-symmetric balancing reaction wheel unicycles provide an accessible yet challenging platform for research in nonlinear control, system identification, state estimation, and reinforcement learning. Their open-source implementations, high-fidelity datasets, and diverse control benchmarks support algorithm evaluation, direct comparison of learning-based and classical methods, and reproducibility (Hose et al., 16 Jan 2026, Hose et al., 7 Feb 2025, Geist et al., 2022).

The quasi-symmetric architecture delivers a robust compromise between manufacturing simplicity and dynamic separability, readily supporting scalable experiments across different hardware instances and environments. This paradigm underpins emerging methodologies for fast online adaptation, safe reinforcement learning, and data-driven dynamics modeling in underactuated, contact-rich domains.

References:

(Hose et al., 16 Jan 2026): The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning
(Hose et al., 7 Feb 2025): The Mini Wheelbot: A Testbed for Learning-based Balancing, Flips, and Articulated Driving
(Geist et al., 2022): The Wheelbot: A Jumping Reaction Wheel Unicycle