UniCycle Benchmark: Wheelbot Evaluation
- UniCycle Benchmark is a suite of standardized evaluation tasks that assess upright recovery, push-rejection, repeated actuation, and energy-efficient balance on the Wheelbot.
- It employs quantitative metrics such as recovery time, accuracy, robustness, and energy usage to deliver reproducible scores for controller performance in non-holonomic robotic systems.
- Grounded in real hardware measurements, the benchmark facilitates comparative studies of control, sensor fusion, and estimation methods for under-actuated robotic platforms.
The UniCycle Benchmark is a suite of standardized evaluation tasks and metrics introduced in the context of the Wheelbot, a symmetric reaction wheel unicycle system capable of self-erection and robust balancing. Designed to quantify control and estimation performance for non-holonomic, under-actuated robotic platforms, the UniCycle Benchmark provides reproducible, rigorous criteria for assessing algorithms in upright recovery, disturbance rejection, repeated actuation, and energy-aware balancing. The benchmark is grounded in real hardware, with all parameters and specifications directly measured or computed on the physical Wheelbot platform (Geist et al., 2022).
1. Standardized Tasks and Definitions
The UniCycle Benchmark consists of four core tasks, each targeting a critical aspect of combined nonlinear control and estimation for the Wheelbot system architecture:
- Task A — Upright Recovery: Starting from a specified initial tilt, the system must return to within a small tolerance band around upright inside a fixed time limit; the initial tilt, tolerance, and time limit are fixed benchmark parameters.
- Task B — Push-Rejection: From the balanced upright configuration, the system is subjected to applied torque disturbances about the roll or pitch axis and must return to the upright tolerance band within a fixed settling-time limit; the disturbance magnitude, tolerance, and time limit are likewise benchmark parameters.
- Task C — Repeated Self-Erection: The system must perform a specified number of consecutive stand-up-then-balance cycles without failure.
- Task D — Energy-Efficient Balance: The goal is to maintain upright balance for a fixed duration while minimizing the control input integrated over the run.
These tasks are parameterized by directly observed system performance on the Wheelbot, ensuring that the benchmark domains are physically realizable.
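As a concrete illustration, the Task A success criterion can be checked directly from a logged tilt trajectory. The sketch below assumes a uniformly sampled tilt trace; the tolerance band (`tilt_tol`) and time limit (`t_limit`) used in the demo are hypothetical placeholders, not the benchmark's official parameter values.

```python
import math

def upright_recovery_success(times, tilts, tilt_tol, t_limit):
    """Check the Task A (Upright Recovery) criterion on a sampled tilt trace.

    times    -- sample times in seconds, starting at release
    tilts    -- tilt angles in radians at those times
    tilt_tol -- tolerance band around upright, rad (hypothetical value)
    t_limit  -- maximum allowed recovery time, s (hypothetical value)

    Success: the tilt enters the tolerance band at some time t <= t_limit
    and stays inside the band for the remainder of the trace.
    """
    for i, (t, th) in enumerate(zip(times, tilts)):
        if t > t_limit:
            return False
        if abs(th) <= tilt_tol:
            # require the tilt to remain inside the band afterwards
            if all(abs(x) <= tilt_tol for x in tilts[i:]):
                return True
    return False

# Hypothetical trace: exponential decay from a 0.3 rad initial tilt
ts = [0.01 * k for k in range(300)]
trace = [0.3 * math.exp(-2.0 * t) for t in ts]
print(upright_recovery_success(ts, trace, tilt_tol=0.02, t_limit=2.0))
```

A trial that enters the band too late, or leaves it again before the trace ends, is counted as a failure; this matches the "recover and hold" reading of the task.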
2. Performance Metrics and Scoring
Performance on each task is quantified using a normalized score defined as a weighted sum of speed, accuracy, robustness, and efficiency components:

$S = w_{\text{speed}} S_{\text{speed}} + w_{\text{acc}} S_{\text{acc}} + w_{\text{rob}} S_{\text{rob}} + w_{\text{eff}} S_{\text{eff}}$, with the weights summing to one.

For Task A (Upright Recovery), the components are:
- speed: recovery time, normalized against the task time limit
- accuracy: steady-state tilt error relative to the tolerance band
- robustness: success rate over repeated trials
- efficiency: control effort expended during recovery
For a balanced evaluation, weights are assigned to the speed, accuracy, robustness, and efficiency terms so that no single component dominates the aggregate score.
A similar aggregation formula and weighting apply for Tasks B–D, taking into account the relevant metric definitions (e.g., settling time and excursion for push-rejection, cycle count for repeated self-erection, energy usage for balance).
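The aggregation above can be sketched as a small scoring helper. The equal weights and the linear speed-to-score mapping below are assumptions for illustration; the benchmark's suggested weight values are not reproduced here.

```python
def task_score(speed, accuracy, robustness, efficiency,
               weights=(0.25, 0.25, 0.25, 0.25)):
    """Aggregate normalized component scores (each in [0, 1]) into a task score.

    The equal default weights are placeholders, not the benchmark's
    suggested values.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    components = (speed, accuracy, robustness, efficiency)
    assert all(0.0 <= c <= 1.0 for c in components)
    return sum(w * c for w, c in zip(weights, components))

def speed_component(recovery_time, t_limit):
    """Map recovery time to [0, 1]: 1 for instantaneous recovery, 0 at the limit."""
    return max(0.0, 1.0 - recovery_time / t_limit)

# Example: 0.8 s recovery against a 2 s limit, with hypothetical
# accuracy/robustness/efficiency components
print(task_score(speed_component(0.8, 2.0), 0.9, 1.0, 0.5))
```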
3. Hardware and Dynamic Foundations
The benchmark is intrinsically linked to the physical and dynamic parameters of the Wheelbot platform:
| Quantity | Value | Notes |
|---|---|---|
| Total mass | $1.4$ kg | |
| Reaction wheel inertia | kg·m² | Measured, copper-ring stack |
| Wheel outer radius | $106$ mm | |
| Max continuous torque | $1.3$ N·m | At rated current |
| Encoder resolution | 12-bit | Optical |
| IMU noise density (gyro/accel) | $100$ µg/√Hz (accel) | Four ICM-20948 units |
| Stand-up pivot offset | mm | Chosen to minimize inversion torque |
| Energy use (one hour of balancing) | Wh | Idle, no locomotion |
All system dynamics, constraints, and tasks are defined with respect to these measured characteristics, ensuring direct transferability of observed results and facilitating reproducibility.
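One way the measured parameters constrain the tasks can be sketched with the classical static feasibility bound for a reaction-wheel balancer: recovery is only possible while the gravity torque $m g l \sin\theta$ about the contact point does not exceed the available wheel torque. Only the mass ($1.4$ kg) and torque limit ($1.3$ N·m) below come from the table; the centre-of-mass height `com_height` is a hypothetical placeholder.

```python
import math

def max_recoverable_tilt(tau_max, mass, com_height, g=9.81):
    """Static bound on the tilt angle from which upright recovery is feasible.

    Recovery requires tau_max >= mass * g * com_height * sin(theta),
    so theta_max = asin(tau_max / (mass * g * com_height)).
    Returns the angle in radians, or pi/2 if the torque always suffices.
    """
    ratio = tau_max / (mass * g * com_height)
    return math.pi / 2 if ratio >= 1.0 else math.asin(ratio)

# Table values: tau_max = 1.3 N*m, mass = 1.4 kg.
# com_height = 0.15 m is a hypothetical placeholder, not a Wheelbot datum.
theta = max_recoverable_tilt(1.3, 1.4, 0.15)
print(math.degrees(theta))
```

This static bound ignores wheel momentum limits and dynamic effects, so it overestimates what a real controller can recover from; it is meant only to show how the table's torque and mass figures bound the task parameterization.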
4. Data Collection and Experimental Protocols
Protocol details are specified for each benchmarked behavior:
- Self-Erection Trials: Body placed on any face or edge; success rate and time-to-erect are recorded over repeated trials.
- Roll-Up Trials: Flat starting pose; $20/20$ successes on acrylic or rubber surfaces; time to completion is recorded.
- Disturbance Rejection: Impulsive torques up to $1$ N·m applied at mid-height; maximum tilt excursion and settling time back to the upright tolerance band are recorded; peak motor currents reach $13$ A.
- Balance Performance: Steady-state roll/pitch error, overshoot in response to step inputs, and trial-to-trial repeatability dispersion are recorded.
Energy consumption, repeatability, and robustness are thus empirically interrogated under conditions matching those formalized in the benchmark.
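The peak-tilt and settling-time quantities used in the disturbance-rejection protocol can be extracted from a logged trace as follows; the damped-oscillation demo trace and the band width are hypothetical placeholders.

```python
import math

def peak_and_settling(times, tilts, band):
    """Peak tilt magnitude and settling time into |tilt| <= band.

    Settling time is the first instant after which the tilt stays inside
    the band for the rest of the trace (None if it never settles).
    """
    peak = max(abs(th) for th in tilts)
    settle = None
    for i in range(len(tilts)):
        if all(abs(th) <= band for th in tilts[i:]):
            settle = times[i]
            break
    return peak, settle

# Hypothetical post-impulse response: damped oscillation about upright
ts = [0.01 * k for k in range(500)]
trace = [0.2 * math.exp(-1.5 * t) * math.cos(8.0 * t) for t in ts]
peak, settle = peak_and_settling(ts, trace, band=0.02)
```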
5. Significance and Applications
The UniCycle Benchmark formalizes a measurement-driven, task-oriented procedure for evaluating control, estimation, and data-driven algorithms in under-actuated, non-holonomic robotic systems exemplified by the Wheelbot. By encompassing upright recovery, robust disturbance rejection, repeated operation reliability, and energy sensitivity, it provides a high-fidelity testbed for:
- Comparative studies of nonlinear and linear (e.g., LQR) controllers under real-world disturbances and constraints.
- Assessment of sensor fusion and state estimation methods, given high sensor noise and drift typical for IMUs and encoders.
- Testing of new estimation frameworks (e.g., complementary, EKF-based) with ground-truth experimental success/failure.
- Integrated metrics that bridge time-response, precision, robustness, and energy efficiency, reflecting the requirements of practical mobile robotic platforms.
The design of the tasks and metrics is such that any improvements or new algorithms can be benchmarked in direct, quantitative, and hardware-constrained fashion, with the standardization advancing comparability and reproducibility within the robotics and control research communities (Geist et al., 2022).
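As a minimal instance of the estimation methods listed above, a first-order complementary filter fuses a drifting gyro rate with a noisy accelerometer tilt. The blending constant `alpha` and the simulated bias/noise levels are hypothetical tuning and test values, not the Wheelbot's.

```python
import random

def complementary_filter(gyro_rates, accel_tilts, dt, alpha=0.98):
    """Fuse gyro rate (rad/s) and accelerometer tilt (rad) into a tilt estimate.

    alpha close to 1 trusts the integrated gyro at short time scales and
    lets the accelerometer correct slow drift; 0.98 is a hypothetical
    tuning value.
    """
    est = accel_tilts[0]           # initialize from the accelerometer
    estimates = [est]
    for rate, acc in zip(gyro_rates[1:], accel_tilts[1:]):
        est = alpha * (est + rate * dt) + (1.0 - alpha) * acc
        estimates.append(est)
    return estimates

# Hypothetical data: stationary robot, gyro with constant bias, noisy accel
random.seed(0)
dt, n = 0.01, 2000
gyro = [0.05] * n                                    # pure bias, rad/s
accel = [random.gauss(0.0, 0.02) for _ in range(n)]  # noisy zero tilt
est = complementary_filter(gyro, accel, dt)
```

Pure gyro integration would drift to about 1 rad over this 20 s trace; the accelerometer correction keeps the fused estimate bounded near zero, which is exactly the trade-off such filters are evaluated on under the benchmark.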
6. Constraints, Limitations, and Interpretation
The benchmark is bounded by the physical limitations of the Wheelbot hardware: actuator torque saturation, battery voltage, encoder resolution, sensor noise, and frame geometry. Motor torques and control currents for both the reaction wheel and the rolling wheel are limited to their hardware specifications, while sensor fusion must cope with the practical noise characteristics of the onboard IMUs. All protocol results are conditioned on the exclusion of exceptionally high-frequency dynamics and unmodeled noise; adjustments for these are available as tuning parameters for individual implementations.
A plausible implication is that, while the UniCycle Benchmark is tightly coupled to a particular system instantiation, its formulation and task design principles can inform benchmarks for similar classes of under-actuated, mobile robots. However, direct score comparison is valid only when measurement and hardware constraints match those documented for the Wheelbot (Geist et al., 2022).