UniCycle Benchmark: Wheelbot Evaluation
- UniCycle Benchmark is a suite of standardized evaluation tasks that assess upright recovery, push-rejection, repeated actuation, and energy-efficient balance on the Wheelbot.
- It employs quantitative metrics such as recovery time, accuracy, robustness, and energy usage to deliver reproducible scores for controller performance in non-holonomic robotic systems.
- Grounded in real hardware measurements, the benchmark facilitates comparative studies of control, sensor fusion, and estimation methods for under-actuated robotic platforms.
The UniCycle Benchmark is a suite of standardized evaluation tasks and metrics introduced in the context of the Wheelbot, a symmetric reaction wheel unicycle system capable of self-erection and robust balancing. Designed to quantify control and estimation performance for non-holonomic, under-actuated robotic platforms, the UniCycle Benchmark provides reproducible, rigorous criteria for assessing algorithms in upright recovery, disturbance rejection, repeated actuation, and energy-aware balancing. The benchmark is grounded in real hardware, with all parameters and specifications directly measured or computed on the physical Wheelbot platform (Geist et al., 2022).
1. Standardized Tasks and Definitions
The UniCycle Benchmark consists of four core tasks, each targeting a critical aspect of combined nonlinear control and estimation for the Wheelbot system architecture:
- Task A — Upright Recovery: Starting from a specified initial tilt, the system must return to within a small tolerance band around upright inside a fixed time limit; the initial tilt, tolerance, and time limit are fixed benchmark parameters.
- Task B — Push-Rejection: From the balanced upright configuration, the system is subjected to applied torque disturbances about the roll or pitch axis and must return to the upright tolerance band within a fixed settling-time limit; the disturbance magnitude, tolerance, and time limit are likewise benchmark parameters.
- Task C — Repeated Self-Erection: The system must perform a specified number of consecutive stand-up-then-balance cycles without failure.
- Task D — Energy-Efficient Balance: The goal is to maintain upright balance for a fixed duration while minimizing the control input integrated over the run.
These tasks are parameterized by directly observed system performance on the Wheelbot, ensuring that the benchmark domains are physically realizable.
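As a concrete illustration, the Task A success criterion can be checked directly from a logged tilt trajectory. The sketch below assumes a uniformly sampled tilt trace; the tolerance band (`tilt_tol`) and time limit (`t_limit`) used in the demo are hypothetical placeholders, not the benchmark's official parameter values.

```python
import math

def upright_recovery_success(times, tilts, tilt_tol, t_limit):
    """Check the Task A (Upright Recovery) criterion on a sampled tilt trace.

    times    -- sample times in seconds, starting at release
    tilts    -- tilt angles in radians at those times
    tilt_tol -- tolerance band around upright, rad (hypothetical value)
    t_limit  -- maximum allowed recovery time, s (hypothetical value)

    Success: the tilt enters the tolerance band at some time t <= t_limit
    and stays inside the band for the remainder of the trace.
    """
    for i, (t, th) in enumerate(zip(times, tilts)):
        if t > t_limit:
            return False
        if abs(th) <= tilt_tol:
            # require the tilt to remain inside the band afterwards
            if all(abs(x) <= tilt_tol for x in tilts[i:]):
                return True
    return False

# Hypothetical trace: exponential decay from a 0.3 rad initial tilt
ts = [0.01 * k for k in range(300)]
trace = [0.3 * math.exp(-2.0 * t) for t in ts]
print(upright_recovery_success(ts, trace, tilt_tol=0.02, t_limit=2.0))
```

A trial that enters the band too late, or leaves it again before the trace ends, is counted as a failure; this matches the "recover and hold" reading of the task.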
2. Performance Metrics and Scoring
Performance on each task is quantified using a normalized score defined as a weighted sum of speed, accuracy, robustness, and efficiency components:

$S = w_{\text{speed}} S_{\text{speed}} + w_{\text{acc}} S_{\text{acc}} + w_{\text{rob}} S_{\text{rob}} + w_{\text{eff}} S_{\text{eff}}$, with the weights summing to one.

For Task A (Upright Recovery), the components are:
- speed: recovery time, normalized against the task time limit
- accuracy: steady-state tilt error relative to the tolerance band
- robustness: success rate over repeated trials
- efficiency: control effort expended during recovery
For a balanced evaluation, weights are assigned to the speed, accuracy, robustness, and efficiency terms so that no single component dominates the aggregate score.
A similar aggregation formula and weighting apply for Tasks B–D, taking into account the relevant metric definitions (e.g., settling time and excursion for push-rejection, cycle count for repeated self-erection, energy usage for balance).
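The aggregation above can be sketched as a small scoring helper. The equal weights and the linear speed-to-score mapping below are assumptions for illustration; the benchmark's suggested weight values are not reproduced here.

```python
def task_score(speed, accuracy, robustness, efficiency,
               weights=(0.25, 0.25, 0.25, 0.25)):
    """Aggregate normalized component scores (each in [0, 1]) into a task score.

    The equal default weights are placeholders, not the benchmark's
    suggested values.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    components = (speed, accuracy, robustness, efficiency)
    assert all(0.0 <= c <= 1.0 for c in components)
    return sum(w * c for w, c in zip(weights, components))

def speed_component(recovery_time, t_limit):
    """Map recovery time to [0, 1]: 1 for instantaneous recovery, 0 at the limit."""
    return max(0.0, 1.0 - recovery_time / t_limit)

# Example: 0.8 s recovery against a 2 s limit, with hypothetical
# accuracy/robustness/efficiency components
print(task_score(speed_component(0.8, 2.0), 0.9, 1.0, 0.5))
```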
3. Hardware and Dynamic Foundations
The benchmark is intrinsically linked to the physical and dynamic parameters of the Wheelbot platform:
| Quantity | Value | Notes |
|---|---|---|
| Total mass | $1.4$ kg | |
| Reaction wheel inertia | kg·m² | Measured, copper-ring stack |
| Wheel outer radius | $106$ mm | |
| Max continuous torque | $1.3$ N·m | At rated current |
| Encoder resolution | 12-bit | Optical |
| IMU noise density (gyro/accel) | $100$ µg/√Hz (accel) | Four ICM-20948 units |
| Stand-up pivot offset | mm | Chosen to minimize inversion torque |
| Energy use (one hour of balancing) | Wh | Idle, no locomotion |
All system dynamics, constraints, and tasks are defined with respect to these measured characteristics, ensuring direct transferability of observed results and facilitating reproducibility.
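One way the measured parameters constrain the tasks can be sketched with the classical static feasibility bound for a reaction-wheel balancer: recovery is only possible while the gravity torque $m g l \sin\theta$ about the contact point does not exceed the available wheel torque. Only the mass ($1.4$ kg) and torque limit ($1.3$ N·m) below come from the table; the centre-of-mass height `com_height` is a hypothetical placeholder.

```python
import math

def max_recoverable_tilt(tau_max, mass, com_height, g=9.81):
    """Static bound on the tilt angle from which upright recovery is feasible.

    Recovery requires tau_max >= mass * g * com_height * sin(theta),
    so theta_max = asin(tau_max / (mass * g * com_height)).
    Returns the angle in radians, or pi/2 if the torque always suffices.
    """
    ratio = tau_max / (mass * g * com_height)
    return math.pi / 2 if ratio >= 1.0 else math.asin(ratio)

# Table values: tau_max = 1.3 N*m, mass = 1.4 kg.
# com_height = 0.15 m is a hypothetical placeholder, not a Wheelbot datum.
theta = max_recoverable_tilt(1.3, 1.4, 0.15)
print(math.degrees(theta))
```

This static bound ignores wheel momentum limits and dynamic effects, so it overestimates what a real controller can recover from; it is meant only to show how the table's torque and mass figures bound the task parameterization.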
4. Data Collection and Experimental Protocols
Protocol details are specified for each benchmarked behavior:
- Self-Erection Trials: Body placed on any face or edge; success rate and time-to-erect are recorded over repeated trials.
- Roll-Up Trials: Flat starting pose; $20/20$ successes on acrylic or rubber surfaces; time to completion is recorded.
- Disturbance Rejection: Impulsive torques up to $1$ N·m applied at mid-height; maximum tilt excursion and settling time back to the upright tolerance band are recorded; peak motor currents reach $13$ A.
- Balance Performance: Steady-state roll/pitch error, overshoot in response to step inputs, and trial-to-trial repeatability dispersion are recorded.
Energy consumption, repeatability, and robustness are thus empirically interrogated under conditions matching those formalized in the benchmark.
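The peak-tilt and settling-time quantities used in the disturbance-rejection protocol can be extracted from a logged trace as follows; the damped-oscillation demo trace and the band width are hypothetical placeholders.

```python
import math

def peak_and_settling(times, tilts, band):
    """Peak tilt magnitude and settling time into |tilt| <= band.

    Settling time is the first instant after which the tilt stays inside
    the band for the rest of the trace (None if it never settles).
    """
    peak = max(abs(th) for th in tilts)
    settle = None
    for i in range(len(tilts)):
        if all(abs(th) <= band for th in tilts[i:]):
            settle = times[i]
            break
    return peak, settle

# Hypothetical post-impulse response: damped oscillation about upright
ts = [0.01 * k for k in range(500)]
trace = [0.2 * math.exp(-1.5 * t) * math.cos(8.0 * t) for t in ts]
peak, settle = peak_and_settling(ts, trace, band=0.02)
```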
5. Significance and Applications
The UniCycle Benchmark formalizes a measurement-driven, task-oriented procedure for evaluating control, estimation, and data-driven algorithms in under-actuated, non-holonomic robotic systems exemplified by the Wheelbot. By encompassing upright recovery, robust disturbance rejection, repeated operation reliability, and energy sensitivity, it provides a high-fidelity testbed for:
- Comparative studies of nonlinear and linear (e.g., LQR) controllers under real-world disturbances and constraints.
- Assessment of sensor fusion and state estimation methods, given high sensor noise and drift typical for IMUs and encoders.
- Testing of new estimation frameworks (e.g., complementary, EKF-based) with ground-truth experimental success/failure.
- Integrated metrics that bridge time-response, precision, robustness, and energy efficiency, reflecting the requirements of practical mobile robotic platforms.
The design of the tasks and metrics is such that any improvements or new algorithms can be benchmarked in direct, quantitative, and hardware-constrained fashion, with the standardization advancing comparability and reproducibility within the robotics and control research communities (Geist et al., 2022).
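As a minimal instance of the estimation methods listed above, a first-order complementary filter fuses a drifting gyro rate with a noisy accelerometer tilt. The blending constant `alpha` and the simulated bias/noise levels are hypothetical tuning and test values, not the Wheelbot's.

```python
import random

def complementary_filter(gyro_rates, accel_tilts, dt, alpha=0.98):
    """Fuse gyro rate (rad/s) and accelerometer tilt (rad) into a tilt estimate.

    alpha close to 1 trusts the integrated gyro at short time scales and
    lets the accelerometer correct slow drift; 0.98 is a hypothetical
    tuning value.
    """
    est = accel_tilts[0]           # initialize from the accelerometer
    estimates = [est]
    for rate, acc in zip(gyro_rates[1:], accel_tilts[1:]):
        est = alpha * (est + rate * dt) + (1.0 - alpha) * acc
        estimates.append(est)
    return estimates

# Hypothetical data: stationary robot, gyro with constant bias, noisy accel
random.seed(0)
dt, n = 0.01, 2000
gyro = [0.05] * n                                    # pure bias, rad/s
accel = [random.gauss(0.0, 0.02) for _ in range(n)]  # noisy zero tilt
est = complementary_filter(gyro, accel, dt)
```

Pure gyro integration would drift to about 1 rad over this 20 s trace; the accelerometer correction keeps the fused estimate bounded near zero, which is exactly the trade-off such filters are evaluated on under the benchmark.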
6. Constraints, Limitations, and Interpretation
The benchmark is bounded by the physical limitations of the Wheelbot hardware: actuator torque saturation, battery voltage, encoder resolution, sensor noise, and frame geometry. Motor torques and control currents for both the reaction wheel and the rolling wheel are limited to their hardware specifications, while sensor fusion must cope with the practical noise characteristics of the onboard IMUs. All protocol results are conditioned on the exclusion of exceptionally high-frequency dynamics and unmodeled noise; adjustments for these are available as tuning parameters for individual implementations.
A plausible implication is that, while the UniCycle Benchmark is tightly coupled to a particular system instantiation, its formulation and task design principles can inform benchmarks for similar classes of under-actuated, mobile robots. However, direct score comparison is valid only when measurement and hardware constraints match those documented for the Wheelbot (Geist et al., 2022).