Multi-Modal EPW Control System
- The multi-modal EPW control system is a platform integrating joystick, speech, gesture, and EOG interfaces to provide adaptive, safe mobility for users with movement impairments.
- It employs a four-layer architecture that fuses sensor data, real-time processing, and cloud connectivity to ensure continuous monitoring and rapid response to hazards.
- It leverages data-driven LMPC and embedding-based methods to optimize input arbitration, enabling robust predictive control under dynamic conditions.
A Multi-Modal EPW (Electric-Powered Wheelchair) Control System integrates diverse user-intent interfaces and advanced supervisory logic to enable robust, adaptive, and clinically compliant mobility for individuals with significant movement impairments. The term "multi-modal" indicates concurrent support for heterogeneous control channels—joystick, speech, hand gesture, and electrooculogram (EOG)—uniquely prioritized via arbitration algorithms for seamless, context-sensitive operation. State-of-the-art implementations also incorporate medical-grade biophysical monitoring, data-driven predictive control frameworks, and rigorous safety mechanisms consistent with ISO and IEC standards (Hossain et al., 6 Jan 2026).
1. Layered System Architecture and Core Components
The system architecture adheres to a four-layer hierarchy: Sensing, Processing, Communication, and IoT/Cloud (see Fig. 1 of (Hossain et al., 6 Jan 2026)).
- Sensing Layer: Captures multimodal user inputs and physiological signals.
  - Control interfaces: Analog joystick (X–Y potentiometers + push-button), speech via smartphone microphone, hand gesture via glove-mounted ADXL345 accelerometer, EOG using LM358N-based signal acquisition.
  - Biophysical sensors: MAX30100 (SpO₂/HR, 18-bit ADC), DS18B20 (skin temp), ADXL345 (fall/convulsion).
- Processing Layer:
  - NodeMCU ESP32 microcontroller (dual-core, 240 MHz, FreeRTOS, 520 kB SRAM).
  - Sensor buses: I²C (MAX30100), 1-Wire (DS18B20), SPI/I²C (IMU), dual 12-bit ADCs (for analog joystick/EOG).
  - L298N dual H-bridge motor driver for actuation.
- Communication Layer: Transmits sensor and control data.
  - Wi-Fi (IEEE 802.11 b/g/n): Cloud uplink (ThingSpeak server).
  - BLE (AES-128/CCM encryption): Low-latency link with Android caregiver app.
- IoT/Cloud Layer:
  - Time-series logging, dashboards, and real-time caregiver alerts via Android (Thunkable) app.
  - Cloud-based vital-sign monitoring, secure alerting (SMS/email, in-app).
Block interconnection (ASCII schematic as per (Hossain et al., 6 Jan 2026)):
    [Joystick] ─┐
    [Speech]   ─┼─> ESP32 ─> L298N ─> [Motors/Wheels]
    [Gesture]  ─┤     │
    [EOG]      ─┘     │
                      ├─> Wi-Fi ─> Cloud/ThingSpeak ─> Android App
                      └─> BLE ──────────────────────────────┘
2. Signal Processing, Feature Extraction, and Interface Logic
- Joystick: Analog [0–3.3 V] mapped to 12-bit digital; dead-zone elimination and linear scaling yield PWM duty cycle for motor commands (50 Hz).
- Speech: Android’s built-in ASR restricts input to {forward, back, left, right, stop}, transmitted over BLE for decoding and actuation.
- Gesture: ADXL345 tilt data processed within ±200 ms window; threshold-based detection maps to navigation actions.
- EOG: Differential electrode placement, LM358N amplification, band-pass filtering (0.1–35 Hz), feature-detection of sustained horizontal/vertical deviations (>12° for ≥2 s) and double-blink for "stop".
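The joystick path described above (12-bit reading, dead-zone elimination, linear scaling to a duty cycle) can be sketched as follows. This is a minimal illustration rather than the authors' firmware: the centre value, the dead-zone width, and the function name `joystick_to_duty` are assumptions, and the output is a signed duty cycle where the sign stands for direction.

```python
ADC_MAX = 4095      # full scale of a 12-bit ADC (0-3.3 V range from the text)
ADC_CENTER = 2048   # assumed resting position of the stick
DEAD_ZONE = 200     # assumed dead-zone half-width, in ADC counts


def joystick_to_duty(raw: int) -> float:
    """Map one raw 12-bit axis reading to a signed duty cycle in [-1.0, 1.0]."""
    raw = max(0, min(ADC_MAX, raw))          # clamp out-of-range readings
    offset = raw - ADC_CENTER
    if abs(offset) <= DEAD_ZONE:
        return 0.0                           # inside the dead zone: no motion
    # Rescale linearly so the output starts at 0 at the dead-zone edge
    # and reaches full scale at the end of travel on either side.
    span = (ADC_MAX - ADC_CENTER) if offset > 0 else ADC_CENTER
    magnitude = (abs(offset) - DEAD_ZONE) / (span - DEAD_ZONE)
    return magnitude if offset > 0 else -magnitude
```

Rescaling from the dead-zone edge, rather than from the centre, avoids a step in the commanded duty cycle at the moment the stick leaves the dead zone.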
No deep learning classifiers are deployed; control relies on fixed feature extraction and rule-based logic. Future directions include convolutional architectures and SVM/CNN for non-analog modalities.
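As an illustration of what such fixed, rule-based logic can look like, the sketch below implements the EOG dwell rule quoted above (deviation beyond 12° held for at least 2 s). It assumes the EOG voltage has already been converted to a calibrated horizontal gaze angle, that positive angles mean "right", and that samples arrive at a fixed rate; none of these details are specified in the source.

```python
from typing import Iterable, Optional

ANGLE_THRESHOLD_DEG = 12.0   # deviation threshold from the text
DWELL_SECONDS = 2.0          # minimum hold time from the text


def detect_sustained_gaze(angles_deg: Iterable[float],
                          sample_rate_hz: float) -> Optional[str]:
    """Return 'right' or 'left' once the gaze has stayed beyond the threshold
    on one side for the full dwell time; return None otherwise."""
    needed = int(round(DWELL_SECONDS * sample_rate_hz))
    run_sign, run_len = 0, 0
    for angle in angles_deg:
        if angle > ANGLE_THRESHOLD_DEG:
            sign = 1
        elif angle < -ANGLE_THRESHOLD_DEG:
            sign = -1
        else:
            sign = 0
        if sign != 0 and sign == run_sign:
            run_len += 1                      # same side: extend the run
        else:
            run_sign = sign                   # side changed or gaze recentred
            run_len = 1 if sign != 0 else 0
        if run_sign != 0 and run_len >= needed:
            return "right" if run_sign > 0 else "left"
    return None
```

Vertical deviations could reuse the same function on a second channel; double-blink detection for "stop" would need a separate rule and is not shown.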
3. Mode Arbitration, Safety Logic, and Real-time Control
Control prioritization leverages a "priority-ladder" scheme:
- Hazard-first logic: If FallFlag ∨ HealthAlert ∨ ObstacleFlag is true, the system executes SafeHalt → Stop immediately.
- Mode arbitration: Retains last-used non-hazardous mode or transitions per channel availability and command validity.
- User-mode manual selection: Four dedicated push-buttons.
- Fixed-step control loop: 50 Hz (20 ms), managed with FreeRTOS task scheduling—sensor polling, arbitration, PWM updates.
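One tick of the priority ladder above might be organized as in the following sketch. The hazard flags and the "retain last-used mode" rule come from the text; the `Hazards` class, the `arbitrate` function, the fallback order of the modalities, and the use of the speech vocabulary as the command set for every channel are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

VALID_COMMANDS = {"forward", "back", "left", "right", "stop"}
FALLBACK_ORDER = ("joystick", "speech", "gesture", "eog")  # assumed order


@dataclass
class Hazards:
    fall: bool = False
    health_alert: bool = False
    obstacle: bool = False


def arbitrate(hazards: Hazards,
              commands: Dict[str, Optional[str]],
              last_mode: str) -> Tuple[str, str]:
    """Return (active_mode, command) for one control tick."""
    # Hazard-first: any raised flag overrides every user input.
    if hazards.fall or hazards.health_alert or hazards.obstacle:
        return ("safe_halt", "stop")
    # Retain the last-used mode while it keeps producing valid commands.
    if commands.get(last_mode) in VALID_COMMANDS:
        return (last_mode, commands[last_mode])
    # Otherwise switch to the first channel that has a valid command.
    for mode in FALLBACK_ORDER:
        if commands.get(mode) in VALID_COMMANDS:
            return (mode, commands[mode])
    # No valid input anywhere: stay in the current mode and stop.
    return (last_mode, "stop")
```

In a fixed-step loop this function would be called once per 20 ms period, after sensor polling and before the PWM update.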
Latency analysis:
| Source | Value (ms) |
|---|---|
| Sensor ADC | ≈0.3 |
| ESP32 processing | ≈0.5 |
| Motor driver settling | ≈0.2 |
| Wi-Fi (health uplink) | ≈4 |
| BLE encryption overhead | 0.004 |
Aggregate closed-loop latency (voice/gesture/EOG → actuation): 20 ± 0.5 ms. System draws ≈8.4 W at 24 V (≈350 mA); BLE encryption overhead <1%. Battery runtime: >10 h (5 Ah pack).
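The figures above can be cross-checked with simple arithmetic. The sketch below uses only the quoted values; treating the Wi-Fi uplink as outside the actuation path, and assuming constant current draw with the full 5 Ah usable, are simplifications introduced here, so the runtime is an idealized estimate rather than a measured result.

```python
LOOP_RATE_HZ = 50.0
loop_period_ms = 1000.0 / LOOP_RATE_HZ          # 20 ms control period

# On-board contributions from the latency table (Wi-Fi excluded because it
# carries the health uplink, not motor commands; this is an assumption).
on_board_ms = 0.3 + 0.5 + 0.2 + 0.004           # about 1.0 ms
loop_fraction = on_board_ms / loop_period_ms    # share of one control period

supply_v = 24.0
current_a = 0.350
capacity_ah = 5.0

power_w = supply_v * current_a                  # about 8.4 W
ideal_runtime_h = capacity_ah / current_a       # about 14.3 h

print(f"on-board latency: {on_board_ms:.3f} ms ({loop_fraction:.1%} of a period)")
print(f"power: {power_w:.1f} W, ideal runtime: {ideal_runtime_h:.1f} h")
```

Under these assumptions the on-board delays occupy roughly 5% of a control period, which suggests the 20 ms loop period itself dominates the quoted closed-loop figure, and the ideal runtime of about 14.3 h is consistent with the stated >10 h.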
4. Calibration, Biophysical Monitoring, and Cloud Alerting
- Biophysical sensor calibration utilizes two-point referencing:
  - MAX30100 vs. ISO 80601-2-61 pulse-ox simulator
  - DS18B20: ice-water (0 ℃), boiling-water (100 ℃)
  - ADXL345: static ±1g testing
- Root-mean-square errors:
  - Heart rate: ≤2 bpm (N = 80 samples; mean bias ≈0.2 bpm)
  - SpO₂: ≤1% (mean bias –0.3%)
  - Temp: ≤0.5 ℃ (mean bias +0.05 ℃)
- Cloud telemetry: ESP32 aggregates and pushes SpO₂, HR, temp, and fall-state every 1 s to ThingSpeak (Wi-Fi). BLE provides fallback/local streaming. Alerts for HR>140 bpm/<40 bpm, temp>38.5 ℃, SpO₂ < 90% are issued as SMS/email (SMTP) and in-app indicators.
- ISO/IEC compliance: Sensor front-end (ISO 80601-2-61), system safety (ISO 7176-31), medical alarm compatibility (IEC 80601-2-78). Critical risk mitigations include watchdog timer, latched emergency stop, battery protection.
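Two of the steps in this section lend themselves to a short sketch: two-point calibration and threshold-based alerting. The alert thresholds are the ones quoted above; the function names, the alert labels, and the raw readings in the usage line are hypothetical, and the linear correction is a standard two-point fit rather than a procedure confirmed by the source.

```python
from typing import Callable, List


def two_point_calibration(raw_low: float, raw_high: float,
                          ref_low: float, ref_high: float) -> Callable[[float], float]:
    """Build a linear correction from two (raw, reference) pairs."""
    if raw_high == raw_low:
        raise ValueError("calibration points must differ")
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return lambda raw: gain * raw + offset


def vital_alerts(hr_bpm: float, temp_c: float, spo2_pct: float) -> List[str]:
    """Return the alert labels triggered by one telemetry sample."""
    alerts = []
    if hr_bpm > 140:
        alerts.append("HR_HIGH")
    elif hr_bpm < 40:
        alerts.append("HR_LOW")
    if temp_c > 38.5:
        alerts.append("TEMP_HIGH")
    if spo2_pct < 90:
        alerts.append("SPO2_LOW")
    return alerts


# Hypothetical raw readings in ice water and boiling water.
calibrate_temp = two_point_calibration(0.4, 99.2, 0.0, 100.0)
```

A telemetry task could apply `calibrate_temp` to each raw temperature sample before calling `vital_alerts` once per second, matching the 1 s push interval described above.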
5. Data-Driven and Embedding-Based Multi-Modal Control Methodologies
Two advanced paradigms generalize multi-modal EPW control for dynamic and uncertain environments:
5.1 Data-Driven Multi-Modal LMPC
- Affine Time-Varying (ATV) Modeling: Learns local dynamics from historical trajectories sampled across "modes" (e.g., friction variants of floor material: carpet/tile/pavement) (Kopp et al., 2024).
- Sampling Safe Sets: Constructs a convex hull from nearest prior feasible states, ensuring recursive feasibility via tube-based constraint tightening.
- LMPC Optimization: At each instant, leverages ATV-identified models and convex safe sets to solve for optimal input sequences (the explicit optimization problem is given in Kopp et al., 2024).
- Mode adaptation: Pseudo-real-time update via selection of nearest data neighbors, with robust fallback (LQR) in case of insufficient local data.
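For orientation, a finite-horizon problem of the kind described above is often written in the LMPC literature in the following generic form. This is a sketch under assumed notation, not the formulation from the cited work: $h$ is a stage cost, $Q$ a terminal cost defined on stored data, $(A_{k|t}, B_{k|t}, c_{k|t})$ the locally identified affine time-varying model, $\mathcal{SS}_t$ the sampled safe set, $\mathcal{E}$ an error tube, and $K$ a tube feedback gain.

$$
\begin{aligned}
\min_{u_{t|t},\dots,u_{t+N-1|t}} \quad & \sum_{k=t}^{t+N-1} h\big(x_{k|t}, u_{k|t}\big) + Q\big(x_{t+N|t}\big) \\
\text{s.t.} \quad & x_{k+1|t} = A_{k|t}\, x_{k|t} + B_{k|t}\, u_{k|t} + c_{k|t}, \\
& x_{k|t} \in \mathcal{X} \ominus \mathcal{E}, \qquad u_{k|t} \in \mathcal{U} \ominus K\mathcal{E}, \\
& x_{t|t} = x_t, \qquad x_{t+N|t} \in \operatorname{conv}\big(\mathcal{SS}_t\big).
\end{aligned}
$$

Only the first input $u_{t|t}$ is applied before the problem is re-solved at the next step; the terminal constraint on the convex hull of previously feasible states is what ties the optimization to recorded trajectories.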
5.2 Embedding Methods for Switched Optimal Control
- Binary embedding: Encodes the modes with binary variables and replaces the discrete switching law with continuous relaxed variables (Sakha et al., 14 Dec 2025).
- Embedded dynamics/costs: Constructs the weighted sum of subsystem dynamics (and costs) using mode indicator polynomials (see original for explicit form).
- Concave auxiliary penalty: Forces bang–bang minimizers of the relaxed embedding and excludes invalid mode bitstrings (see the source for the explicit penalty term).
- Application to EPW: For example, the five modes (idle, fwd, rev, turn-L, turn-R) are encoded as mode bits; direct collocation solvers guarantee boundary (binary) optimal schedules implementable directly on physical wheelchair platforms.
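The embedding idea can be made concrete with a small sketch. The source's explicit polynomials and penalty are not reproduced here; the code below uses a standard multilinear indicator and the penalty $\sum_j v_j(1 - v_j)$ as stand-ins, assumes three bits (the smallest count that covers five modes), and assigns bitstrings to modes in list order, all of which are illustrative assumptions.

```python
from itertools import product
from typing import Sequence

MODES = ["idle", "fwd", "rev", "turn-L", "turn-R"]
N_BITS = 3  # assumption: smallest bit count with 2**N_BITS >= len(MODES)


def indicator(mode_index: int, v: Sequence[float]) -> float:
    """Multilinear indicator: equals 1 at the mode's own bitstring and 0 at
    every other binary vertex; takes fractional values in between."""
    value = 1.0
    for j in range(N_BITS):
        bit = (mode_index >> j) & 1
        value *= v[j] if bit else (1.0 - v[j])
    return value


def embedded_rate(v: Sequence[float], mode_rates: Sequence[float]) -> float:
    """Indicator-weighted sum of per-mode dynamics (each mode is a scalar
    rate here purely to keep the example small)."""
    return sum(indicator(i, v) * rate for i, rate in enumerate(mode_rates))


def concave_penalty(v: Sequence[float]) -> float:
    """Zero at binary vertices and positive for fractional relaxed values."""
    return sum(x * (1.0 - x) for x in v)


def is_valid_bitstring(bits: Sequence[int]) -> bool:
    """A bitstring is valid only if it decodes to an existing mode."""
    return sum(b << j for j, b in enumerate(bits)) < len(MODES)


valid = [bits for bits in product((0, 1), repeat=N_BITS) if is_valid_bitstring(bits)]
```

With three bits there are eight bitstrings but only five modes, which is why some mechanism for excluding the remaining bitstrings is needed in the relaxed problem.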
6. Experimental Performance, Safety, and Clinical Outcomes
- Command recognition accuracy (20 participants):
| Modality | Mean Accuracy (95% CI) |
|---|---|
| Joystick | 99% (±0.5%) |
| Speech | 97% (±2%) |
| Gesture | 95% (±3%) |
| EOG | 96% (±2.5%) |
- Biophysical sensor validation: Pearson correlation and Bland-Altman limits confirm medical-grade precision for HR, temperature, and SpO₂ (see source for the numerical values and detailed plots).
- System endurance: >10 h per charge with all input modalities and telemetry enabled.
- Safety outcomes: Emergency stop and real-time alerts enabled by cloud-integrated sensing and arbitration logic.
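For readers unfamiliar with the agreement statistics mentioned above, the sketch below computes the mean bias, the 95% Bland-Altman limits of agreement, and the root-mean-square error for paired device and reference readings. The function names are illustrative and the sample values in the usage note are made up; they are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(device: Sequence[float],
                 reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean bias, lower limit, upper limit) of agreement,
    using bias ± 1.96 × SD of the paired differences."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def rmse(device: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between paired readings."""
    return sqrt(mean([(d - r) ** 2 for d, r in zip(device, reference)]))
```

For example, with hypothetical heart-rate pairs `device = [71, 69, 75, 80]` and `reference = [70, 70, 74, 81]`, the differences alternate between +1 and −1, so the bias is zero and the RMSE is 1 bpm.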
This architecture comprehensively addresses accessibility, adaptability, and clinical oversight, and lays the foundation for future semantic intent prediction, semi-autonomous navigation (SLAM), and further autonomy/power-optimization enhancements (Hossain et al., 6 Jan 2026).
7. Limitations and Prospects for Future Development
- Algorithmic extensibility: Present implementations utilize rule-based classification; future work is directed at integrating convolutional attention modules (CNN/CBAM for vision) and SVM/CNN for EOG/gaze decoding.
- Predictive control: Adoption of data-driven LMPC and embedding-based switched optimal control (SOCP) frameworks offers robust performance under model uncertainty and can accommodate mixed-discrete (mode) and continuous control objectives (Kopp et al., 2024, Sakha et al., 14 Dec 2025).
- Safety/standards: Platform aligns with ISO 7176-31 and IEC 80601-2-78, continually adapting to evolving risk profiles through cloud-based analysis and machine learning.
- Clinical scalability: Initial results verify command accuracy above 95% for varied modalities; large-scale longitudinal studies and adaptive intent modeling are required for broader deployment.
A plausible implication is the future integration of high-dimensional sensor fusion, predictive health event analysis, and shared-control autonomy, capitalizing on the multi-modal system’s extensible architecture for both research and advanced clinical deployment.