
Closed-Loop Sensor-AI-Robot Framework

Updated 15 November 2025
  • A closed-loop Sensor-AI-Robot framework is a system where real-time sensor data, AI algorithms, and robot actuators interact continuously via explicit feedback loops.
  • It employs modular components for head-pose estimation, blink detection, and emotion recognition to ensure precise imitation and rapid dynamic corrections.
  • Quantitative performance metrics (yaw R² = 98.9%, pitch R² = 96.3%, and <200 ms latency) highlight its capability for robust, adaptive human-robot interaction.

A closed-loop Sensor–AI–Robot framework is a system in which real-time sensor data are continuously ingested, processed by AI algorithms, and fed into robotic actuators, with explicit feedback mechanisms enabling dynamic adjustment and correction. Such frameworks are essential for robust human-robot interaction, precision imitation, and adaptive control. The implementation described by "Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach" (Rayati et al., 28 Apr 2025) exemplifies this methodology, using modular, feedback-driven components for real-time perception and control of a humanoid robot.

1. System Composition and Block Structure

The integrated system is architected as a series of tightly-linked modules:

  • Sensor Module: High-frame-rate webcam/camera (30 fps RGB) streams live video to the AI processing pipeline.
  • AI Processing Module (Python 3.10):
    • Head-Pose Estimator: MediaPipe Pose detects facial landmarks; SVR fits robot joint constraints.
    • Blink Detector: MediaPipe Face-Mesh computes eye aspect ratios.
    • Emotion Recognizer: DeepFace CNN outputs framewise emotion probabilities, aggregated by majority-vote.
  • Robot Control Module (Naoqi Python 2.7):
    • Joint Mapping: Yaw and pitch angles are mapped to HeadYaw and HeadPitch actuators via inverse kinematics.
    • Blink Actuation: LED control simulates blink events.
    • Speech and Feedback: Text-to-speech triggered by emotion; actual joint angles read and returned for error analysis.
  • Feedback Analysis Module: Computes real-time error metrics (notably, R² for pose tracking) and dynamically adjusts AI module parameters (thresholds, smoothing gains) in response.

Data flow: Sensors → AI modules → Robot controller → Feedback analyzer → (cycle).
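
A minimal sketch of this cycle is given below. The module and function names are illustrative placeholders for the components listed above, not the paper's actual code.

```python
# Minimal closed-loop skeleton (illustrative names, not the paper's code).
import cv2

def run_closed_loop(pose_estimator, blink_detector, emotion_recognizer, robot, feedback):
    cap = cv2.VideoCapture(0)  # Sensor module: 30 fps webcam stream
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yaw, pitch = pose_estimator.estimate(frame)        # head-pose estimation
            blink_l, blink_r = blink_detector.detect(frame)    # eye-aspect-ratio check
            emotion = emotion_recognizer.update(frame)         # sliding-window majority vote
            # Robot control module: send the command, read back actual joint angles.
            yaw_act, pitch_act = robot.command(yaw, pitch, blink_l, blink_r, emotion)
            # Feedback analysis module: track the R^2 error and retune AI parameters.
            feedback.update(cmd=(yaw, pitch), act=(yaw_act, pitch_act))
            feedback.apply_corrections(pose_estimator, blink_detector)
    finally:
        cap.release()
```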

2. Perceptual and AI Algorithms

Head-Pose Estimation

MediaPipe Pose provides 3D facial landmarks:

  • Define $P_{left}, P_{right} \in \mathbb{R}^3$ as the left and right eye centers.
  • Compute the normalized eye vector $\hat{V}_{eye} = (P_{left} - P_{right}) / \|P_{left} - P_{right}\|$.
  • Yaw angle: $\varphi = \arccos(\hat{V}_{baseline} \cdot \hat{V}_{eye})$, with rotation axis $a = (\hat{V}_{baseline} \times \hat{V}_{eye}) / \|\hat{V}_{baseline} \times \hat{V}_{eye}\|$, combined into the rotation vector $r = \varphi a$. Convert to Euler angles and clamp to $[-119.5^\circ, 119.5^\circ]$.
  • Pitch: Use the nose point $P_{nose}$ and compute $\hat{V}_{new} = (P_{nose} - M_{eye}) / \|P_{nose} - M_{eye}\|$, where $M_{eye} = (P_{left} + P_{right})/2$.
  • Clamp pitch value via two support vector regression (SVR) models fit to Nao robot joint limits, enforcing a 5% safety margin.
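
A sketch of the yaw computation from these definitions is shown below. It assumes the landmark coordinates are already available as NumPy arrays and uses SciPy's rotation utilities for the rotation-vector-to-Euler conversion; the Euler-axis convention and the choice of library are assumptions, and the SVR-based pitch clamping is omitted.

```python
import numpy as np
from scipy.spatial.transform import Rotation

YAW_LIMIT = np.deg2rad(119.5)  # clamp range for Nao's HeadYaw joint

def yaw_from_eyes(p_left, p_right, v_baseline):
    """Yaw angle from the normalized eye vector versus a unit-norm frontal baseline vector."""
    v_eye = (p_left - p_right) / np.linalg.norm(p_left - p_right)
    phi = np.arccos(np.clip(np.dot(v_baseline, v_eye), -1.0, 1.0))
    axis = np.cross(v_baseline, v_eye)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:                     # parallel vectors: no rotation
        return 0.0
    rotvec = phi * axis / norm          # rotation vector r = phi * a
    yaw = Rotation.from_rotvec(rotvec).as_euler("zyx")[0]  # assumed z-y-x convention
    return float(np.clip(yaw, -YAW_LIMIT, YAW_LIMIT))

def pitch_vector(p_nose, p_left, p_right):
    """Normalized nose-to-eye-midpoint vector used for the pitch estimate."""
    m_eye = (p_left + p_right) / 2.0
    v = p_nose - m_eye
    return v / np.linalg.norm(v)
```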

Blink Detection

MediaPipe Face-Mesh extracts four mesh points per eye; compute the vertical ($d_{inner}$) and horizontal ($d_{outer}$) distances:

  • Eye aspect ratio $R = d_{outer} / d_{inner}$.
  • Empirical threshold $T$: if $R > T$, declare a blink (Closed); otherwise Open.
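
A minimal sketch of this check, with the distance naming taken from the text; the threshold value is illustrative, since the empirical $T$ depends on the camera setup.

```python
import numpy as np

BLINK_THRESHOLD = 4.5  # illustrative value for the empirical threshold T

def eye_state(p_top, p_bottom, p_corner_left, p_corner_right, threshold=BLINK_THRESHOLD):
    """Classify one eye as 'Closed' or 'Open' from four Face-Mesh points."""
    d_inner = np.linalg.norm(p_top - p_bottom)                # vertical eyelid distance
    d_outer = np.linalg.norm(p_corner_left - p_corner_right)  # horizontal corner distance
    ratio = d_outer / max(d_inner, 1e-6)                      # R = d_outer / d_inner
    return "Closed" if ratio > threshold else "Open"
```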

Emotion Recognition

Framewise, the DeepFace CNN outputs 7-class scores $s = [s_1, \ldots, s_7]$; the emotion classes are {anger, disgust, fear, happiness, sadness, surprise, neutral}.

  • Apply softmax: $p_i = e^{s_i} / \sum_j e^{s_j}$.
  • Aggregate over a sliding window ($N = 10$): $E = \mathrm{mode}\{\arg\max_i p_i(t),\ t = 1, \ldots, 10\}$.
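
A sketch of the softmax-plus-majority-vote aggregation, assuming the framewise 7-class scores are already available as a NumPy array (the DeepFace invocation itself is omitted):

```python
from collections import Counter, deque
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

class EmotionAggregator:
    """Majority vote over the last N framewise predictions (N = 10 here)."""

    def __init__(self, window=10):
        self.window = deque(maxlen=window)

    def update(self, scores):
        p = np.exp(scores - np.max(scores))   # numerically stable softmax
        p /= p.sum()
        self.window.append(int(np.argmax(p)))
        # Mode of the per-frame argmax labels over the sliding window.
        return EMOTIONS[Counter(self.window).most_common(1)[0][0]]
```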

3. Closed-Loop Feedback and Dynamic Correction

After issuing commands $(\varphi_{cmd}, \psi_{cmd})$, the robot reports the actual angles $(\varphi_{act}, \psi_{act})$ each cycle. Over $T$ frames, the system computes the R² metric:

$$R^2 = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$

where $y_t$ is the estimated (human) angle, $\hat{y}_t$ is the actual robot angle, and $\bar{y}$ is the mean of the $y_t$.

If R² drops below the 0.95 threshold:

  • Increase the low-pass filter gain on the commanded angles $(\varphi_{cmd}, \psi_{cmd})$ to suppress jitter.
  • Tighten blink threshold TT if false blinks detected.
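
A sketch of this feedback check follows. The R² computation matches the formula above; the concrete adjustment steps (how far to move the smoothing coefficient or the blink threshold) are assumptions, since the source specifies only that they are adjusted when R² falls below 0.95.

```python
import numpy as np

def r_squared(y_est, y_act):
    """Coefficient of determination between estimated human angles and actual robot angles."""
    y_est, y_act = np.asarray(y_est, dtype=float), np.asarray(y_act, dtype=float)
    ss_res = np.sum((y_est - y_act) ** 2)
    ss_tot = np.sum((y_est - y_est.mean()) ** 2)
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot

def apply_feedback(alpha, blink_threshold, r2, false_blinks_detected, r2_min=0.95):
    """Illustrative correction policy triggered when tracking quality degrades."""
    if r2 < r2_min:
        # Strengthen smoothing of the commanded angles to suppress jitter
        # (with the Section 5 filter, a lower alpha keeps more of the previous command).
        alpha = max(alpha - 0.1, 0.5)
        if false_blinks_detected:
            blink_threshold *= 1.05  # tighten the blink threshold (illustrative step size)
    return alpha, blink_threshold
```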

4. Communication, Data Flow, and Real-Time Constraints

  • Transport: AI and Naoqi API communicate via HTTP (JSON payloads).
  • Payloads: {yaw, pitch, blinkL, blinkR, emotion}; the robot responds with {yaw_act, pitch_act, timestamp}.
  • Frame rates: 30 fps image acquisition; the end-to-end imitation/feedback loop runs at ≈25 fps.
  • Latency: 80–120 ms round-trip average.
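
A sketch of the per-frame exchange, assuming a simple HTTP bridge process on the robot side; the endpoint URL and the use of the requests library are assumptions, as the source specifies only JSON payloads over HTTP.

```python
import requests

ROBOT_URL = "http://nao.local:8080/imitate"  # illustrative bridge endpoint (assumption)

def send_frame_command(yaw, pitch, blink_l, blink_r, emotion, timeout=0.5):
    """POST one frame's command and return the robot's reported joint state."""
    payload = {"yaw": yaw, "pitch": pitch,
               "blinkL": blink_l, "blinkR": blink_r,
               "emotion": emotion}
    resp = requests.post(ROBOT_URL, json=payload, timeout=timeout)
    resp.raise_for_status()
    data = resp.json()  # expected fields: yaw_act, pitch_act, timestamp
    return data["yaw_act"], data["pitch_act"], data["timestamp"]
```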

5. Actuation: Motor and Signal Mapping

  • Joint Mapping: Commands issued as HeadYaw.setAngle($\varphi_{cmd}$) and HeadPitch.setAngle($\psi_{cmd}$).
  • Blink: LED toggling on EyeLEDs over 200 ms upon detected blink event.
  • Stability: First-order low-pass filter on angles:

$$\varphi_{k+1} = \alpha\,\varphi_{cmd} + (1-\alpha)\,\varphi_k, \qquad \alpha \in [0.5, 0.9]$$

to ensure smooth transitions and suppress spurious corrections.
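
A minimal implementation of this filter; the initial-state handling is an assumption, and separate filter instances would be kept for yaw and pitch.

```python
class AngleLowPassFilter:
    """First-order low-pass filter applied to each commanded head angle."""

    def __init__(self, alpha=0.7):  # alpha chosen from [0.5, 0.9]
        self.alpha = alpha
        self.value = None

    def step(self, command):
        # phi_{k+1} = alpha * phi_cmd + (1 - alpha) * phi_k
        if self.value is None:
            self.value = command    # initialize on the first command
        else:
            self.value = self.alpha * command + (1 - self.alpha) * self.value
        return self.value
```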

6. Quantitative Performance and Experimental Results

  • Yaw tracking R² = 0.989 (98.9%).
  • Pitch tracking R² = 0.963 (96.3%).
  • Blink imitation success: 96% (48/50 trials).
  • Emotion-to-response latency: ≤200 ms per frame; text-to-speech ≈300 ms.
  • Closed-loop update rate: 25 Hz.

7. Context, Applications, and Significance

The described framework supports high-fidelity, robust real-time imitation of human head pose, blink gestures, and emotional state by a Nao humanoid robot. Its closed-loop design ensures rapid correction and low error accumulation, critical for applications in human-robot interaction, notably for autism communication support.

The accuracy metrics (R² ≈ 99% yaw / 96% pitch, <0.2 s latency) are achieved via careful sensor-AI-actuator integration and dynamic, feedback-driven correction. The modularity inherent to this architecture can be generalized to other real-time interactive robots using similar sensor and AI stacks (e.g., MediaPipe, DeepFace).

A plausible implication is that integrating such feedback-driven AI modules into assistive robot platforms can significantly elevate the naturalism and responsiveness of robot social behaviors. These design principles—continuous sensing, multi-stage AI processing, real-time feedback, and transparent control algorithms—constitute the technical substrate for next-generation closed-loop embodied agents in interactive settings.

References (1)

  • Rayati et al., "Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach," 28 Apr 2025.