LED State Classification Advances
- LED state classification is the process of detecting and interpreting LED signals, crucial for industrial monitoring, optical wireless communication, and robotic localization.
- Deep learning architectures such as YOLOv2, a modified AlexNet, and fully convolutional networks (FCNs) achieve detection rates above 99% and classification accuracy up to 99.9% under favorable conditions.
- Integrating LED state prediction enhances automation: it improves spectral efficiency and energy utilization in optical wireless links and enables self-supervised learning in dynamic robotic environments.
LED state classification refers to the process of detecting, discriminating, and interpreting the operational states of light-emitting diodes (LEDs) as they appear in visual, communications, or robotic applications. This field encompasses methods for classifying signal lights in industrial environments, decoding modulation schemes in optical wireless communication, and learning robot localization or pose through pretext tasks that exploit controllable LED indicators. Recent research demonstrates the critical role of LED state classification in enabling reliable automation, robust communication, and scalable self-supervised learning.
1. Visual Detection and Classification of Signal Lights
LED state classification in industrial environments centers on detecting stack lights that encode machine states. In the canonical approach for factory floors (Nilsson et al., 2020), a two-stage deep learning pipeline is used, sketched after the list below:
- Pre-processing: High-resolution video frames (QHD, 2560×1440) are resized and partitioned into sub-images compatible with a YOLOv2 detector.
- Detection: YOLOv2 localizes stack lights in the frame, resolving overlaps with non-maximum suppression.
- State Classification: Detected regions are transformed to square crops and classified by a shallow, modified AlexNet into ten operational categories (e.g., green, yellow, red, lit in combination, and non-signal objects).
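A minimal sketch of this two-stage flow, assuming generic `detect_fn` (e.g., a YOLOv2 model returning boxes and scores) and `classify_fn` callables; the tile size, overlap, IoU threshold, and ten-class label list are illustrative placeholders rather than the exact values of Nilsson et al. (2020):

```python
import numpy as np

# Illustrative 10-class state set (placeholder labels, not the dataset's exact taxonomy).
STATE_LABELS = ["off", "green", "yellow", "red", "green+yellow", "green+red",
                "yellow+red", "green+yellow+red", "blinking", "non-signal"]

def tile_frame(frame, tile=416, overlap=64):
    """Partition a high-resolution frame (e.g., 2560x1440 QHD) into
    detector-sized, overlapping sub-images, remembering each tile's offset."""
    h, w = frame.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield frame[y:y + tile, x:x + tile], (x, y)

def nms(boxes, scores, iou_thresh=0.5):
    """Non-maximum suppression to merge duplicate detections from overlapping tiles."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 0] + boxes[i, 2], boxes[rest, 0] + boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 1] + boxes[i, 3], boxes[rest, 1] + boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        union = boxes[i, 2] * boxes[i, 3] + boxes[rest, 2] * boxes[rest, 3] - inter
        order = rest[inter / np.maximum(union, 1e-9) < iou_thresh]
    return keep

def classify_stack_lights(frame, detect_fn, classify_fn):
    """Stage 1: detect stack lights per tile; stage 2: classify square crops."""
    boxes, scores = [], []
    for tile_img, (ox, oy) in tile_frame(frame):
        for (x, y, w, h, s) in detect_fn(tile_img):    # detector boxes in tile coordinates
            boxes.append([x + ox, y + oy, w, h])       # shift back to frame coordinates
            scores.append(s)
    boxes, scores = np.asarray(boxes, dtype=float), np.asarray(scores, dtype=float)
    results = []
    for i in (nms(boxes, scores) if len(boxes) else []):
        x, y, w, h = boxes[i].astype(int)
        side = max(w, h)                               # square crop for the shallow classifier
        crop = frame[y:y + side, x:x + side]
        results.append((boxes[i], STATE_LABELS[classify_fn(crop)]))
    return results
```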
Performance metrics indicate robust operation: YOLOv2 achieves a detection rate of ~99.2% with negligible false positives (<0.1%), while the classifier attains up to 99.9% accuracy (trained with stochastic gradient descent with momentum, SGDM) on the Stack Light Classification Dataset. Classification accuracy degrades under extreme image perturbations (e.g., blurring, strong gamma correction, severe size reduction), which points to limits on generalization in dynamic settings.
The pipeline's adaptivity and non-invasive nature support retrofitting in legacy environments, facilitate automated monitoring, and align with Industry 4.0 integration strategies. Anticipated directions include augmentation for broader scene variation and fusion of temporal signal cues to mitigate state misclassification during transient/fading periods.
2. LED State Classification in MIMO Optical Wireless Communication
Flexible LED Index Modulation (FLIM) introduces an information-theoretic extension of classical LED state classification for MIMO optical wireless channels (Yesilkaya et al., 2022). Unlike traditional spatial multiplexing (SMX; all LEDs active, high receiver complexity) or spatial modulation (SM; a single LED active, reduced spectral efficiency), FLIM generalizes the symbol space by admitting variable activation patterns, including the "off" state as a symbol:
- Transmitter Alphabet: Each LED can assume zero (off) or one of M unipolar PAM intensity values, expanding the alphabet to (M+1) states.
- Spectral Efficiency: The achievable efficiency is $\eta = n_T \log_2(M+1)$ bits per channel use, where $n_T$ is the number of LEDs (a numerical sketch follows this list).
- Receiver Architecture: Detection complexity is kept linear via a minimum mean squared error (MMSE) equalizer, with angle-perturbed photodiode arrays ensuring a full-rank channel matrix and reliable demodulation.
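As an illustration of these ingredients, the toy sketch below builds the (M+1)-ary per-LED alphabet, computes the resulting spectral efficiency, and demodulates with a simplified linear MMSE filter. `flim_alphabet`, `mmse_detect`, and the random channel are hypothetical constructions for illustration, not the exact formulation of Yesilkaya et al. (2022).

```python
import numpy as np
from itertools import product

def flim_alphabet(n_leds, M, max_intensity=1.0):
    """Per-LED alphabet: 'off' plus M unipolar PAM levels -> (M+1) states per LED,
    i.e., n_leds * log2(M+1) bits per channel use across the array."""
    levels = np.concatenate(([0.0], np.linspace(max_intensity / M, max_intensity, M)))
    symbols = np.array(list(product(levels, repeat=n_leds)))  # all activation patterns
    bits_per_use = n_leds * np.log2(M + 1)
    return symbols, bits_per_use

def mmse_detect(y, H, symbols, noise_var):
    """Simplified linear MMSE equalization followed by nearest-symbol mapping.
    H: channel matrix (photodiodes x LEDs), y: received photocurrent vector."""
    W = np.linalg.solve(H.T @ H + noise_var * np.eye(H.shape[1]), H.T)
    x_hat = W @ y
    return symbols[np.argmin(np.linalg.norm(symbols - x_hat, axis=1))]

# Example: 4 LEDs with M = 3 PAM levels -> 4 * log2(4) = 8 bits per channel use.
rng = np.random.default_rng(0)
symbols, eta = flim_alphabet(n_leds=4, M=3)
H = np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # toy full-rank optical channel
x = symbols[rng.integers(len(symbols))]
y = H @ x + np.sqrt(1e-3) * rng.standard_normal(4)
print(eta, np.allclose(mmse_detect(y, H, symbols, noise_var=1e-3), x))
```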
FLIM achieves 6–11 dB gains (as a reduction in required signal-to-noise ratio at a target bit error rate) over GSM-II and SMX in moderate-to-high spectral efficiency regimes. Leveraging the "off" state for encoding improves energy efficiency and enables adaptive transmission. The approach is particularly well suited to LiFi, smart attocell, and device-to-device communication, where illumination and data transmission share the same physical infrastructure.
3. Self-Supervised Robot Localization via LED State Prediction
LED state classification has become a foundation for self-supervised visual localization in robotics (Nava et al., 15 Feb 2024, Carlotti et al., 6 Oct 2024, Carlotti et al., 12 Sep 2025). These approaches exploit the binary on/off state of robot-mounted LEDs as an auxiliary (pretext) task during neural network training:
- Pretext Task: Networks are trained to predict LED states (typically one binary state per LED) directly from monocular camera images. Labels are generated autonomously during data collection by toggling the LEDs.
- Joint Learning: Models predict not only LED states but also position, bearing, and (in advanced variants) distance and heading. Outputs typically include spatial belief maps and angle maps, with loss functions combining binary cross-entropy for LED state estimation and regression or spatial attention for localization.
- Spatial and Visibility Weighting: Loss terms are modulated by projection maps (robot presence confidence) and angular visibility weighting, e.g., shifted cosine functions per LED with respect to the predicted robot orientation; a minimal sketch of such a weighted loss follows this list.
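The following sketch shows one way such a weighted LED-state loss could look, assuming per-pixel LED-state logits, a soft robot-presence map, and known per-LED mounting angles; the tensor shapes and the exact shifted-cosine form are illustrative assumptions rather than the precise losses of the cited works.

```python
import torch
import torch.nn.functional as F

def led_state_loss(led_logits, led_targets, presence_map, robot_bearing, led_angles):
    """Binary cross-entropy on per-LED state predictions, weighted by
    (i) where the robot plausibly projects in the image and
    (ii) how visible each LED is given the predicted robot orientation.

    led_logits:    (B, n_leds, H, W)  per-pixel logits for each LED being 'on'
    led_targets:   (B, n_leds)        on/off labels recorded while toggling the LEDs
    presence_map:  (B, 1, H, W)       soft robot-projection / presence confidence
    robot_bearing: (B,)               predicted robot orientation in radians
    led_angles:    (n_leds,)          mounting angle of each LED on the robot body
    """
    B, n_leds, H, W = led_logits.shape
    target = led_targets.float().view(B, n_leds, 1, 1).expand(-1, -1, H, W)
    bce = F.binary_cross_entropy_with_logits(led_logits, target, reduction="none")

    # Shifted cosine visibility: an LED facing the camera gets weight ~1,
    # an LED on the far side of the robot gets weight ~0.
    rel = robot_bearing.view(B, 1) - led_angles.view(1, n_leds)   # (B, n_leds)
    visibility = 0.5 * (1.0 + torch.cos(rel))                     # in [0, 1]

    weight = presence_map * visibility.view(B, n_leds, 1, 1)      # broadcast over H, W
    return (weight * bce).sum() / weight.sum().clamp_min(1e-6)
```

This term would typically be summed with the localization losses during training and dropped entirely at inference, when only the spatial outputs are read out.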
Empirical results demonstrate:
| Approach | Median position error (px) | Heading MAE (°) | Tracking error (cm) |
|---|---|---|---|
| Self-supervised (LED states) | 14.5 | 17 | 4.2 |
| Fully supervised (upper bound) | 10.1 | 8.4 | 11.9 |
Position estimation is substantially more accurate than with alternative self-supervised methods (e.g., autoencoders) and approaches the fully supervised upper bound, while requiring far fewer labeled samples. LED state prediction is used strictly at training time; inference operates on visual cues alone.
4. Model Architectures and Training Strategies
These self-supervised approaches use fully convolutional networks (FCNs) that produce output maps for:
- LED state (per LED)
- Localization (presence/projection map)
- Orientation/Bearing (angle map)
- Distance estimation (in multi-scale variants that process the input at several resolutions)
Loss computation is spatially aware: only image regions plausibly containing the robot (based on projection map) and visible LEDs (based on their expected orientation) contribute meaningfully. Multi-scale normalization, leveraging calibration images, permits depth estimation from 2D monocular data.
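A compact, hypothetical multi-head FCN in this spirit; layer sizes, names, and the (cos, sin) bearing encoding are illustrative placeholders, not the published architectures.

```python
import torch
import torch.nn as nn

class LedStateFCN(nn.Module):
    """Shared convolutional trunk with per-task 1x1 heads: LED-state logit maps,
    a robot presence/projection map, and a bearing map (illustrative sizes)."""
    def __init__(self, n_leds=4):
        super().__init__()
        self.trunk = nn.Sequential(                     # downsampling feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.led_head = nn.Conv2d(64, n_leds, 1)        # per-LED on/off logit maps
        self.presence_head = nn.Conv2d(64, 1, 1)        # robot projection / presence map
        self.bearing_head = nn.Conv2d(64, 2, 1)         # (cos, sin) bearing map

    def forward(self, img):
        f = self.trunk(img)
        return {
            "led_logits": self.led_head(f),
            "presence": torch.sigmoid(self.presence_head(f)),
            "bearing": torch.tanh(self.bearing_head(f)),
        }

# Feeding the same network a rescaled copy of the image (multi-scale input) is one
# way the distance/scale ambiguity could be resolved with calibration data.
model = LedStateFCN(n_leds=4)
out = model(torch.randn(1, 3, 360, 640))
print({k: tuple(v.shape) for k, v in out.items()})
```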
Training requires knowledge of the LEDs' viewing directions and a calibration step to resolve distance ambiguity, but it circumvents the need for pose or appearance labels and for external localization infrastructure. This is crucial for unsupervised or weakly supervised deployment in novel domains, facilitating generalization to new environments.
5. Application Domains and Implications
LED state classification spans three principal domains:
- Industrial Monitoring: Enables indirect acquisition of machine states by visual means, especially in legacy systems devoid of connectivity or built-in sensors (Nilsson et al., 2020).
- Optical Wireless Communication: Expands the encoding/decoding capabilities of spatial modulation systems, yielding enhanced spectral and energy efficiency (Yesilkaya et al., 2022).
- Robotic Self-supervision: Serves as a scalable pretext for learning visual representations that are position- and orientation-aware, minimizing the need for human supervision and enabling fast, accurate relative localization even in the absence of explicit pose labels (Nava et al., 15 Feb 2024, Carlotti et al., 6 Oct 2024, Carlotti et al., 12 Sep 2025).
Common to all is the use of a simple, inherent physical cue, the LED state, as a robust supervisory signal. In robotics, this supports multi-agent tracking, heading estimation, and pose prediction in GPS-denied or resource-constrained settings.
6. Challenges, Limitations, and Future Directions
Key challenges in LED state classification include:
- Visual Perturbations: Accuracy degrades under severe blur, reduced resolution, occlusion, or extreme lighting variability (Nilsson et al., 2020, Nava et al., 15 Feb 2024).
- Limited Diversity of Training Data: Constrained environments can reduce generalization; models are sensitive to unseen background clutter or occlusion scenarios.
- LED Visibility and Occlusion: Only certain LEDs are visible depending on viewing angle; weighting mechanisms must robustly account for this.
- Scale and Depth Estimation: Multi-scale processing increases computational cost, with distance estimation limited by the discretization granularity (Carlotti et al., 12 Sep 2025).
Future work is oriented toward:
- Generalization: Training on more diverse datasets and deploying domain adaptation for visual appearance and environmental variability.
- Model Extension: Moving from 2D to full 3D pose estimation (e.g., adding more LEDs or modifying visibility functions).
- Temporal Consistency: Incorporating sequential information (e.g., video, optical flow) to reinforce pose predictions.
- Model Efficiency: Optimizing for deployment on constrained hardware and expanding to multi-agent detection in large-scale settings.
- Alternative Pretext Tasks: Investigating other actuator states that may provide similar supervisory signals for vision-based perception.
A plausible implication is that LED state classification, by virtue of its universality and low annotation cost, may serve as a general paradigm for self-supervised learning of spatial and physical reasoning tasks in vision, communication, and robotics.