LSP-YOLO: Lightweight Posture Recognition

Updated 24 November 2025
  • The paper introduces a novel single-stage architecture that unifies keypoint estimation and posture classification for efficient real-time performance.
  • It leverages a compact Light-C3k2 backbone with partial convolution and parameter-free SimAM attention to substantially reduce computational cost.
  • The model achieves state-of-the-art accuracy on desktop setups while maintaining practical deployability on low-power embedded devices.

LSP-YOLO is a lightweight, single-stage convolutional neural network architecture designed specifically for efficient sitting posture recognition on embedded edge devices. Developed as an end-to-end solution that unifies keypoint estimation and posture classification, LSP-YOLO targets real-time applications with severe constraints on computational resources, such as smart classrooms, rehabilitation platforms, and human–computer interfaces. Its core innovations include the introduction of a compact Light-C3k2 backbone featuring partial convolution and parameter-free SimAM attention, as well as a direct, pointwise keypoint-to-class mapping in the recognition head. LSP-YOLO achieves state-of-the-art classification accuracy, extremely high throughput on desktop hardware, and practical deployability on low-power processors (Li et al., 18 Nov 2025).

1. Model Architecture and Design Principles

LSP-YOLO builds on the backbone–neck–head paradigm established in YOLOv11-Pose, but with explicit fusion of pose estimation and posture classification in a single forward pass. The model structure is as follows:

  • Backbone: A stack of convolutional and Light-C3k2 modules replaces the conventional C3k2 blocks. The Spatial Pyramid Pooling Fast (SPPF) module enhances the receptive field for robust context capture.
  • Neck: A PANet-style multi-scale fusion merges shallow spatial with deep semantic features across three scales, enabling the network to capture multi-level information crucial for keypoint localization and pose inference.
  • Recognition Head (LSP-Head): For each output grid cell, the head jointly predicts confidence, regresses 11 upper-body keypoints, and classifies posture via a 1×1 convolution mapping the keypoint vector to six class logits, followed by a softmax.

Module Pipeline (Simplified)

Input → [Conv+Light-C3k2]×n → SPPF → [Light-C3k2 + up/down-sampling fusion] (neck) → LSP-Head
      → {confidence, keypoints, class-scores}

This single-stage approach eliminates the need for separate pose estimation pipelines, reducing both memory footprint and inference latency (Li et al., 18 Nov 2025).
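The single forward pass above can be sketched by splitting one head output tensor into its three prediction groups per grid cell. This is an illustrative reconstruction, not the authors' code; the channel layout (1 confidence channel, 11 keypoints with x/y offsets, 6 class logits) is an assumption for the sketch, not the exact tensor format.

```python
import numpy as np

# Hypothetical channel layout for the sketch: 1 conf + 11 kpts x (x, y) + 6 classes.
N_KPT, N_CLS = 11, 6
C = 1 + 2 * N_KPT + N_CLS  # 29 channels per grid cell

def split_head_output(raw):
    """raw: (C, H, W) feature map from the recognition head."""
    conf = 1 / (1 + np.exp(-raw[:1]))            # objectness via sigmoid
    kpts = raw[1:1 + 2 * N_KPT]                  # raw keypoint regressions
    logits = raw[1 + 2 * N_KPT:]                 # six posture class logits
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = e / e.sum(axis=0, keepdims=True)     # softmax over the 6 classes
    return conf, kpts, probs

conf, kpts, probs = split_head_output(np.random.randn(C, 8, 8))
```

Because all three outputs come from one tensor, no separate pose-estimation pipeline (and no second network invocation) is needed at inference time.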

2. Light-C3k2 Block: Partial Convolution and SimAM

The core backbone module, Light-C3k2, merges efficiency-centric and attention mechanisms:

  • Partial Convolution (PConv): Rather than convolving all c channels, PConv applies standard convolution only to a fraction r of them (set to 0.5), passing the remaining channels through an identity. The relative computational saving for a k×k kernel is

\text{Relative FLOP reduction} = 1 - r^2

yielding a 75% reduction per 3×3 convolution when r = 0.5.

  • SimAM (Simple, Parameter-Free Attention Module): A parameter-free attention mechanism that computes an importance energy for each neuron t by minimizing

E(w_t, b_t) = \frac{1}{M-1} \sum_{i\neq t} [-1 - (w_t x_i + b_t)]^2 + [1 - (w_t t + b_t)]^2 + \lambda w_t^2

with attention score

a_t = \mathrm{sigmoid}(1/e_t^*)

acting channel-wise and location-wise.

  • Block Composition: Two bottleneck units (the "k2"), each beginning with a PConv followed by 1×1 convolutions for channel fusion; SimAM is applied after each bottleneck, and a residual connection spans both.

Light-C3k2 controls feature dimension via a width multiplier α ∈ {0.25, 0.5, 0.75, 1.0}, maintaining representation power while substantially reducing GFLOPs and memory requirements (Li et al., 18 Nov 2025).
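Both mechanisms can be sketched in a few lines of NumPy. This is a hedged reconstruction, not the authors' implementation: the inverse-energy expression follows the published closed-form minimizer of the SimAM energy above, with λ the same regularizer.

```python
import numpy as np

def pconv_flop_reduction(r: float) -> float:
    """Relative FLOP saving of PConv: only a fraction r of channels is
    convolved, so cost scales with r^2 (input and output widths both shrink)."""
    return 1 - r ** 2

def simam(x: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM attention on a (C, H, W) feature map.

    Uses the closed-form minimizer of E(w_t, b_t): per neuron, the inverse
    energy reduces to (t - mu)^2 / (4 * (var + lam)) + 0.5, and each
    activation is rescaled by a sigmoid of that score.
    """
    n = x.shape[1] * x.shape[2] - 1                  # M - 1 neurons per channel
    mu = x.mean(axis=(1, 2), keepdims=True)
    sq_dev = (x - mu) ** 2
    var = sq_dev.sum(axis=(1, 2), keepdims=True) / n
    inv_energy = sq_dev / (4 * (var + lam)) + 0.5
    return x * (1 / (1 + np.exp(-inv_energy)))       # sigmoid(1 / e_t*)

print(pconv_flop_reduction(0.5))  # 0.75 -> the 75% saving quoted above
```

Note that SimAM adds no learnable parameters, so the attention comes essentially for free on top of the PConv savings.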

3. Recognition Head and Losses

The LSP-Head handles all prediction targets per output grid cell:

  • Keypoints-to-Class Mapping: The estimated keypoint vector K̂ ∈ ℝ^D is transformed by a 1×1 convolution into six posture scores s_i, normalized by a softmax:

S = \mathrm{Conv}_{1\times1}(\hat K), \quad \hat p_i = \frac{e^{s_i}}{\sum_{j=1}^6 e^{s_j}}

  • Intermediate Supervision: Keypoint accuracy is enforced with an Object Keypoint Similarity (OKS) loss prior to class mapping, ensuring features support both regression and classification.
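A 1×1 convolution over a single D-dimensional keypoint vector is equivalent to one linear layer followed by a softmax, which the minimal sketch below makes explicit. The weights W and b are random placeholders, not learned parameters, and the dimensions are assumptions (D = 22 from 11 keypoints × two coordinates).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_CLS = 22, 6  # assumed: 11 keypoints x (x, y); six posture classes
W = rng.standard_normal((N_CLS, D))  # placeholder for the learned 1x1 conv kernel
b = rng.standard_normal(N_CLS)       # placeholder bias

def classify_from_keypoints(k_hat):
    """Map a keypoint vector to posture-class probabilities: S = W @ K_hat + b,
    then a numerically stable softmax over the six scores."""
    s = W @ k_hat + b
    e = np.exp(s - s.max())
    return e / e.sum()

p = classify_from_keypoints(rng.standard_normal(D))
```

This pointwise mapping is what lets the classifier reuse the pose features directly, rather than running a second classification network on cropped images.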

Loss Terms

| Term | Loss Function | Purpose |
|---|---|---|
| Confidence | L_{\mathrm{conf}} = \sum_{n=1}^{N}\mathrm{BCE}(p_{\mathrm{conf}}^n, t_{\mathrm{conf}}^n) | Object presence |
| Keypoint | L_{\mathrm{oks}} = 1 - \frac{\sum_i \exp(-d_i^2/(2 s^2 k_i^2))\,\delta(v_i>0)}{\sum_i \delta(v_i>0)} | Keypoint accuracy |
| Classification | L_{\mathrm{cls}} = \sum_{n=1}^{N}\mathrm{BCE}(p_{\mathrm{cls}}^n, t_{\mathrm{cls}}^n) | Posture class accuracy |

The aggregated loss is

L_{\mathrm{sum}} = \alpha L_{\mathrm{conf}} + \beta L_{\mathrm{oks}} + \gamma L_{\mathrm{cls}}

with α = 1, β = 12, γ = 4 (Li et al., 18 Nov 2025).
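The aggregated loss can be sketched as follows; the OKS term implements the formula in the table above and the BCE terms are standard binary cross-entropy. All numeric inputs are toy values chosen for illustration, not real model predictions.

```python
import numpy as np

def oks_loss(d, k, s, vis):
    """OKS loss: d = per-keypoint distances, k = per-keypoint constants,
    s = object scale, vis = visibility flags (only visible keypoints count)."""
    sim = np.exp(-d ** 2 / (2 * s ** 2 * k ** 2))
    m = vis > 0
    return 1 - sim[m].sum() / m.sum()

def bce(p, t, eps=1e-7):
    """Binary cross-entropy, summed over elements."""
    p = np.clip(p, eps, 1 - eps)
    return -(t * np.log(p) + (1 - t) * np.log(1 - p)).sum()

alpha, beta, gamma = 1.0, 12.0, 4.0  # loss weights from the paper
d = np.array([0.1, 0.3])             # toy keypoint distances
k = np.array([0.05, 0.05])           # toy per-keypoint constants
L = (alpha * bce(np.array([0.9]), np.array([1.0]))
     + beta * oks_loss(d, k, s=1.0, vis=np.ones(2))
     + gamma * bce(np.array([0.8, 0.1]), np.array([1.0, 0.0])))
```

The large β reflects that accurate keypoints are the prerequisite for the pointwise class mapping, so the keypoint term dominates training.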

4. Dataset Construction and Augmentation

LSP-YOLO was trained and validated on a dedicated posture dataset with the following properties:

  • Images: 5,000, annotated for six upper-body posture classes: Correct, LeanLeft, LeanRight, ChinSupport, OnDesk, HeadDown.
  • Annotations: Each sample labeled with a bounding box (upper body), class, and 11 keypoints.
  • Partitioning: 70% training, 15% validation, 15% testing.
  • Augmentations: Random scaling, horizontal shifts, HSV jitter, and random horizontal flips to bolster generalization (Li et al., 18 Nov 2025).
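For concreteness, the 70/15/15 partition of the 5,000 images works out to the split sizes below (a trivial arithmetic check, not from the paper's text).

```python
# Per-split image counts implied by the 70/15/15 partition of 5,000 images.
total = 5000
train = round(total * 0.70)
val = round(total * 0.15)
test = total - train - val
print(train, val, test)  # 3500 750 750
```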

5. Training Process and Inference Results

  • Training: Conducted for 300 epochs with batch size 32, learning rate annealed from 0.01 to 1e-4, image size 640×640, on an AMD EPYC 7742 with dual RTX 3090 GPUs.
  • Model Variants: The smallest, LSP-YOLO-n (α = 0.25, depth multiplier 0.33), contains 1.9 M parameters and requires 4.2 GFLOPs per inference.
  • Performance on PC: Achieves 251 fps and 94.2% precision with a model size of 1.9 MB.
  • Embedded Deployment: On the SV830C + GC030A platform (0.5 TOPS, 64 MB RAM, 640×480@30 fps camera), the 8-bit quantized model yields:
    • Preprocessing latency: 115 ms
    • Inference latency: 255 ms (≈4 fps)
    • Memory footprint: 22 MB
    • Accuracy: 91.7%
    • Model size: 2.2 MB
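A quick consistency check of the embedded figures: the 255 ms inference latency alone corresponds to just under 4 fps, matching the throughput quoted above (this ignores the 115 ms preprocessing stage, whose scheduling relative to inference the summary does not specify).

```python
# Frames per second implied by the quoted 255 ms embedded inference latency.
inference_ms = 255
fps = 1000 / inference_ms
print(round(fps, 2))  # 3.92
```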

LSP-YOLO thus demonstrates both real-time throughput on desktop and practical, memory-constrained inference on edge accelerators (Li et al., 18 Nov 2025).

6. Computational Efficiency and Deployment Considerations

The model’s efficiency arises from design innovations:

  • GFLOPs Reduction: The combination of PConv (reducing the dominant convolutional cost to 25%) and SimAM (no extra parameters) leads to a total GFLOP reduction of approximately 15–20% compared to the baseline, with greater than 96% retention of classification precision.
  • Edge Compatibility: With model sizes near 2 MB, LSP-YOLO fits comfortably within the flash and RAM budgets of microcontroller- to mid-range systems.

Even the smallest variant consistently delivers over 250 fps on high-end GPUs and approximately 4 fps on low-power hardware, validating its suitability for embedded applications (Li et al., 18 Nov 2025).

7. Applications, Limitations, and Directions for Further Research

Use Cases:

  • Multi-student posture monitoring in smart classrooms
  • Remote rehabilitation and posture correction feedback systems
  • Human–computer interfaces leveraging posture-based control signals

Identified Limitations:

  • Lower limb occlusion constrains reliable full-body posture estimation; expansion to 3D or multi-view sensing is a prospective solution.
  • The current design processes single frames; temporal fusion with video streams could augment robustness and consistency.
  • Scene-level multi-person counting is not addressed; future research can focus on dynamic keypoint grouping.
  • Incorporation of self-supervised pretraining on large, unlabeled posture datasets could improve robustness to real-world variation.

LSP-YOLO thus provides a high-efficiency, deployable baseline for posture recognition research and applications, with multiple avenues for extension in both accuracy and scope (Li et al., 18 Nov 2025).
