
YOLOv8-s: Lightweight Real-Time Detector

Updated 25 December 2025
  • YOLOv8-s is a lightweight, anchor-free object detector optimized for high throughput and real-time detection.
  • It employs a modified CSPDarknet backbone with C2f modules and a PAN-style neck to achieve a balance between computational efficiency and accuracy.
  • Widely applied in intelligent transportation systems (ITS), barcode recognition, UAV inspection, and industrial monitoring, it provides practical deployment benefits on edge devices.

YOLOv8 Small Variant

The YOLOv8 Small variant (YOLOv8-s or YOLOv8s) is a lightweight, single-stage, anchor-free object detector engineered for high throughput and solid accuracy in real-time detection scenarios, notably in edge deployment contexts. As a canonical trade-off model within the YOLOv8 family, it integrates Ultralytics’ C2f modules in a streamlined CSPDarknet backbone, a PAN-style neck, and decoupled prediction heads, achieving competitive results across benchmarks while maintaining a modest parameter and computational footprint. This variant is extensively applied in domains requiring fast inference, such as intelligent transportation systems, barcode/QR code recognition, UAV-based small-object detection, and civil infrastructure inspection (Amin et al., 18 Dec 2025, Pandya et al., 28 Nov 2025, Taffese et al., 12 Jan 2025, Khalili et al., 8 Aug 2024, Chen et al., 28 Jul 2025, Chen et al., 26 Sep 2025, Yaseen, 28 Aug 2024, Hussain, 3 Jul 2024, Zhang, 6 Mar 2025).

1. Architectural Characteristics and Core Modules

YOLOv8-s follows a modular convolutional design built for parameter efficiency and multi-scale feature extraction.

  • Backbone: The network backbone is a modified CSPDarknet whose C2f modules (Ultralytics' faster CSP bottleneck with two convolutions) split and re-merge feature channels to enrich gradient flow at low cost. The backbone stages downsample through stride-2 3×3 convolutions, yielding feature maps at progressively coarser spatial scales (Hussain, 3 Jul 2024, Amin et al., 18 Dec 2025).
  • SPPF Layer: A Spatial Pyramid Pooling-Fast (SPPF) layer bridges the backbone and the neck, enlarging the receptive field with little computational overhead (Amin et al., 18 Dec 2025, Pandya et al., 28 Nov 2025).
  • Neck: Features from different backbone stages are fused via a PAN-style (Path Aggregation Network) or FPN/PAN hybrid neck, which, in the standard design, generates three main output scales (stride 8, 16, 32). Some derivative works replace PAN with BiFPN or Hierarchical feature fusion for enhanced multi-scale performance, particularly for small-object scenarios (Chen et al., 28 Jul 2025, Chen et al., 26 Sep 2025).
  • Detection Head: The prediction head is anchor-free and operates at three scales, with decoupled regression and classification branches. For a 640×640 input the heads predict on 80×80, 40×40, and 20×20 grids (strides 8, 16, 32); each cell outputs C class scores plus a distribution-based box regression (DFL bins), replacing the (C+5) anchor-based encoding of earlier YOLO versions (Amin et al., 18 Dec 2025, Hussain, 3 Jul 2024).
  • Activation & Normalization: SiLU activation (also known as Swish) is used throughout convolutional layers for smoother gradients, coupled with BatchNorm (Hussain, 3 Jul 2024).
  • Parameter and FLOP Profile: The model comprises 11.1–11.2 million parameters and 28.6 GFLOPs per 640×640 image (Amin et al., 18 Dec 2025, Hussain, 3 Jul 2024, Taffese et al., 12 Jan 2025).
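The multi-scale head geometry above follows directly from the stride arithmetic; a minimal pure-Python sketch (the function names are illustrative, not part of any YOLOv8 API):

```python
def head_grid_shapes(img_size, strides=(8, 16, 32)):
    # Each detection head predicts on an (img_size/stride) x (img_size/stride) grid.
    return [(img_size // s, img_size // s) for s in strides]

def num_candidate_locations(img_size, strides=(8, 16, 32)):
    # Total number of prediction cells across all three heads.
    return sum((img_size // s) ** 2 for s in strides)

print(head_grid_shapes(640))         # [(80, 80), (40, 40), (20, 20)]
print(num_candidate_locations(640))  # 8400
```

For the standard 640×640 input this yields the 80×80, 40×40, and 20×20 maps cited above, i.e. 8,400 candidate locations per image before post-processing.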

2. Model Complexity and Scaling Trade-offs

YOLOv8-s stands as an intermediate point in the YOLOv8 family, optimized for cases where nano models underperform and medium/large models are resource-prohibitive.

| Variant | Params (M) | FLOPs (G) | mAP@0.5 | Precision | Recall | File Size (MB) |
|---|---|---|---|---|---|---|
| YOLOv8-nano | 3.0–3.2 | 8.7 | 0.811–0.918 | 0.964 (LPR) | 0.876– | ~12 |
| YOLOv8-small | 11.1–11.2 | 28.6 | 0.846–0.933 | 0.945 (LPR) | 0.874 (LPR) | ~45 |
| YOLOv8-medium | 25.9 | 78.9 | 0.85–0.94 | 0.946 (LPR) | 0.912 (LPR) | ~90 |

Values compiled from (Amin et al., 18 Dec 2025, Taffese et al., 12 Jan 2025, Hussain, 3 Jul 2024). mAP@0.5 and precision/recall are dataset-dependent; LPR denotes license plate recognition results.

YOLOv8-s achieves a significant increase in accuracy and recall over nano models for a ~3× computational and parameter cost, while medium models provide marginal accuracy gains at more than double the cost again. On tasks such as license plate recognition and crack detection, YOLOv8-s consistently achieves a favorable accuracy-speed-complexity balance (Amin et al., 18 Dec 2025, Taffese et al., 12 Jan 2025).
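The trade-off can be made concrete by computing cost ratios directly from the table values (a throwaway sketch; the numbers are the ones compiled above):

```python
# Parameter and FLOP figures from the comparison table above.
variants = {
    "nano":   {"params_m": 3.2,  "flops_g": 8.7},
    "small":  {"params_m": 11.2, "flops_g": 28.6},
    "medium": {"params_m": 25.9, "flops_g": 78.9},
}

def cost_ratio(src, dst):
    # Multiplicative parameter/FLOP cost of moving from one variant to another.
    return (variants[dst]["params_m"] / variants[src]["params_m"],
            variants[dst]["flops_g"] / variants[src]["flops_g"])

print(cost_ratio("nano", "small"))    # ~3.5x params, ~3.3x FLOPs
print(cost_ratio("small", "medium"))  # ~2.3x params, ~2.8x FLOPs
```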

3. Training Configuration and Loss Functions

  • Optimizers: Training may use AdamW or SGD with momentum; typical schedules pair a short warmup with cosine or linear learning-rate decay (the Ultralytics reference implementation defaults to an initial learning rate of 0.01 with momentum 0.937).
  • Data Augmentation: Typical augmentations include mosaic, mixup, color jitter (HSV), random flips, rotation, blur, cropping, and online geometric transforms, though specifics are dataset-dependent (Pandya et al., 28 Nov 2025, Taffese et al., 12 Jan 2025, Hussain, 3 Jul 2024).
  • Loss Terms: The objective sums classification and box-regression terms; being anchor-free, YOLOv8 has no separate objectness branch:
    • L_total = L_cls + L_box + L_dfl
    • Classification: Binary cross-entropy over class scores.
    • Box regression: CIoU combined with Distribution Focal Loss (DFL), with dynamic (task-aligned) label assignment. In some derivative works CIoU is replaced with PIoU to mitigate anchor-box enlargement artifacts (Amin et al., 18 Dec 2025, Khalili et al., 8 Aug 2024).
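The CIoU box term can be sketched in plain Python for axis-aligned (x1, y1, x2, y2) boxes with positive width and height; this mirrors the standard CIoU definition (IoU penalized by normalized center distance and an aspect-ratio consistency term), not any particular framework's implementation:

```python
import math

def ciou_loss(b1, b2, eps=1e-9):
    """CIoU loss = 1 - IoU + rho^2/c^2 + alpha*v for boxes (x1, y1, x2, y2)."""
    # Intersection-over-union
    iw = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    ih = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = iw * ih
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    iou = inter / (a1 + a2 - inter + eps)
    # Squared center distance over squared enclosing-box diagonal
    rho2 = (((b1[0] + b1[2]) - (b2[0] + b2[2])) ** 2
            + ((b1[1] + b1[3]) - (b2[1] + b2[3])) ** 2) / 4.0
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    v = (4.0 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss of 0; non-overlapping boxes exceed 1 because the center-distance penalty keeps the gradient informative even at zero IoU, which is the practical motivation for CIoU over plain IoU or MSE.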

4. Detection Performance and Benchmarks

YOLOv8-s delivers real-time or near-real-time inference across multiple tasks and datasets, providing competitive accuracy.

Performance is robust relative to model size, and the mAP gap between small and larger YOLOv8 models is modest (often ≤5 pp) given the reduction in parameters and latency.
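The mAP@0.5 figures quoted throughout are per-class average precisions (at an IoU threshold of 0.5) averaged over classes; a minimal all-point AP computation for one class, illustrative rather than the exact COCO evaluator:

```python
def average_precision(detections, num_gt):
    """AP for one class. detections: list of (confidence, is_true_positive)
    pairs; num_gt: number of ground-truth boxes for that class."""
    dets = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in dets:
        tp += 1 if is_tp else 0
        fp += 0 if is_tp else 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision monotonically non-increasing.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# A perfect detector scores AP = 1.0:
print(average_precision([(0.9, True), (0.8, True)], num_gt=2))  # 1.0
```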

5. Application-Specific Variants and Small-Object Extensions

Multiple works derive from YOLOv8-s to target small-object detection and efficiency for edge scenarios.

  • SOD-YOLOv8: Adds a fourth high-resolution detection head (stride 2) and replaces PANet with GFPN. Incorporates C2f-EMA attention and PIoU loss; measured recall rises from 40.1% to 43.9%, precision from 51.2% to 53.9%, and mAP@0.5 from 40.6% to 45.1% (Khalili et al., 8 Aug 2024).
  • YOLOv8s-p2: Integrates BiFPN with learnable weights and incorporates a stride-4 detection head, raising recall for tiny rice spikelets and improving mAP@0.5 by 3.1% over baseline (Chen et al., 28 Jul 2025).
  • HierLight-YOLO-S: Substitutes C2f with IRDCB, standard downsampling with LDown, and replaces the PANet neck with a hierarchical feature fusion (HEPAN). Adds a P2 (160×160) detection head, reducing parameter count by ~30% and increasing small-object AP by +3.3 points (Chen et al., 26 Sep 2025).
  • FDM-YOLO: Removes the largest detection head and adds a high-resolution P2 head, introduces Fast-C2f modules (PConv-based), dynamic upsampling (Dysample), and lightweight EMA attention, reducing parameter count by 38% and improving mAP@0.5 from 38.4% to 42.5% on VisDrone (Zhang, 6 Mar 2025).

These modifications typically target high recall and AP for objects <32 px, crucial in UAV, traffic, and field monitoring.
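Several of these variants (e.g. the BiFPN replacement in YOLOv8s-p2) rely on learnable weighted feature fusion. The "fast normalized fusion" rule can be sketched in plain Python, with flat lists standing in for feature maps (in the real networks these are tensors of equal shape after resizing, and the weights are learned parameters):

```python
def weighted_fusion(features, weights, eps=1e-4):
    # BiFPN-style fast normalized fusion:
    #   out = sum_i(relu(w_i) * f_i) / (sum_i relu(w_i) + eps)
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps weights non-negative
    s = sum(w) + eps                      # eps avoids division by zero
    return [sum(wi * f[k] for wi, f in zip(w, features)) / s
            for k in range(len(features[0]))]

# Equal weights reduce to (almost exactly) an element-wise average:
print(weighted_fusion([[2.0, 4.0], [4.0, 8.0]], [1.0, 1.0]))  # ~[3.0, 6.0]
```

Because the weights are normalized per fusion node, the network can learn to emphasize whichever input scale is most informative, which is why such necks help on small-object benchmarks.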

6. Edge Deployment and Practical Implications

With a parameter count of 11.1–11.2 M, a compute requirement of 28.6 GFLOPs, and a compact architecture, YOLOv8-s is well-suited to contemporary high-end and mid-tier edge devices.

  • Latency and Throughput: YOLOv8-s achieves sub-millisecond to millisecond per-image inference on modern GPUs (e.g., A100, RTX 3090), and 15–60 FPS on Jetson Xavier/Orin, depending on task and optimizations (Amin et al., 18 Dec 2025, Hussain, 3 Jul 2024, Taffese et al., 12 Jan 2025).
  • Resource Profile: File size ≈45 MB (FP32); quantization or pruning can reduce this further.
  • Suitability: Recommended for deployment scenarios balancing moderate-to-high accuracy with strict latency and resource constraints—including ITS (Intelligent Transportation Systems), mobile device vision, video analytics, real-time industrial inspection, and on-device inference pipelines.

A notable use case involves a pipeline where YOLOv8-nano detects candidate regions (e.g., license plates), passing the region to YOLOv8-s for fine-grained tasks such as character or small-object localization, leveraging the strengths of both model sizes (Amin et al., 18 Dec 2025).
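The coordinate bookkeeping in such a cascade is easy to get wrong; below is a sketch of the two helper steps (the function names and the 10% context margin are illustrative assumptions, not taken from the cited pipeline):

```python
def expand_and_clamp(box, frame_w, frame_h, margin=0.1):
    """Pad a stage-1 (nano) detection by a relative margin and clamp it to
    the frame, so the stage-2 (small) model sees context around the region."""
    x1, y1, x2, y2 = box
    mw, mh = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0.0, x1 - mw), max(0.0, y1 - mh),
            min(frame_w, x2 + mw), min(frame_h, y2 + mh))

def crop_to_frame(box_in_crop, crop_origin):
    """Map a stage-2 detection from crop coordinates back to frame coordinates."""
    ox, oy = crop_origin
    x1, y1, x2, y2 = box_in_crop
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

# Stage-1 plate box -> padded crop -> stage-2 character box mapped back:
crop = expand_and_clamp((100, 100, 200, 150), 640, 480)  # (90.0, 95.0, 210.0, 155.0)
char = crop_to_frame((5, 5, 20, 15), crop[:2])           # (95.0, 100.0, 110.0, 110.0)
```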

7. Limitations and Comparative Positioning

YOLOv8-s, while effective, does exhibit certain limitations:

  • Extremely Small Objects/Occlusions: Although competitive, baseline YOLOv8-s sometimes underperforms with extremely small or heavily occluded objects unless explicitly modified via high-resolution heads or advanced multi-scale fusion (Pandya et al., 28 Nov 2025, Khalili et al., 8 Aug 2024, Chen et al., 28 Jul 2025).
  • Inference Speed Reporting: Some evaluation papers omit direct FPS throughput; figures are then generally extrapolated from FLOP counts or hardware reports.
  • Further Compression: For ultra-constrained TinyML microcontrollers, YOLOv8-nano or special pruned/quantized variants are favored, though YOLOv8-s remains more robust for complex multi-class scenarios (Elshamy et al., 21 Oct 2024).
  • Absolute Accuracy Ceilings: Larger YOLOv8 variants (medium, large) can yield slightly higher mAP at the cost of increased latency and size, suggesting use-case dependent model selection (Taffese et al., 12 Jan 2025, Hussain, 3 Jul 2024).
