
YOLOv8s: Compact Real-Time Detector

Updated 1 January 2026
  • YOLOv8s is a compact, real-time object detector with a CSPNet backbone and anchor-free head, balancing speed and accuracy for diverse applications.
  • It integrates advanced modules like FPN, PAN, and adaptive convolutions to ensure robust multi-scale feature extraction and efficient training pipelines.
  • Benchmarking shows strong performance across COCO, UAV, and industrial tasks with low latency and energy-efficient deployment on embedded platforms.

YOLOv8s is a compact, real-time object detector within the Ultralytics YOLOv8 family, designed for balanced speed and accuracy. Built on a CSPNet backbone with advanced feature aggregation, anchor-free prediction, and streamlined training pipelines, YOLOv8s is widely adopted for detection, segmentation, and embedded applications where low-latency inference and reasonable precision are paramount (Yaseen, 2024). Its architectural innovations and readily extensible design have led to broad benchmarking, adaptation, and enhancement across diverse domains.

1. Network Architecture and Computational Principles

YOLOv8s employs a hybrid backbone–neck–head structure integrating several technical advancements:

  • Backbone: Implements Cross-Stage Partial (CSP) bottlenecks, splitting the input feature map into two branches, passing one branch through bottleneck layers, and concatenating it back via a 1×1 convolution to reduce computation and enhance gradient flow. Specifically, backbone stages use composite C2f modules (cross-stage partial + fused convs) for multi-level feature extraction and efficient feature reuse (Yaseen, 2024, Reis et al., 2023, Gamani et al., 2024); a simplified C2f sketch follows the parameter summary below.
  • Neck: Constructs a top-down Feature Pyramid Network (FPN) fused with a bottom-up Path Aggregation Network (PAN), merging feature maps at multiple resolutions to increase context for small and large objects (Yaseen, 2024, Reis et al., 2023).
  • Detection Head: Utilizes a decoupled, anchor-free head. Each spatial cell in the multi-scale feature maps directly regresses bounding-box offsets (distances from the cell center to the box sides), an objectness score, and a multi-label class probability vector, trained with focal/BCE classification loss and CIoU localization loss (Yaseen, 2024, Reis et al., 2023); a decoding sketch follows the table below. For segmentation tasks, a parallel mask-prediction branch upsamples neck outputs through convolutional layers to produce per-instance masks (Gamani et al., 2024).
| Component | Structure | Notable Modules |
|-----------|-----------|-----------------|
| Backbone  | CSPNet, C2f blocks | Conv–BN–SiLU |
| Neck      | FPN + PAN | Top-down, bottom-up |
| Head      | Anchor-free, decoupled | Objectness, class, box |
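The anchor-free decoding in the head can be illustrated with a short PyTorch sketch. This is a minimal illustration assuming distances are predicted in stride units from grid-cell centers; the actual YOLOv8 head additionally represents distances as a DFL distribution:

```python
# Minimal sketch of anchor-free box decoding, assuming distances (l, t, r, b)
# are predicted in stride units from each grid-cell center. Illustrative only;
# the actual YOLOv8 head predicts distances via a DFL distribution.
import torch

def decode_boxes(dist: torch.Tensor, stride: int) -> torch.Tensor:
    """dist: (H, W, 4) per-cell distances; returns (H, W, 4) xyxy boxes."""
    h, w, _ = dist.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs + 0.5) * stride  # cell centers in image coordinates
    cy = (ys + 0.5) * stride
    d = dist * stride         # distances back to pixel units
    return torch.stack(
        [cx - d[..., 0], cy - d[..., 1], cx + d[..., 2], cy + d[..., 3]],
        dim=-1,
    )

boxes = decode_boxes(torch.rand(80, 80, 4), stride=8)  # P3 map for 640 input
```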

YOLOv8s maintains approximately 9–11.8 million parameters and 17–42.4 GFLOPs, with input size standardized at 640×640 (Yaseen, 2024, Gamani et al., 2024).
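The C2f pattern referenced above (split, bottleneck chain with feature reuse, 1×1 fusion) can be sketched as follows. This is a simplified illustration, not the Ultralytics implementation, which differs in details such as expansion ratios and shortcut handling:

```python
# A minimal C2f-style block sketch in PyTorch (Conv-BN-SiLU composites),
# following the split / bottleneck / concat pattern described above.
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class C2f(nn.Module):
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.hidden = c_out // 2
        self.cv1 = conv_bn_silu(c_in, 2 * self.hidden)          # split projection
        self.bottlenecks = nn.ModuleList(
            nn.Sequential(conv_bn_silu(self.hidden, self.hidden, 3),
                          conv_bn_silu(self.hidden, self.hidden, 3))
            for _ in range(n)
        )
        self.cv2 = conv_bn_silu((2 + n) * self.hidden, c_out)   # fuse concat

    def forward(self, x):
        ys = list(self.cv1(x).chunk(2, dim=1))  # two branches
        for m in self.bottlenecks:
            ys.append(m(ys[-1]) + ys[-1])        # residual bottleneck chain
        return self.cv2(torch.cat(ys, dim=1))    # reuse all intermediates

out = C2f(64, 128)(torch.rand(1, 64, 80, 80))    # -> (1, 128, 80, 80)
```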

2. Training Regimes and Optimization Strategies

Training pipelines for YOLOv8s leverage:

  • Augmentations: Image resizing, mosaic (combining four images per training sample), cutmix/mixup, color jitter, random flipping/scaling, and normalization (Yaseen, 2024, Gamani et al., 2024, Reis et al., 2023).
  • Loss Functions: The composite loss is:

$$L = \lambda_{\mathrm{obj}} L_{\mathrm{obj}} + \lambda_{\mathrm{cls}} L_{\mathrm{cls}} + \lambda_{\mathrm{box}} L_{\mathrm{box}}$$

where $L_{\mathrm{box}}$ applies CIoU loss, $L_{\mathrm{cls}}$ uses BCE or focal loss, and $L_{\mathrm{obj}}$ is an objectness BCE.
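A minimal sketch of this composite loss, assuming predictions and targets are already matched and using illustrative weights; the real YOLOv8 loss additionally includes a DFL term and task-aligned assignment:

```python
# Illustrative composite detection loss following the formula above.
# The lambda defaults here are placeholders, not the official values.
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def detection_loss(pred_obj, tgt_obj, pred_cls, tgt_cls, pred_box, tgt_box,
                   lam_obj=1.0, lam_cls=0.5, lam_box=7.5):
    l_obj = F.binary_cross_entropy_with_logits(pred_obj, tgt_obj)   # objectness BCE
    l_cls = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)   # class BCE
    l_box = complete_box_iou_loss(pred_box, tgt_box, reduction="mean")  # CIoU
    return lam_obj * l_obj + lam_cls * l_cls + lam_box * l_box
```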

A hyperparameter grid over SGD, Adam, RMSProp, and AdamW optimizers (with varying batch sizes/epochs) demonstrated that class-level AP and utility scores vary by optimizer choice (Khan et al., 15 Oct 2025).
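A grid of this kind can be run directly through the Ultralytics API; the dataset YAML, epoch count, and batch sizes below are placeholders:

```python
# Sketch of an optimizer/batch-size grid with the Ultralytics API.
from ultralytics import YOLO

for opt in ("SGD", "Adam", "RMSProp", "AdamW"):
    for batch in (16, 32):
        model = YOLO("yolov8s.pt")
        results = model.train(data="dataset.yaml", epochs=50,
                              batch=batch, optimizer=opt)
        # per-class AP from the returned metrics can then be compared
```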

3. Quantitative Performance and Benchmarking

YOLOv8s consistently demonstrates robust speed–accuracy tradeoffs:

| Task / Benchmark | mAP@0.5 | FPS | Other Notables |
|------------------|---------|-----|----------------|
| COCO | 58.5% | 166 | 9M params (Yaseen, 2024) |
| UAV Orin NX FP32 | 97.81% | 35.7 | 14.16 W power (Rey et al., 6 Feb 2025) |
| Strawberry Segm. | 80.0% | 30.3 | F1 = 77% (Gamani et al., 2024) |
| Urban AV | 83% | – | 75% faster than YOLO-NAS |
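For reference, a COCO-style evaluation like the first row can be run via the Ultralytics validation API; a sketch, with the dataset YAML path as a placeholder:

```python
# Sketch of reproducing a COCO-style benchmark with the Ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
metrics = model.val(data="coco.yaml", imgsz=640)
print(metrics.box.map50, metrics.box.map)  # mAP@0.5 and mAP@0.5:0.95
```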

4. Architectural Adaptations and Domain-Specific Extensions

YOLOv8s's modularity facilitates task-specific enhancements:

  • Adaptive Shape Convolution Module (ASCM) & Large Kernel Shift Convolution Module (LKSCM): The Fab-ASLKS configuration incorporates ASCM in the neck (dynamic kernel adaptation for elongated defects) and LKSCM in the backbone (large receptive fields via channel shifts and small convolutions), achieving a mAP@50 gain of +2.9 pp without parameter inflation (Wang et al., 24 Jan 2025).
  • Bidirectional Feature Pyramid Network (BiFPN) & Small-Object Head: For rice spikelet detection, PANet is replaced with BiFPN (learnable, repeated bidirectional fusions), and a "P2" head is added on the 160×160 feature map to improve small-object recall, yielding mAP@0.5 = 65.9%, F1 = 64.4% (+9.8 pp over baseline) at 69 FPS (Chen et al., 28 Jul 2025); a fusion sketch follows the table below.
  • Hierarchical Classification Head: hYOLO extends YOLOv8s by duplicating the head per taxonomy level, concatenating inter-level activations, and penalizing child violations in loss, resulting in superior semantic error control and stable multi-level performance (Tsenkova et al., 27 Oct 2025).
| Modification | Module(s) Affected | Performance Gain |
|--------------|--------------------|------------------|
| ASCM + LKSCM | Neck (C2f), Backbone | +2.9% mAP@50 (Wang et al., 24 Jan 2025) |
| BiFPN + P2 head | Neck, Head | +3.1% mAP@0.5 (Chen et al., 28 Jul 2025) |
| hYOLO hierarchy | Head (classification) | F1 ↑, calibrated FP (Tsenkova et al., 27 Oct 2025) |
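The BiFPN fusion rule noted above (learnable, non-negative per-input weights with fast normalization) can be sketched as follows; this illustrates the fusion formula only, not the cited implementation:

```python
# Sketch of BiFPN-style fast normalized fusion over same-shape feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastFusion(nn.Module):
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one weight per input
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)                # keep weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalization (no softmax)
        return sum(wi * f for wi, f in zip(w, feats))

fused = FastFusion(2)([torch.rand(1, 64, 40, 40), torch.rand(1, 64, 40, 40)])
```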

5. Deployment on Embedded and Resource-Constrained Platforms

YOLOv8s is extensively benchmarked on edge devices and in real-time streaming:

  • Jetson Orin Nano/NX: FP32 and INT8 quantization are supported, balancing mAP and FPS; INT8 achieves up to 41 FPS (Nano) / 56 FPS (NX) at mAP@[0.5:0.95] ≈ 0.80, with energy per inference as low as ~0.22 J (Rey et al., 6 Feb 2025).
  • Raspberry Pi 5: Real-time operation is not attained for YOLOv8s (7.3 FPS at mAP@0.5 = 0.9621); the platform is instead favored for low-energy scenarios (Rey et al., 6 Feb 2025).
  • Cloud vs Edge: Edge deployment of YOLOv8s yields total latency ≈ 35 ms (sub-30 ms RTT possible), whereas cloud inference incurs >300 ms due to communication overhead, favoring embedded deployment for latency-sensitive tasks (Rey et al., 6 Feb 2025).

Optimization strategies include post-training quantization, structured pruning, reduced input resolution, and tailored CUDA/NCNN configurations based on hardware, with recommendations for TensorRT layer fusion and NEON acceleration on ARM (Rey et al., 6 Feb 2025).
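Post-training quantization and export can be driven from the Ultralytics export API; a sketch, assuming a calibration dataset YAML is available and a GPU is present for the TensorRT build:

```python
# Sketch of post-training quantization and export for edge targets.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
model.export(format="engine", int8=True, data="dataset.yaml")  # TensorRT INT8
model.export(format="onnx", half=True, imgsz=640)              # FP16 ONNX
```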

6. Practical Applications and Comparative Analysis

YOLOv8s has demonstrated broad utility in domains requiring rapid inference and moderate parameter budgets:

  • Autonomous Vehicles: Stable learning curves, class APs exceeding prior state-of-the-art (Far3D, HoP, Li, StreamPETR, SparseBEV) for AV perception tasks (Khan et al., 15 Oct 2025, Khan, 25 Dec 2025).
  • Agricultural Monitoring: Effective for strawberry ripeness segmentation and small-object rice spikelet detection, outperforming larger DNN variants in speed without sizable mAP trade-off (Gamani et al., 2024, Chen et al., 28 Jul 2025).
  • Industrial Inspection: With adaptive backbone/neck modules, high-precision defect detection is possible at real-time speeds and with minimal extra parameters (Wang et al., 24 Jan 2025).
  • Aerial and UAV Detection: Real-time flying-object detection in complex scenarios, sustaining mAP@0.5–0.95 ≈ 0.68 and >100 FPS at 640×640, competitive with larger models at reduced resource demands (Reis et al., 2023).

Best performance is obtained with task-specific hyperparameter tuning and, where needed, architectural modifications (e.g., additional heads, attention or kernel modules).
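Task-specific tuning can be automated with Ultralytics' built-in evolutionary tuner; a sketch with placeholder budgets and dataset YAML:

```python
# Sketch of automated hyperparameter tuning via the Ultralytics tuner.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
model.tune(data="dataset.yaml", epochs=30, iterations=100,
           optimizer="AdamW", plots=False, save=False, val=True)
```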

7. Developer Features and Implementation Notes

YOLOv8s can be deployed and trained via:

  • Unified Python Package/CLI: Install via `pip install ultralytics`; usage via `yolo train model=yolov8s.yaml ...` and `yolo detect model=yolov8s.pt ...` (Yaseen, 2024).
  • Configurable YAML: All hyperparameters and architectures are encoded as YAML, enabling batch/grid optimizations and modular extension.
  • ONNX/TensorRT Export: Supports conversion for embedded or cloud-compute targets.
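The Python API mirrors the CLI commands above; a minimal end-to-end sketch (file paths are placeholders):

```python
# Python-API equivalents of the CLI workflow above.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")                    # load pretrained weights
model.train(data="coco128.yaml", epochs=100)  # train
results = model("image.jpg")                  # predict on an image
model.export(format="onnx")                   # export for deployment
```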

In summary, YOLOv8s is a CSPNet-driven, anchor-free, real-time detector and segmenter, extensible through modular architectural changes, and ubiquitous in edge, vehicle, and industrial perception applications. It achieves robust accuracy-speed trade-offs and supports developer-friendly workflows across the machine vision spectrum (Yaseen, 2024, Reis et al., 2023, Wang et al., 24 Jan 2025, Chen et al., 28 Jul 2025, Khan et al., 15 Oct 2025, Tsenkova et al., 27 Oct 2025, Gamani et al., 2024, Rey et al., 6 Feb 2025, Khan, 25 Dec 2025).
