YOLO-NAS: NAS-Optimized Object Detector
- YOLO-NAS is a family of real-time object detectors automatically designed via hardware-in-the-loop NAS for optimal mAP-latency tradeoffs.
- It incorporates quantization-aware blocks such as QSP and QCI modules to maintain accuracy under INT8 post-training quantization.
- Benchmark results show variants such as YOLO-NAS Small achieve competitive precision, high recall, and low-latency inference suitable for real-time use.
YOLO-NAS is a family of real-time single-stage object detectors whose network architectures are determined automatically via neural architecture search (NAS), specifically using Deci’s proprietary AutoNAC platform. It introduces quantization-aware building blocks and specialized layers designed to maintain accuracy under post-training INT8 quantization, with hardware-in-the-loop search optimizing the mAP versus latency tradeoff for deployment on diverse edge and datacenter inference platforms (Terven et al., 2023, BN et al., 2024).
1. Neural Architecture Search Methodology and Design Principles
YOLO-NAS is the first YOLO variant devised entirely via black-box, hardware-in-the-loop NAS. The AutoNAC engine defines a search space structured around three key layer types:
- RepVGG-style re-parameterizable blocks: These allow multi-branch convolutional paths during training and re-parameterize to a single branch for efficient inference.
- QSP (Quantization-aware Spatial Processing) blocks: Inserted at selected backbone/neck points, these minimize distributional errors introduced during INT8 quantization.
- QCI (Quantization-aware Channel Interaction) modules: These interleave with backbone stages to ensure channel-wise feature integrity under low-precision representation.
The search considers, for each backbone/neck/head stage, how many RepVGG blocks to stack, whether to insert QSP/QCI modules, and how to configure the detection heads’ feature-map sizes and output resolutions (i.e., “S/M/L” model variants). Each candidate architecture is compiled and benchmarked for throughput and accuracy, with Pareto-optimal models emerging via evolutionary search.
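The re-parameterization trick behind the RepVGG-style blocks can be illustrated concretely: because convolution is linear in its kernel, the training-time 3×3, 1×1, and identity branches collapse into a single 3×3 kernel for inference. The sketch below, a simplified single-channel NumPy version (real blocks also fold batch norm and operate per-channel), verifies that the merged kernel reproduces the multi-branch output exactly:

```python
import numpy as np

def conv2d(x, k):
    """'Same' 2D cross-correlation with zero padding, single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch kernel
k1 = rng.standard_normal((1, 1))   # 1x1 branch kernel

# Training-time multi-branch output: conv3x3(x) + conv1x1(x) + x (identity)
y_train = conv2d(x, k3) + conv2d(x, k1) + x

# Re-parameterize: the 1x1 kernel and the identity both land on the
# center tap of an equivalent 3x3 kernel
k_merged = k3.copy()
k_merged[1, 1] += k1[0, 0] + 1.0

y_infer = conv2d(x, k_merged)
assert np.allclose(y_train, y_infer)
```

The same linearity argument is what lets the full multi-channel blocks re-parameterize with no accuracy change at inference time.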
The objective function is multi-factorial, balancing MS-COCO AP@[.50:.95] against measured (not surrogate) inference latency or throughput on the target device. Both INT8 PTQ and mixed-precision (FP16) are included in the evaluation loop to ensure quantization robustness (Terven et al., 2023, BN et al., 2024).
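The Pareto-selection step of such a search can be sketched as follows. This is not the AutoNAC implementation (which is proprietary), just a minimal illustration of keeping every candidate that no other candidate beats on both measured mAP and measured latency; the candidate names and numbers are hypothetical:

```python
def pareto_front(candidates):
    """Keep candidates not dominated in (higher mAP, lower latency)."""
    front = []
    for c in candidates:
        dominated = any(
            o["map"] >= c["map"] and o["latency_ms"] <= c["latency_ms"]
            and (o["map"] > c["map"] or o["latency_ms"] < c["latency_ms"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical (mAP, latency) pairs measured on the target device
cands = [
    {"name": "a", "map": 0.50, "latency_ms": 3.0},
    {"name": "b", "map": 0.52, "latency_ms": 3.4},
    {"name": "c", "map": 0.49, "latency_ms": 3.2},  # dominated by "a"
    {"name": "d", "map": 0.53, "latency_ms": 4.1},
]
print([c["name"] for c in pareto_front(cands)])  # ['a', 'b', 'd']
```

In the real search, each candidate's latency comes from compiling and benchmarking on the deployment hardware, not from a surrogate model.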
2. YOLO-NAS Model Variants and Architectural Details
Three canonical configurations are offered: YOLO-NAS-S (Small), YOLO-NAS-M (Medium), and YOLO-NAS-L (Large). The principal architectural features are as follows:
- Backbone:
- Initial STEM: 3×3 conv + batch norm + SiLU (or, in Small: 1×1 CBR with ReLU).
- Four RepVGG stages, each block combining a 3×3 and 1×1 conv (merged at inference).
- QSP inserted after select stages to mitigate quantization error.
- QCI before each stage output for channel-wise compatibility.
- Neck:
- PANet-style path aggregation with lateral RepVGG blocks for multi-scale fusion.
- Lateral QSP modules at highest feature-map resolution.
- Head:
- Three detection heads, typically at strides 8, 16, and 32, targeting small, medium, and large objects respectively.
- Each head: 1×1 conv, two parallel 3×3 conv paths for objectness/classification versus bounding-box regression, and a final 1×1 conv to output channels per anchor-free location.
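The stride structure above fixes each head's prediction grid: a 512×512 input yields 64×64, 32×32, and 16×16 grids. A small sketch of the resulting per-head output shapes, assuming 80 COCO classes and a DFL-style box head with a hypothetical `reg_max` bin count (the exact channel layout is an assumption, not the published spec):

```python
def head_grid_shapes(img_size, strides=(8, 16, 32), num_classes=80, reg_max=16):
    """Per-head prediction grid sizes for an anchor-free detector.

    Each grid cell emits num_classes class scores plus 4*reg_max DFL
    bin logits for the four box sides (reg_max is an assumed bin count).
    """
    shapes = {}
    for s in strides:
        g = img_size // s
        shapes[s] = (g, g, num_classes + 4 * reg_max)
    return shapes

print(head_grid_shapes(512))
# stride 8 -> 64x64 grid, stride 16 -> 32x32, stride 32 -> 16x16
```

Every grid location predicts directly (anchor-free), so the total number of candidate detections is the sum of the three grid areas.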
YOLO-NAS Small reduces parameter count (~19M) and employs lightweight CBR and QA-RepVGG layers with per-stage QSP/QCI modules for edge efficiency (BN et al., 2024).
3. Loss Functions and Training Paradigms
For the general family, the loss follows a three-term structure as in YOLOv5/YOLOv8:
- L_obj (objectness): binary cross-entropy on object presence/absence.
- L_cls (classification): binary cross-entropy or focal loss over the classes.
- L_loc (localization): CIoU or DFL, typically using 1 − CIoU as the loss.
- Per-component weights are applied per head.
The YOLO-NAS Small variant implements the PPYoloELoss composite:
- Cross-entropy classification loss.
- Direct IoU loss (1 − IoU).
- Distribution Focal Loss (DFL) for bounding box refinement.
with the component weights left at the Super Gradients defaults (BN et al., 2024, Terven et al., 2023).
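The composite structure above can be made concrete with a toy NumPy sketch. The weights below are illustrative placeholders, not the library defaults, and the DFL term follows its standard formulation: cross-entropy against the two integer bins bracketing the continuous regression target, linearly weighted:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, target_idx):
    return -np.log(softmax(logits)[target_idx])

def iou_loss(iou):
    return 1.0 - iou

def dfl(side_logits, target):
    """Distribution Focal Loss for one box side: CE against the two
    integer bins bracketing the continuous target, weighted linearly."""
    lo = int(np.floor(target))
    w_hi = target - lo
    return ((1 - w_hi) * cross_entropy(side_logits, lo)
            + w_hi * cross_entropy(side_logits, lo + 1))

# Illustrative weights (placeholders, not the library defaults)
w_cls, w_iou, w_dfl = 1.0, 2.5, 0.5

cls_logits = np.array([2.0, 0.1, -1.0])  # toy 3-class score vector
side_logits = np.zeros(17)               # reg_max=16 -> 17 bins per side
total = (w_cls * cross_entropy(cls_logits, 0)
         + w_iou * iou_loss(0.8)
         + w_dfl * dfl(side_logits, 5.3))
print(round(total, 3))
```

In practice all three terms are averaged over matched predictions and summed across the detection heads.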
4. Training Regimes, Optimization, and Data Augmentation
Standard YOLOv5/YOLOv8 data augmentations (e.g., image flipping, scaling, cropping, color transforms) are adopted. Unique to YOLO-NAS:
- Pre-training: Both backbone and head undergo initialization on Objects365 (2M images, 365 classes).
- Pseudo-labeling: MS-COCO training images receive additional pseudo-labels to warm-start detection heads.
- Self-distillation: The initially trained model teaches itself in a secondary fine-tuning phase, improving localization coherence (Terven et al., 2023).
- Quantization-aware recipe: Candidate architectures are tuned for selective INT8 quantization (e.g., feature extractors in INT8, critical neck/head layers in FP16), with QSP/QCI block placement determined by NAS.
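To see why quantization-aware design matters, it helps to look at what plain symmetric per-tensor INT8 post-training quantization does to a weight tensor. This minimal sketch (a generic PTQ illustration, not YOLO-NAS's actual quantizer) shows the bounded rounding error that QSP/QCI placement is meant to keep from compounding:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: scale by max |w| / 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max reconstruction error: {err:.4f}")  # bounded by ~scale/2
```

Per-layer, this error is small; the accuracy loss in a deep detector comes from such errors propagating through dozens of layers, which is exactly what the selective INT8/FP16 split and the quantization-aware blocks target.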
For YOLO-NAS Small with Super Gradients (BN et al., 2024), optimization uses Adam (with weight decay 0.01), a cosine-annealing learning rate schedule, mixed-precision training (FP16), exponential moving average (decay 0.9), and is run for 10 epochs with model selection via [email protected].
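Two pieces of that recipe, the cosine-annealed learning rate and the parameter EMA, are simple enough to sketch directly (a generic illustration of the schedule shapes, not the Super Gradients implementation):

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate from lr_max down to lr_min."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

class EMA:
    """Exponential moving average of parameters (decay 0.9 per the recipe)."""
    def __init__(self, params, decay=0.9):
        self.decay = decay
        self.shadow = list(params)
    def update(self, params):
        self.shadow = [self.decay * s + (1 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

print(cosine_lr(0, 100, 1e-3))    # lr_max at the start
print(cosine_lr(100, 100, 1e-3))  # lr_min at the end
```

At evaluation time it is the EMA shadow weights, not the raw optimizer state, that are typically benchmarked and checkpointed.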
5. Benchmarking and Performance Analysis
Quantitative results on MS-COCO and dedicated small-object datasets highlight the mAP–latency tradeoffs:
| Model | Numeric precision | mAP@[.50:.95] | FPS (A100, INT8/FP16) | Params (M) | FLOPs (B) |
|---|---|---|---|---|---|
| YOLOv8x | FP16 | 53.9% | 280 | 87 | 205 |
| YOLO-NAS-L | INT8 | 52.2% | 300 | 80 | 190 |
- YOLO-NAS-L, INT8 quantized, is 7% faster than YOLOv8x at the cost of a 1.7 percentage point mAP drop, with 8–10% fewer parameters and FLOPs (Terven et al., 2023).
- YOLO-NAS Small, trained on Roboflow YCB-COCO small-object data, achieves [email protected] = 0.96, recall = 0.98, precision = 0.64, and 8 ms inference latency per 512×512 image on consumer GPUs (BN et al., 2024).
- When compared to YOLOv5s, YOLOv7-tiny, and YOLOv8n small-model variants, YOLO-NAS Small achieves the highest recall and competitive precision on small-object detection tasks.
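The recall-heavy operating point reported above can be summarized with the standard F1 score; the computation below simply combines the published precision and recall figures (the resulting F1 is derived here, not reported in the source):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported YOLO-NAS Small figures on the YCB-COCO small-object data
print(round(f1(0.64, 0.98), 3))  # ≈ 0.774
```

The asymmetry (recall 0.98 vs. precision 0.64) reflects a deliberate bias toward not missing objects, at the cost of more false positives.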
6. Applications, Strengths, and Limitations
Applications:
- High-throughput edge robotics (drones, mobile robots) demanding millisecond-scale INT8 inference.
- Automotive advanced driver assistance systems (ADAS) and embedded surveillance on INT8-capable SoCs.
- Industrial inspection, retail analytics, and smart cameras requiring maximum FPS under tight mAP constraints.
- Specialized assistive systems, such as real-time indoor navigation aids for the blind, where YOLO-NAS Small's high recall and low-latency vision-to-audio pipeline are essential (BN et al., 2024).
Strengths:
- Hardware-adaptiveness: NAS with hardware-in-the-loop yields architectures that optimize the latency–accuracy Pareto frontier on the user’s hardware.
- Quantization-robust: QSP/QCI modules ensure minimal mAP degradation after INT8 quantization.
- Recall-oriented: YOLO-NAS Small, in particular, is designed for high-recall scenarios where missed detections are intolerable.
- Efficient for small objects: Retains competitive accuracy on challenging, small-object–heavy datasets.
Limitations:
- Proprietary NAS and blocks: The AutoNAC search algorithm details and exact search-space encoding are not open source (Terven et al., 2023).
- Slight accuracy gap: YOLO-NAS-L is ≈1.7 percentage points lower in mAP than YOLOv8x FP16.
- Added complexity in training: Large-scale pre-training, pseudo-labeling, and self-distillation prolong the training schedule.
7. Prospects and Future Research Directions
Prospective directions for advancing YOLO-NAS include:
- Extending NAS discovery to include activation functions, as in ActNAS (Sah et al., 2024), or optimizing skip connections, quantizers, and micro-kernels jointly.
- Incorporating continuous relaxation methods (e.g., DARTS-style search spaces) for joint multi-device efficiency.
- Enhancing quantization strategies using per-channel INT8 or hybrid-precision learning for further latency reductions and mAP preservation.
- Exploring integration of object tracking and end-to-end audio/haptic feedback systems for assistive technologies (BN et al., 2024).
YOLO-NAS demonstrates the efficacy of NAS-driven, quantization-aware detector design for real-time applications, shifting the model development process toward hardware-coupled, automated architectural optimization (Terven et al., 2023, BN et al., 2024).