MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection (2506.03654v2)

Published 4 Jun 2025 in cs.CV and cs.AI

Abstract: Real-time object detection is a fundamental but challenging task in computer vision, particularly when computational resources are limited. Although YOLO-series models have set strong benchmarks by balancing speed and accuracy, the increasing need for richer global context modeling has led to the use of Transformer-based architectures. Nevertheless, Transformers have high computational complexity because of their self-attention mechanism, which limits their practicality for real-time and edge deployments. To overcome these challenges, recent developments in linear state space models, such as Mamba, provide a promising alternative by enabling efficient sequence modeling with linear complexity. Building on this insight, we propose MambaNeXt-YOLO, a novel object detection framework that balances accuracy and efficiency through three key contributions: (1) MambaNeXt Block: a hybrid design that integrates CNNs with Mamba to effectively capture both local features and long-range dependencies; (2) Multi-branch Asymmetric Fusion Pyramid Network (MAFPN): an enhanced feature pyramid architecture that improves multi-scale object detection across various object sizes; and (3) Edge-focused Efficiency: our method achieves 66.6% mAP at 31.9 FPS on the PASCAL VOC dataset without any pre-training and supports deployment on edge devices such as the NVIDIA Jetson Xavier NX and Orin NX.
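
The linear-complexity claim for state space models can be illustrated with a minimal sketch. It runs a plain discrete linear recurrence, h_t = A h_{t-1} + B x_t and y_t = C h_t, in a single pass over the sequence, so the cost grows linearly with sequence length rather than quadratically as in self-attention. This is not the paper's MambaNeXt block and not Mamba's selective scan; the function name `ssm_scan` and the toy parameters `A`, `B`, and `C` are assumptions chosen only for illustration.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Run a discrete linear state space recurrence over a 1-D sequence.

    h_t = A @ h_{t-1} + B * x_t
    y_t = C @ h_t

    The loop visits each element once, so the cost is linear in len(x).
    """
    h = np.zeros(A.shape[0])      # hidden state, starts at zero
    y = np.empty(len(x))          # one scalar output per input step
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t       # state update
        y[t] = C @ h              # readout
    return y


# Hypothetical toy parameters (not taken from the paper): a 2-dimensional state.
A = np.array([[0.5, 0.0],
              [0.0, 0.25]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])

# Impulse input: the output traces the decaying response of the state.
y = ssm_scan(np.array([1.0, 0.0, 0.0]), A, B, C)
print(y)
```

With the impulse input above, each output depends on all earlier inputs through the state `h`, which is how a recurrence of this kind can carry long-range context without comparing every pair of positions.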

Authors (4)

Xiaochun Lei (4 papers)
Siqi Wu (36 papers)
Weilin Wu (2 papers)
Zetao Jiang (5 papers)
