YolactEdge: Real-time Instance Segmentation on the Edge (2012.12259v2)

Published 22 Dec 2020 in cs.CV, cs.AI, cs.LG, and cs.RO

Abstract: We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images. To achieve this, we make two improvements to the state-of-the-art image-based real-time method YOLACT: (1) applying TensorRT optimization while carefully trading off speed and accuracy, and (2) a novel feature warping module to exploit temporal redundancy in videos. Experiments on the YouTube VIS and MS COCO datasets demonstrate that YolactEdge produces a 3-5x speed up over existing real-time methods while producing competitive mask and box detection accuracy. We also conduct ablation studies to dissect our design choices and modules. Code and models are available at https://github.com/haotian-liu/yolact_edge.

Overview of YolactEdge: Real-time Instance Segmentation on the Edge

The paper introduces YolactEdge, an instance segmentation approach designed to run in real time on edge devices. Built on the existing YOLACT framework, YolactEdge combines system-level and algorithmic optimizations, reaching up to 30.8 FPS on small hardware such as NVIDIA's Jetson AGX Xavier (ResNet-101 backbone, 550x550 input) while keeping mask and box accuracy competitive with existing real-time methods.

Innovations and Contributions

The key advances in YolactEdge are twofold:

TensorRT Optimization: The authors optimize the YOLACT architecture with NVIDIA's TensorRT, quantizing individual model components to INT8 or FP16 (mixed precision). Lower precision runs faster but can cost accuracy, so the paper analyzes these precision settings and shows that selected components can be quantized to INT8 without substantially degrading accuracy, yielding a significant speedup.
Exploiting Temporal Redundancy in Video: YolactEdge introduces a feature warping module that exploits temporal redundancy in video sequences. Instead of recomputing all features for every frame, the method reuses features from previously processed keyframes and transforms only a subset of them for non-keyframes. This partial feature transform prioritizes the high-resolution features that are crucial for accurate masks, which reduces the backbone computation needed on non-keyframes.
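
The feature reuse described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed keyframe interval, and the plain bilinear sampler are all assumptions made for illustration, and the motion field is supplied by hand rather than estimated from the video.

```python
import numpy as np

KEYFRAME_INTERVAL = 5  # hypothetical value chosen for this sketch


def is_keyframe(frame_idx, interval=KEYFRAME_INTERVAL):
    """Treat every `interval`-th frame as a keyframe (assumed fixed schedule)."""
    return frame_idx % interval == 0


def warp_features(feat, flow):
    """Bilinearly sample keyframe features `feat` (C, H, W) along `flow` (2, H, W).

    flow[0] is the x displacement and flow[1] the y displacement, in pixels,
    pointing from each current-frame location back into the keyframe.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates in the keyframe, clamped to the feature map borders.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional offsets used as interpolation weights
    wy = y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy example: a 1-channel 3x4 keyframe feature map and a uniform motion field.
key_feat = np.arange(12, dtype=np.float64).reshape(1, 3, 4)
shift_right = np.zeros((2, 3, 4))
shift_right[0] = 1.0  # every location reads one pixel to its right
warped = warp_features(key_feat, shift_right)
```

In a full pipeline, a keyframe would run the whole backbone and cache its features, while a non-keyframe would call something like `warp_features` on the cached maps instead of recomputing them.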

Empirical Evaluation

Experiments on the YouTube VIS and MS COCO datasets support the efficiency and competitiveness of YolactEdge. With a ResNet-101 backbone on 550x550 images, the approach runs at up to 30.8 FPS on the Jetson AGX Xavier and 172.7 FPS on an RTX 2080 Ti. The authors report a 3-5x speedup over existing real-time methods with competitive mask and box detection accuracy.
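
To put the reported throughput in per-frame terms, the short sketch below converts frames per second into a time budget per frame. The helper name is hypothetical; the two FPS values are the ones quoted above.

```python
def frame_time_ms(fps):
    """Average time available per frame, in milliseconds, at a given FPS."""
    return 1000.0 / fps


reported_fps = {"Jetson AGX Xavier": 30.8, "RTX 2080 Ti": 172.7}
for device, fps in reported_fps.items():
    print(f"{device}: {frame_time_ms(fps):.1f} ms per frame")
```

At 30.8 FPS the whole pipeline has roughly 32.5 ms per frame, which is why savings on non-keyframes matter on the edge device.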

The reported tables show that YolactEdge loses a small amount of mask mAP relative to the original YOLACT, attributed mainly to mixed-precision inference and motion blur, but gains a clear advantage in processing speed, which matters for practical applications such as robotics and autonomous systems.

Limitations and Future Directions

The authors acknowledge several limitations, including minor accuracy drops potentially due to mixed-precision inference and motion blur in non-keyframes. Future directions could involve developing techniques for intelligent keyframe selection and training using mixed precision to better address quantization-induced precision discrepancies. Additionally, the capability of the approach to adapt dynamically to different edge devices and constraints remains an intriguing area for further research.
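
The kind of precision loss mentioned above can be seen in a small NumPy experiment. This is only a sketch of symmetric per-tensor INT8 quantization on random values; it does not reproduce TensorRT's calibration procedure, and the function names are assumptions made for illustration.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization onto the integer range [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to floating point."""
    return q.astype(np.float32) * scale


# Stand-in for a weight tensor; real layers have their own distributions.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=1000).astype(np.float32)

q, scale = quantize_int8(weights)
err_int8 = float(np.max(np.abs(weights - dequantize(q, scale))))
err_fp16 = float(np.max(np.abs(weights - weights.astype(np.float16).astype(np.float32))))
print(f"max round-trip error  INT8: {err_int8:.5f}  FP16: {err_fp16:.5f}")
```

The INT8 round-trip error is bounded by half the quantization step (`scale / 2`), and on this toy tensor it is larger than the FP16 error, which is consistent with treating INT8 as the more aggressive setting to apply selectively.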

Implications

YolactEdge's ability to segment instances accurately and swiftly on constrained devices like the Jetson AGX Xavier underscores its applicability in real-world scenarios where energy efficiency and portability are crucial. This development broadens the scope of edge computing, highlighting practical implications in areas like autonomous vehicles, surveillance systems, and mobile robotics.

Conclusion

YolactEdge addresses real-time instance segmentation on edge hardware by combining TensorRT mixed-precision inference with feature warping across video frames, and it points toward further work on efficient deep learning for resource-constrained devices.

Authors (4)

Haotian Liu (78 papers)
Rafael A. Rivera Soto (1 paper)
Fanyi Xiao (25 papers)
Yong Jae Lee (88 papers)
