- The paper presents TTFNet, a novel anchor-free, single-stage detector that reduces training time by over sevenfold while achieving competitive accuracy and 112 FPS performance.
- It introduces a Gaussian kernel-based encoding approach that mimics larger batch sizes, enabling higher learning rates and quicker convergence.
- The method enhances efficiency for resource-limited and time-sensitive applications, demonstrating significant practical implications in real-time object detection.
Training-Time-Friendly Network for Real-Time Object Detection
This paper introduces the Training-Time-Friendly Network (TTFNet), an object detector designed to balance training time, inference speed, and accuracy. Observing that modern detectors rarely achieve all three goals simultaneously, the authors show that improving how training samples are encoded can substantially shorten training.
Key Contributions
TTFNet employs a light-head, single-stage, anchor-free architecture, design choices that favor fast inference. Its central contribution is a Gaussian-kernel encoding scheme that produces many more training samples per object. This denser supervision mimics the effect of a larger batch size, permitting higher learning rates and faster convergence. Additionally, TTFNet introduces initiative sample weighting to better exploit the information in each annotation.
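The encoding idea can be illustrated with a small sketch: instead of supervising only the single center pixel of each box, every pixel near the center receives a soft Gaussian response, yielding many weighted samples per object. This is a minimal illustration, not the authors' implementation; the `alpha` spread parameter and the sigma-from-box-size rule here are assumptions chosen for demonstration.

```python
import numpy as np

def gaussian_heatmap(shape, boxes, alpha=0.54):
    """Encode ground-truth boxes as Gaussian peaks on a heatmap.

    Illustrative sketch of Gaussian-kernel sample encoding: pixels
    around each box center get soft, nonzero responses, so one object
    contributes many training samples rather than a single center
    pixel. `alpha` (hypothetical here) controls the kernel spread.
    """
    h, w = shape
    heatmap = np.zeros((h, w), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Kernel radii scale with box size, so larger objects
        # spread supervision over proportionally more pixels.
        sx, sy = alpha * (x2 - x1) / 6.0, alpha * (y2 - y1) / 6.0
        ys, xs = np.mgrid[0:h, 0:w]
        g = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2 + 1e-6)
                     + (ys - cy) ** 2 / (2 * sy ** 2 + 1e-6)))
        # Overlapping objects: keep the stronger response per pixel.
        heatmap = np.maximum(heatmap, g)
    return heatmap

# One 20x30 box centered at (20, 25): the peak sits at the center,
# and many surrounding pixels also carry nonzero weight.
hm = gaussian_heatmap((64, 64), [(10, 10, 30, 40)])
```

In the paper's framing, these Gaussian responses also serve as per-sample weights for the regression loss, which is what lets the denser supervision behave like a larger effective batch.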
Experimental Results
The authors conducted extensive experiments on the MS COCO dataset, benchmarking TTFNet against established models such as SSD300 and YOLOv3. The results show a more-than-sevenfold reduction in training time compared to preceding real-time detectors. Notably, the super-fast variant TTFNet-18 matches SSD300 using one-tenth of its training time, reaching 25.9 AP at 112 FPS after only 1.8 hours of training. Similarly, TTFNet-53 surpasses YOLOv3 within one-tenth of its training time.
Theoretical and Practical Implications
The research provides a compelling demonstration of the connection between training-sample encoding density and effective batch size in object detection. It explains the limitations of single-center encoding strategies, exemplified by CenterNet's slow convergence, and shows the efficacy of spreading samples over Gaussian distributions instead.
Practically, TTFNet's ability to drastically reduce training times without compromising performance has substantial implications for resource-limited computing environments. It also shows promise for training-time-sensitive applications, such as Neural Architecture Search (NAS), where efficiency is paramount.
Future Directions
Future work could extend this approach by integrating TTFNet's encoding with other network architectures or by tuning the Gaussian kernel parameters for further gains. Adapting TTFNet to more complex data environments could also yield new insights into handling diverse and rich data sources effectively.
Conclusion
In summary, TTFNet represents a significant step forward in real-time object detection. By rethinking the sample encoding process and optimizing training efficiency, this work paves the way for detectors that better balance training cost, inference speed, and accuracy. The paper offers both a novel methodological framework and a practical tool for researchers and developers working on real-time computer vision.