Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains (2312.05832v1)
Abstract: Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for deployment in real-world engineering scenarios. The spatial-invariance assumption and pooling layers of conventional CNNs neglect crucial global information, resulting in localization errors in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptrons (MLPs) for visual fault detection of freight trains. We first present an axial shift strategy, which allows the MLP-like architecture to overcome the limitation of spatial invariance and effectively incorporate both local and global cues. We then propose a dynamic distillation method that requires no pre-trained teacher: a dynamic teacher mechanism effectively eliminates the semantic discrepancy with the student model. This approach mines richer details from lower-level feature appearances and higher-level label semantics as extra supervision signals, using efficient instance embedding to model global spatial and semantic information. In addition, the proposed dynamic teacher can be jointly trained with the student to further enhance distillation efficiency. Extensive experiments on six typical fault datasets show that our approach outperforms current state-of-the-art detectors, achieving the highest accuracy with real-time detection at a lower computational cost. The source code will be available at \url{https://github.com/MVME-HBUT/SDD-FTI-FDet}.
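The axial shift idea can be illustrated with a minimal sketch: channels are split into groups, and each group is rolled by a different offset along one spatial axis, so a subsequent channel-mixing MLP sees features from neighboring positions. This is a hedged toy version (the function name, group count, and NumPy formulation are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def axial_shift(x, shift_size=3, axis=2):
    # x: (C, H, W) feature map. Split channels into `shift_size` groups and
    # roll each group by a different offset along the chosen spatial axis,
    # so a following 1x1 channel-mixing MLP aggregates information from
    # neighboring positions -- giving an MLP local spatial context.
    chunks = np.array_split(x, shift_size, axis=0)
    offsets = range(-(shift_size // 2), shift_size // 2 + 1)  # e.g. -1, 0, 1
    shifted = [np.roll(c, off, axis=axis) for c, off in zip(chunks, offsets)]
    return np.concatenate(shifted, axis=0)

# Applying a horizontal then a vertical shift gives each position access to
# an axial (cross-shaped) neighborhood, mixing local and longer-range cues.
feat = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
out = axial_shift(axial_shift(feat, axis=2), axis=1)
print(out.shape)  # (3, 4, 4)
```

The output keeps the input shape; only the spatial alignment across channel groups changes, which is why the operation adds no parameters.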