Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection (2407.01894v2)
Abstract: Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which offer efficient feature extraction, enables more robust and accurate detection of dim targets in aerial images. However, existing target detection methods concentrate primarily on homogeneous data and lack efficient, versatile processing for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. The system detects suspicious targets with a region proposal network, evokes event-related potential (ERP) signals in the electroencephalogram (EEG) through an eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs EEG-image data pairs together with eye movement data. We then propose an adaptive modality balanced online knowledge distillation (AMBOKD) method to recognize dim objects from the EEG-image pairs. AMBOKD fuses EEG and image features with a multi-head attention module, establishing a new fusion modality with comprehensive features. To enhance the performance and robustness of this fusion modality, end-to-end online knowledge distillation enables simultaneous training and mutual learning across modalities. During training, an adaptive modality balancing module maintains multimodal equilibrium by dynamically adjusting the importance weights and gradient magnitudes of each modality. Comparisons with existing state-of-the-art methods demonstrate the effectiveness and superiority of our approach, while experiments on public datasets and system validation in real-world scenarios confirm the reliability and practicality of the proposed system and method.
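The abstract names the two mechanisms at the core of AMBOKD: multi-head-attention fusion of EEG and image features, and mutual online distillation with dynamically balanced modality weights. The sketch below illustrates both in PyTorch; it is a minimal reconstruction from the abstract alone, so the 256-d token features, the two-class (target/non-target) head, the confidence-based weighting heuristic, and all names (`AttentionFusion`, `mutual_kd_loss`, `modality_weights`) are our assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Fuse EEG and image token features via multi-head self-attention
    (hypothetical stand-in for AMBOKD's fusion modality)."""
    def __init__(self, dim: int = 256, heads: int = 4, classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, classes)  # target / non-target

    def forward(self, eeg_tokens: torch.Tensor, img_tokens: torch.Tensor):
        # Both inputs: (batch, n_tokens, dim) from modality-specific encoders.
        x = torch.cat([eeg_tokens, img_tokens], dim=1)
        fused, _ = self.attn(x, x, x)        # cross-modal self-attention
        return self.head(fused.mean(dim=1))  # pooled fusion logits

def mutual_kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Softened KL term applied symmetrically between branches
    (deep-mutual-learning style online distillation)."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * T * T

def modality_weights(eeg_logits, img_logits):
    """Illustrative balancing heuristic: derive per-modality weights from
    each branch's mean prediction confidence, so the currently stronger
    modality teaches more while the weaker one is not drowned out."""
    with torch.no_grad():
        c = torch.stack([F.softmax(l, dim=1).amax(dim=1).mean()
                         for l in (eeg_logits, img_logits)])
    return torch.softmax(c, dim=0)  # (w_eeg, w_img), sums to 1
```

In a training loop one would combine each branch's cross-entropy with `mutual_kd_loss` terms among the EEG, image, and fusion branches, scaled by `modality_weights`; the abstract additionally describes modulating the training gradients per modality, which this sketch does not attempt to reproduce.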