Multi-Task Consistency for Active Learning (2306.12398v1)
Abstract: Learning-based solutions for vision tasks require a large amount of labeled training data to ensure their performance and reliability. In single-task vision-based settings, inconsistency-based active learning has proven effective at selecting informative samples for annotation. However, there is a lack of research exploiting the inconsistency between multiple tasks in multi-task networks. To address this gap, we propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation. Our approach leverages the inconsistency between them to identify informative samples across both tasks. We propose three constraints that specify how the tasks are coupled and introduce a method for determining which pixels belong to the object detected by a bounding box, which we then use to quantify the constraints as inconsistency scores. To evaluate the effectiveness of our approach, we establish multiple baselines for multi-task active learning and introduce a new metric, mean Detection Segmentation Quality (mDSQ), tailored to multi-task active learning, which accounts for the performance of both tasks. We conduct extensive experiments on the nuImages and A9 datasets, demonstrating that our approach outperforms existing state-of-the-art methods by up to 3.4% mDSQ on nuImages. Our approach achieves 95% of the fully-trained performance using only 67% of the available data, corresponding to 20% fewer labels compared to random selection and 5% fewer labels compared to the state-of-the-art selection strategy. Our code will be made publicly available after the review process.
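To make the core idea concrete, the following is a minimal sketch of one plausible inconsistency score between the two task heads. The abstract does not give the exact form of the three constraints, so this is an illustrative assumption, not the paper's definition: the function `box_mask_inconsistency`, its arguments, and the scoring rule (how much segmentation probability mass the detected class receives inside its bounding box) are all hypothetical names and choices.

```python
import numpy as np

def box_mask_inconsistency(seg_probs, box, cls):
    """Score the disagreement between a detection and the segmentation head.

    seg_probs: (C, H, W) per-class softmax probabilities from the
               segmentation head.
    box:       (x1, y1, x2, y2) integer pixel coordinates of a detection.
    cls:       class index the detector predicted for this box.

    Returns a value in [0, 1]. A higher value means the segmentation
    assigns less probability to `cls` inside the box, i.e. the two task
    heads disagree more, making the image a stronger candidate for
    annotation in an inconsistency-based active learning loop.
    """
    x1, y1, x2, y2 = box
    # Average segmentation confidence for the detected class inside the box.
    region = seg_probs[cls, y1:y2, x1:x2]
    return 1.0 - float(region.mean())
```

In an active learning round, such a score would be aggregated over all detections in an unlabeled image, and the images with the highest inconsistency would be sent for labeling.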