Active Learning for Video Classification with Frame Level Queries (2307.05587v1)
Abstract: Deep learning algorithms have pushed the boundaries of computer vision research and have delivered strong performance in a variety of applications. However, training a robust deep neural network requires a large amount of labeled training data, and acquiring it takes significant time and human effort. The problem is even more serious for video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this greatly reduces the human annotation effort in inducing a machine learning model, as only the few samples identified by the algorithm need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling burden on human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator merely needs to review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos, and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification in which annotators need to inspect only a few frames to produce a label, rather than watching the video end-to-end.
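The abstract names two steps, a video selection criterion built on uncertainty and diversity, and representative sampling of frames, but does not give their formulation. The sketch below is only an illustration of that general idea under stated assumptions: prediction entropy stands in for uncertainty, distance to the nearest already-selected video stands in for diversity, and farthest-point sampling stands in for representative frame sampling. The function names (`select_videos`, `select_frames`), the `trade_off` parameter, and the feature inputs are all hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (assumed uncertainty proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_videos(video_probs, video_feats, batch_size, trade_off=1.0):
    """Greedily pick videos scoring high on uncertainty plus diversity.

    Diversity (an assumption) is the squared distance to the nearest video
    already in the batch; it is 0 while the batch is still empty.
    """
    chosen = []
    candidates = set(range(len(video_probs)))
    while candidates and len(chosen) < batch_size:
        def score(i):
            diversity = min(
                (sq_dist(video_feats[i], video_feats[j]) for j in chosen),
                default=0.0,
            )
            return entropy(video_probs[i]) + trade_off * diversity

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen


def select_frames(frame_feats, k):
    """Pick k exemplar frames by farthest-point sampling (an assumed stand-in).

    Starts from the frame closest to the mean feature, then repeatedly adds
    the frame farthest from everything picked so far.
    """
    n = len(frame_feats)
    dim = len(frame_feats[0])
    mean = [sum(f[d] for f in frame_feats) / n for d in range(dim)]
    picked = [min(range(n), key=lambda i: sq_dist(frame_feats[i], mean))]
    while len(picked) < min(k, n):
        nxt = max(
            (i for i in range(n) if i not in picked),
            key=lambda i: min(sq_dist(frame_feats[i], frame_feats[j]) for j in picked),
        )
        picked.append(nxt)
    return sorted(picked)


if __name__ == "__main__":
    # Toy inputs: class probabilities and 2-D features for four videos.
    probs = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5], [1.0, 0.0]]
    feats = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [1.0, 1.0]]
    print(select_videos(probs, feats, batch_size=2))
    # Toy 1-D frame features for one selected video.
    print(select_frames([[0.0], [1.0], [2.0], [10.0]], k=2))
```

In a setup like this, the annotator would be shown only the frames returned by `select_frames` for each video returned by `select_videos`, and would assign one label per video.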
Authors: Debanjan Goswami, Shayok Chakraborty