Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond (2401.01219v2)
Abstract: Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full or sufficiently large overlap across tasks, i.e., each input sample is annotated for all or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and this setup cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks that have little or no overlap in annotations, or when there is a big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach in which knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when the MT model's performance is worse than that of at least one single-task model).
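The definition of negative transfer given in the abstract can be made concrete with a small check. The following is only a minimal sketch: the function name `negative_transfer_tasks`, the task names, and the scores are hypothetical and are not taken from the paper, and it assumes a metric where higher is better and that the multi-task model is compared with the single-task model on the same task.

```python
def negative_transfer_tasks(mt_scores, st_scores):
    """Return the tasks on which the multi-task (MT) model scores
    below the corresponding single-task (ST) model.

    Assumes higher scores are better and that both dicts are keyed
    by the same task names.
    """
    return [task for task, st in st_scores.items() if mt_scores[task] < st]


# Illustrative numbers only, not results from the paper.
single_task = {"expression": 0.60, "action_units": 0.50}
multi_task = {"expression": 0.63, "action_units": 0.48}

worse_tasks = negative_transfer_tasks(multi_task, single_task)
print(worse_tasks)  # ['action_units']
```

Under this reading, a non-empty result means negative transfer has occurred on at least one task, while an empty list means the multi-task model matches or beats every single-task baseline.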
- Dimitrios Kollias
- Viktoriia Sharmanska
- Stefanos Zafeiriou