Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection (2307.03377v1)
Abstract: This paper proposes a novel approach to mitigate the negative transfer problem. In machine learning, the common strategy is Single-Task Learning: training a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution infeasible when data are unavailable or expensive to gather. Another solution, based on sharing information between tasks, has therefore been developed: Multi-Task Learning (MTL). Despite recent developments in MTL, the problem of negative transfer has yet to be solved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. This paper proposes a new approach to mitigate negative transfer based on the concept of task awareness. The proposed approach reduces negative transfer while improving performance over the classic MTL solution. Moreover, it has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state of the art on both the EXIST-2021 and HatEval-2019 benchmarks.
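The abstract describes conditioning a shared multi-task model on which task it is currently solving ("task awareness"). The abstract does not specify the exact mechanism, so the snippet below is only a minimal, hypothetical PyTorch sketch of one way such conditioning could look: a learned task embedding prepended to the input of a shared encoder, with one classification head per task. All class names, layer sizes, and the three-task setup (sexism, hate speech, toxic language) are illustrative assumptions, not the paper's actual architectures.

```python
import torch
import torch.nn as nn


class TaskAwareMultiTaskModel(nn.Module):
    """Hypothetical task-aware MTL classifier (illustrative only).

    A shared Transformer encoder is conditioned on a learned task
    embedding (the "task awareness" signal); each task keeps its own
    output head to limit interference between tasks.
    """

    def __init__(self, vocab_size=30000, hidden=256, num_tasks=3, num_labels=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.task_emb = nn.Embedding(num_tasks, hidden)  # task-awareness signal
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared encoder
        # One head per task (e.g., sexism, hate speech, toxic language)
        self.heads = nn.ModuleList(nn.Linear(hidden, num_labels) for _ in range(num_tasks))

    def forward(self, input_ids, task_id):
        x = self.token_emb(input_ids)                   # (batch, seq, hidden)
        t = self.task_emb(task_id).unsqueeze(1)         # (batch, 1, hidden)
        h = self.encoder(torch.cat([t, x], dim=1))      # prepend task embedding
        pooled = h[:, 0]                                # task-aware pooled state
        return self.heads[int(task_id[0])](pooled)      # task-specific head


# Usage sketch: each mini-batch comes from a single task.
model = TaskAwareMultiTaskModel()
ids = torch.randint(0, 30000, (8, 32))                     # dummy token ids
logits = model(ids, torch.zeros(8, dtype=torch.long))      # task 0, e.g. sexism
```

The design choice illustrated here is that the shared encoder sees an explicit task signal, so its representations can specialize per task instead of averaging noisy signals across tasks, which is one intuitive way to reduce negative transfer.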
- Angel Felipe Magnossão de Paula
- Paolo Rosso
- Damiano Spina