
Deep Model Compression Also Helps Models Capture Ambiguity (2306.07061v1)

Published 12 Jun 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Natural language understanding (NLU) tasks face a non-trivial number of ambiguous samples whose labels are debatable among annotators. NLU models should thus account for such ambiguity, but they approximate human opinion distributions quite poorly and tend to produce over-confident predictions. To address this problem, we must consider how to precisely capture the degree of relationship between each sample and its candidate classes. In this work, we propose a novel method based on deep model compression and show how such relationships can be accounted for. We observe that more reasonably represented relationships can be discovered in the lower layers and that validation accuracies converge at these layers, which naturally leads to layer pruning. We also find that distilling the relationship knowledge from a lower layer helps models produce better distributions. Experimental results demonstrate that our method substantially improves ambiguity quantification without gold distribution labels. As positive side effects, our method significantly reduces model size and improves latency, both attractive properties for NLU products.
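
The abstract outlines two mechanisms: pruning the upper transformer layers once validation accuracy has converged at a lower layer, and distilling the soft class distribution obtained at such a lower layer into the pruned model. The sketch below illustrates that combination, assuming PyTorch and Hugging Face Transformers; the model name, the number of retained layers, the temperature, and the mixing weight alpha are illustrative assumptions, not the settings reported in the paper.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

def truncate_encoder(model, keep_layers: int):
    """Layer pruning: keep only the first `keep_layers` transformer blocks."""
    model.bert.encoder.layer = torch.nn.ModuleList(model.bert.encoder.layer[:keep_layers])
    model.config.num_hidden_layers = keep_layers
    return model

def ambiguity_distillation_loss(student_logits, teacher_logits, labels,
                                temperature: float = 2.0, alpha: float = 0.5):
    """Blend hard-label cross-entropy with KL divergence toward the
    temperature-softened teacher distribution, so the student learns a graded
    sample-to-class relationship rather than one-hot targets."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Illustrative setup: a 3-way NLI classifier pruned to its first 6 layers.
# `teacher_logits` would come from a classifier head attached to a lower layer
# of the fine-tuned full model (a hypothetical placement, not the paper's exact recipe).
student = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)
student = truncate_encoder(student, keep_layers=6)
```

In use, the pruned student would be trained on batches of inputs, hard labels, and cached lower-layer teacher logits with `ambiguity_distillation_loss`; only the soft-target term distinguishes this from ordinary fine-tuning of a truncated encoder.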

