Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding (2301.03765v2)

Published 10 Jan 2023 in cs.CL, cs.IR, and cs.LG

Abstract: Current natural language understanding (NLU) models have been continuously scaling up, both in model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances: some hidden neurons are redundant, and the noise mixed into input neurons tends to distract the model. To avoid this problem, previous work mainly focuses on extrinsically reducing low-utility neurons through additional post- or pre-processing, such as network pruning and context selection. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model utilizes its neurons efficiently, then no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on this comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks, based on 5 widely used pretrained language models, and find it particularly superior for models with few parameters or long input.
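
To make the comparison principle concrete, below is a minimal PyTorch sketch of how such a loss could be computed. It assumes a hinge-style pairwise ranking over the task losses of the full model and a cascade of increasingly ablated submodels; the function name, the `margin` parameter, and the exact hinge form are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a comparative loss in the spirit of the abstract: a
# ranking loss over the task-specific losses of the full model and several
# ablated submodels, expecting the full model's loss to be the smallest.
# The hinge form, the ordering over increasingly ablated submodels, and the
# `margin` parameter are assumptions for illustration only.
import torch
import torch.nn.functional as F

def comparative_loss(task_losses: torch.Tensor, margin: float = 0.0) -> torch.Tensor:
    """task_losses[0]: task loss of the full model; task_losses[1:]: losses of
    submodels with progressively more neurons ablated (e.g., via dropout masks)."""
    total = task_losses.mean()  # still minimize the task losses themselves
    n = task_losses.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            # Penalize violations of the expected ordering: a less-ablated
            # model should not incur a larger task loss than a more-ablated one.
            total = total + F.relu(task_losses[i] - task_losses[j] + margin)
    return total

# Example: losses of the full model and two ablated submodels.
print(comparative_loss(torch.tensor([0.50, 0.70, 0.90])))  # ordering satisfied
print(comparative_loss(torch.tensor([0.90, 0.70, 0.50])))  # ordering violated
```

Under this sketch, the ranking terms contribute gradient only when an ablated submodel outperforms a less-ablated one, which would nudge the full model to make every neuron earn its keep rather than relying on post-hoc pruning or context selection.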
