GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective (2211.08073v4)

Published 15 Nov 2022 in cs.CL, cs.AI, cs.LG, and cs.PF

Abstract: Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark, named GLUE-X, for evaluating OOD robustness in NLP models, highlighting the importance of OOD robustness and providing insights on how to measure and improve the robustness of a model. The benchmark includes 13 publicly available datasets for OOD testing, and evaluations are conducted on 8 classic NLP tasks over 21 widely used PLMs, including GPT-3 and GPT-3.5. Our findings confirm the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy.
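
The benchmark's core measurement is the gap between a model's in-distribution (ID) accuracy and its accuracy on OOD test sets for the same task. The sketch below illustrates that comparison; the accuracy metric, the relative-drop formula, and the example datasets are illustrative assumptions, not GLUE-X's exact aggregation.

```python
# Minimal sketch of the ID-vs-OOD comparison described in the abstract:
# score the same model's predictions on an in-distribution test set and
# on an out-of-distribution test set, then report the relative drop.
# Dataset names and the drop formula are illustrative assumptions.

from typing import Sequence

def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions matching the gold labels."""
    assert len(preds) == len(labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def relative_drop(id_acc: float, ood_acc: float) -> float:
    """Relative performance degradation when moving from ID to OOD data."""
    return (id_acc - ood_acc) / id_acc

# Toy example: a sentiment classifier scored on an ID split (e.g. SST-2)
# and an OOD split drawn from a different domain (e.g. product reviews).
id_acc = accuracy([1, 0, 1, 1], [1, 0, 1, 0])   # 0.75 on ID data
ood_acc = accuracy([1, 1, 0, 0], [1, 0, 1, 0])  # 0.50 on OOD data
print(f"ID {id_acc:.2f} | OOD {ood_acc:.2f} | drop {relative_drop(id_acc, ood_acc):.0%}")
```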

Authors (9)
  1. Linyi Yang (52 papers)
  2. Shuibai Zhang (4 papers)
  3. Libo Qin (77 papers)
  4. Yafu Li (26 papers)
  5. Yidong Wang (43 papers)
  6. Hanmeng Liu (11 papers)
  7. Jindong Wang (150 papers)
  8. Xing Xie (220 papers)
  9. Yue Zhang (620 papers)
Citations (69)
