Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding (2110.08419v2)

Published 16 Oct 2021 in cs.CL and cs.LG

Abstract: Recent work has focused on compressing pre-trained language models (PLMs) like BERT, where the major focus has been to improve the in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques, knowledge distillation and pruning, and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets, although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the shortcut samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our bias mitigation framework improves the OOD generalization of the compressed models, while not sacrificing the in-distribution task performance.
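
The abstract does not spell out the regularization strategy, so the following is only an illustrative sketch of the general idea of weighting a distillation loss by sample uncertainty, not the paper's actual formulation. It assumes that uncertainty is measured as the normalized entropy of the teacher's softened prediction and that confidently predicted samples (possible shortcut samples) are down-weighted. The function names, the `temperature` and `floor` parameters, and the weighting rule itself are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy divided by log(num_classes), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def uncertainty_weighted_distillation_loss(
    teacher_logits, student_logits, temperature=2.0, floor=0.1
):
    """Weighted average of per-sample soft cross-entropy (teacher vs. student).

    Assumption for illustration: each sample's weight is the normalized
    entropy of the teacher's softened distribution, clipped below at `floor`,
    so samples the teacher is very confident about contribute less.
    """
    total, weight_sum = 0.0, 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        t = softmax(t_logits, temperature)
        s = softmax(s_logits, temperature)
        # Soft cross-entropy; the clamp guards against log(0) on underflow.
        ce = -sum(tp * math.log(max(sp, 1e-12)) for tp, sp in zip(t, s))
        w = max(floor, normalized_entropy(t))
        total += w * ce
        weight_sum += w
    return total / weight_sum


# Example: one confident teacher prediction and one uncertain one.
teacher = [[4.0, 0.0], [0.1, 0.0]]
student = [[3.0, 0.5], [0.0, 0.2]]
loss = uncertainty_weighted_distillation_loss(teacher, student)
```

Because the weights here depend only on the teacher, the loss is still minimized when the student matches the teacher's softened distribution on every sample; the weighting only changes how much each sample contributes.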

Authors (6)
  1. Mengnan Du (90 papers)
  2. Subhabrata Mukherjee (59 papers)
  3. Yu Cheng (354 papers)
  4. Milad Shokouhi (14 papers)
  5. Xia Hu (186 papers)
  6. Ahmed Hassan Awadallah (50 papers)
Citations (11)