Improved OOD Generalization via Adversarial Training and Pre-training (2105.11144v1)

Published 24 May 2021 in cs.LG

Abstract: Learning a model that generalizes well on out-of-distribution (OOD) data has recently attracted considerable attention in the machine learning community. In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation also generalizes well on OOD data. Inspired by previous findings that adversarial training helps improve input robustness, we theoretically show that the excess risk of adversarially trained models converges on OOD data, and empirically verify this on both image classification and natural language understanding tasks. In addition, in the paradigm of first pre-training and then fine-tuning, we theoretically show that a pre-trained model that is more robust to input perturbation provides a better initialization for generalization on downstream OOD data. Empirically, after fine-tuning, this better-initialized model from adversarial pre-training also achieves better OOD generalization.
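
For context, the abstract's notion of OOD generalization is defined via the Wasserstein distance between the training and test distributions. As a reference point (the paper's exact formulation may differ), the standard 1-Wasserstein distance between distributions $P$ and $Q$ is

$$W_1(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, x') \sim \gamma}\big[\lVert x - x' \rVert\big],$$

where $\Pi(P, Q)$ denotes the set of couplings (joint distributions) with marginals $P$ and $Q$.

The adversarial training referenced in the abstract is commonly instantiated as projected gradient descent (PGD) on the inputs. Below is a minimal PyTorch-style sketch of such a loop, assuming image inputs in [0, 1] and a cross-entropy loss; the model, data loader, perturbation budget, and step sizes are illustrative placeholders rather than the paper's actual configuration.

```python
# Minimal PGD-style adversarial training sketch (PyTorch).
# All hyperparameters and the [0, 1] input range are illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Search for an L-infinity-bounded perturbation of x that maximizes the loss."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Ascent step on the perturbation, project back into the eps-ball,
            # and keep the perturbed input inside the assumed [0, 1] range.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_((x + delta).clamp(0, 1) - x)
        delta.grad.zero_()
    return delta.detach()

def adversarial_train_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of adversarial training: fit the model on worst-case perturbed inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = pgd_perturb(model, x, y)             # inner maximization
        optimizer.zero_grad()                        # clear grads accumulated by the inner loop
        loss = F.cross_entropy(model(x + delta), y)  # outer minimization on perturbed inputs
        loss.backward()
        optimizer.step()
```

The inner loop approximates the worst-case perturbation within an L-infinity ball, and the outer loop minimizes the loss on those perturbed inputs; this input-robustness objective is the property the paper connects theoretically to OOD generalization.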

Authors (7)
  1. Mingyang Yi (19 papers)
  2. Lu Hou (50 papers)
  3. Jiacheng Sun (49 papers)
  4. Lifeng Shang (90 papers)
  5. Xin Jiang (243 papers)
  6. Qun Liu (231 papers)
  7. Zhi-Ming Ma (56 papers)
Citations (75)
