
Mixture Data for Training Cannot Ensure Out-of-distribution Generalization (2312.16243v4)

Published 25 Dec 2023 in cs.LG

Abstract: Deep neural networks often struggle to generalize to out-of-distribution (OOD) data, and there remains a notable theoretical gap between the contributing factors and their respective impacts. Evidence from in-distribution data suggests that the generalization error shrinks as the size of the mixture training data grows. For OOD samples, however, this conventional understanding no longer holds: increasing the size of the training data does not always reduce the test generalization error. In fact, diverse error trends have been observed across shifting scenarios, including power-law decreases, initial declines followed by increases, and stable plateaus. Previous work treats OOD data only qualitatively, as samples unseen during training, which cannot explain these complicated non-monotonic trends. In this work, we quantitatively redefine OOD data as data lying outside the convex hull of the mixed training data and establish new generalization error bounds that better explain the counterintuitive observations. Our proof of the new risk bound shows that the performance of well-trained models can be guaranteed for unseen data within the convex hull; for OOD data beyond this coverage, however, generalization cannot be ensured, which aligns with our observations. Furthermore, we evaluate various OOD techniques to show that our results not only explain insightful observations in recent OOD generalization work, such as the importance of diverse data and the sensitivity of existing algorithms to unseen shifts, but also inspire a novel and effective data selection strategy.
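
The convex-hull criterion in the abstract can be made concrete with a small feasibility check: under this definition, a test point counts as in-distribution exactly when it can be written as a convex combination of training points. Below is a minimal sketch (not the authors' code) using SciPy's linear-programming solver; the function name in_convex_hull and the toy data are illustrative assumptions only.

import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, train_points):
    """Return True if `point` lies in the convex hull of `train_points`.

    Solves a feasibility LP: find weights w >= 0 with sum(w) = 1 such that
    train_points.T @ w == point. The objective is irrelevant; only
    feasibility matters.
    """
    n = train_points.shape[0]
    c = np.zeros(n)  # dummy objective
    # Equality constraints: the convex combination reproduces the point,
    # and the weights sum to one.
    A_eq = np.vstack([train_points.T, np.ones((1, n))])
    b_eq = np.concatenate([point, [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
    return res.success

# Example: a point near the center of the training cloud is interpolation,
# while a far-away point is extrapolation (OOD in the paper's sense).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
print(in_convex_hull(np.array([0.1, 0.2]), X_train))    # likely True
print(in_convex_hull(np.array([10.0, 10.0]), X_train))  # False

In high-dimensional feature spaces this exact test becomes conservative (almost every point is outside the hull), which is consistent with the paper's claim that generalization cannot be ensured beyond the hull's coverage.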

Authors (7)
  1. Songming Zhang (13 papers)
  2. Yuxiao Luo (6 papers)
  3. Qizhou Wang (26 papers)
  4. Haoang Chi (7 papers)
  5. Bo Han (282 papers)
  6. Jinyan Li (5 papers)
  7. Xiaofeng Chen (12 papers)
