Wild-Tab: A Benchmark For Out-Of-Distribution Generalization In Tabular Regression (2312.01792v1)

Published 4 Dec 2023 in cs.LG

Abstract: Out-of-Distribution (OOD) generalization, a cornerstone for building robust machine learning models capable of handling data diverging from the training set's distribution, is an ongoing challenge in deep learning. While significant progress has been observed in computer vision and natural language processing, its exploration in tabular data, ubiquitous in many industrial applications, remains nascent. To bridge this gap, we present Wild-Tab, a large-scale benchmark tailored for OOD generalization in tabular regression tasks. The benchmark incorporates three industrial datasets sourced from fields such as weather prediction and power consumption estimation, providing a challenging testbed for evaluating OOD performance under real-world conditions. Our extensive experiments, evaluating 10 distinct OOD generalization methods on Wild-Tab, reveal nuanced insights. We observe that many of these methods struggle to maintain high performance on unseen data, with OOD performance showing a marked drop compared to in-distribution performance. At the same time, Empirical Risk Minimization (ERM), despite its simplicity, delivers robust performance across all evaluations, rivaling the results of state-of-the-art methods. Looking forward, we hope that the release of Wild-Tab will facilitate further research on OOD generalization and aid in the deployment of machine learning models in various real-world contexts where handling distribution shifts is a crucial requirement.
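The ERM baseline highlighted in the abstract is simply standard supervised training that minimizes the average loss over the training distribution, with no shift-specific machinery. The sketch below is a minimal, hypothetical illustration of that baseline and of the in-distribution-versus-OOD evaluation gap the benchmark measures. The synthetic data generator, the covariate shift magnitude, and the MLP architecture are assumptions made for demonstration; they are not the Wild-Tab datasets or the authors' models.

# Minimal ERM sketch for tabular regression with an ID vs. OOD evaluation.
# All data and model choices here are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_split(n, shift=0.0):
    # Synthetic tabular features; `shift` translates the feature
    # distribution to mimic a covariate shift at test time.
    x = torch.randn(n, 8) + shift
    y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(n, 1)
    return x, y

x_train, y_train = make_split(2048)        # training data
x_id, y_id = make_split(512)               # in-distribution test split
x_ood, y_ood = make_split(512, shift=1.5)  # shifted (OOD) test split

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # ERM: minimize average training loss, nothing more

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

with torch.no_grad():
    print("ID MSE: ", loss_fn(model(x_id), y_id).item())
    print("OOD MSE:", loss_fn(model(x_ood), y_ood).item())

Running this typically shows a higher error on the shifted split, which is the kind of ID-to-OOD performance drop the benchmark is designed to quantify across methods.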
