TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks (2402.11137v3)

Published 17 Feb 2024 in cs.LG

Abstract: While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets with more than 1000 training samples. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization. We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context. We conduct an extensive comparison of 19 algorithms over 98 datasets and find that TuneTables achieves the best performance on average, outperforming boosted trees such as CatBoost, while optimizing fewer than 5% of TabPFN's parameters. Furthermore, we show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective. We open-source our code and raw results at https://github.com/penfever/TuneTables.
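
The abstract's core idea, compressing a large training set into a small learned context for a frozen PFN, can be illustrated with a short sketch. The PyTorch snippet below is a minimal, hypothetical rendition, not the authors' implementation: `frozen_pfn` is assumed to be an `nn.Module` exposing a TabPFN-style forward pass `(train_x, train_y, test_x) -> logits`, and all names, shapes, and hyperparameters are illustrative placeholders. The paper tunes embeddings inside the transformer; for a self-contained example, this sketch instead tunes synthetic input rows with soft labels, so it demonstrates the context-optimization principle rather than the exact TuneTables method.

```python
# Minimal sketch of context optimization for a frozen PFN.
# Assumption: `frozen_pfn(train_x, train_y, test_x)` returns per-row class
# logits for `test_x`, conditioned in-context on (train_x, train_y).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TunedContext(nn.Module):
    """A small set of learnable 'prompt rows' standing in for a large real training set."""

    def __init__(self, num_prompt_rows, num_features, num_classes):
        super().__init__()
        # Learnable feature matrix: the compressed context.
        self.prompt_x = nn.Parameter(0.01 * torch.randn(num_prompt_rows, num_features))
        # Learnable logits over classes for each prompt row (softmaxed into soft labels).
        self.prompt_y_logits = nn.Parameter(torch.zeros(num_prompt_rows, num_classes))


def tune_context(frozen_pfn, loader, num_prompt_rows, num_features, num_classes,
                 steps=1000, lr=1e-2, device="cpu"):
    ctx = TunedContext(num_prompt_rows, num_features, num_classes).to(device)
    # Only the context receives gradients; the PFN backbone stays frozen,
    # mirroring the abstract's "fewer than 5% of TabPFN's parameters" claim.
    optimizer = torch.optim.Adam(ctx.parameters(), lr=lr)
    frozen_pfn.to(device).eval()
    for p in frozen_pfn.parameters():
        p.requires_grad_(False)

    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:  # restart the loader when exhausted
            data_iter = iter(loader)
            x, y = next(data_iter)
        x, y = x.to(device), y.to(device)

        soft_labels = ctx.prompt_y_logits.softmax(dim=-1)
        # One forward pass: the learned context plays the role of the train set.
        logits = frozen_pfn(ctx.prompt_x, soft_labels, x)
        loss = F.cross_entropy(logits, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return ctx
```

Because only `ctx` is optimized, the tuned state is just `num_prompt_rows * (num_features + num_classes)` scalars, and at inference the learned context replaces the real training set, so prediction remains a single forward pass regardless of how large the original dataset was.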
