From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples (2404.07544v3)

Published 11 Apr 2024 in cs.CL and cs.AI

Abstract: We analyze how well pre-trained LLMs (e.g., Llama2, GPT-4, Claude 3, etc) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several LLMs (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of LLMs scales with the number of in-context exemplars. We borrow from the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret.

Exploring the Capabilities of LLMs in Regression Tasks Without Additional Training

Introduction

LLMs like GPT-4 and Claude 3 have demonstrated a considerable capacity for in-context learning, enabling them to perform tasks with examples presented in their prompt, even without explicit training on these tasks. Various studies have evaluated the in-context learning abilities of LLMs; however, specific insights into their performance on regression tasks are sparse. This paper extends the understanding of LLMs' capabilities by evaluating their performance on both linear and non-linear regression tasks, contrasting their effectiveness with traditional supervised methods and analyzing how their performance scales with the number of in-context exemplars.

Experimental Setup

Datasets

The paper employs a mix of linear and non-linear regression datasets, along with regression tasks that have non-numerical inputs, drawing on both synthetic datasets and datasets generated from mathematical formulas. These datasets were chosen for their deterministic nature, the control they offer over difficulty, and because they pose challenges unlikely to have been encountered during the models' training. In particular, the synthetic datasets make it possible to examine whether LLMs can discern the underlying data-generating pattern and predict unseen values.
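For concreteness, the snippet below shows how synthetic datasets of this kind can be generated with scikit-learn; the sample sizes, noise levels, and seeds are illustrative assumptions rather than the paper's exact configuration.

```python
# Illustrative sketch: a linear dataset with a controlled number of informative
# variables, and the non-linear Friedman #2 benchmark. Sizes, noise, and seeds
# are assumptions for demonstration, not the paper's settings.
from sklearn.datasets import make_regression, make_friedman2

# Linear data: 3 input variables, only 1 of which is informative.
X_lin, y_lin = make_regression(
    n_samples=100, n_features=3, n_informative=1, noise=0.0, random_state=0
)

# Non-linear data: Friedman #2, y = sqrt(x1^2 + (x2*x3 - 1/(x2*x4))^2) + noise.
X_fr2, y_fr2 = make_friedman2(n_samples=100, noise=0.0, random_state=0)

print(X_lin.shape, y_lin.shape)   # (100, 3) (100,)
print(X_fr2.shape, y_fr2.shape)   # (100, 4) (100,)
```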

Models

The research involved a comparative analysis across a spectrum of models:

  • LLMs: A range of LLMs, including GPT-4, Claude 3, and Llama2, was tested to evaluate their regression capabilities using in-context examples.
  • Supervised Models: Traditional supervised models like Random Forest and Gradient Boosting served as baselines, allowing for a comparison of LLMs against conventional regression methods.
  • Unsupervised Baselines: Simple unsupervised methods, such as predicting the training average or a randomly sampled training output, were used to contextualize the performance of the LLMs; a minimal sketch of these baselines follows this list.
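As a rough illustration of the baseline families above (not the paper's exact experimental protocol), the following sketch fits the supervised models with scikit-learn defaults and adds the two simple unsupervised baselines:

```python
# Minimal baseline sketch; hyperparameters and the train/test split are
# illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.datasets import make_friedman2
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = make_friedman2(n_samples=150, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=50, random_state=0)

# Supervised baselines.
for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Random Forest", RandomForestRegressor(random_state=0)),
    ("Gradient Boosting", GradientBoostingRegressor(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    print(name, mean_absolute_error(y_te, model.predict(X_te)))

# Unsupervised baselines: predict the training mean, or a randomly sampled
# training output, for every test point.
rng = np.random.default_rng(0)
print("Average", mean_absolute_error(y_te, np.full_like(y_te, y_tr.mean())))
print("Random", mean_absolute_error(y_te, rng.choice(y_tr, size=len(y_te))))
```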

Results

Linear Regression Performance

The results revealed that LLMs, when provided with in-context examples, proved surprisingly adept at linear regression. In some scenarios, models like Claude 3 and GPT-4 not only outperformed the unsupervised baselines but also rivaled or exceeded the accuracy of traditional supervised methods such as Linear Regression, Gradient Boosting, and Random Forest. For instance, Claude 3 achieved a lower mean absolute error than Gradient Boosting and Random Forest on a dataset with only one informative variable out of three.
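To make the setup concrete, the sketch below shows one plausible way to serialize in-context examples into a prompt that an LLM completes with a numeric prediction; the field names and number formatting are assumptions and may differ from the paper's exact serialization.

```python
# Sketch of in-context regression prompting: each training (x, y) pair is
# serialized as text, and the model is asked to complete the output for a
# held-out input. Field names and formatting are illustrative assumptions.
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=51, n_features=3, n_informative=1, random_state=0)
train_X, train_y, test_x = X[:-1], y[:-1], X[-1]

def serialize(x, label=None):
    lines = [f"Feature {i}: {v:.2f}" for i, v in enumerate(x)]
    lines.append("Output:" if label is None else f"Output: {label:.2f}")
    return "\n".join(lines)

prompt = "\n\n".join(
    [serialize(x, label) for x, label in zip(train_X, train_y)] + [serialize(test_x)]
)
# `prompt` would then be sent to an LLM and the returned text parsed back into
# a number, which is scored against the true output (e.g., by absolute error).
print(prompt[:200])
```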

Non-Linear Regression Performance

LLMs extended this proficiency to non-linear regression, with Claude 3 and GPT-4 outperforming established supervised methods on benchmarks such as the Friedman datasets and on newly introduced non-linear datasets designed to minimize the chance of prior exposure during training. This suggests a robust capacity for handling complex regression tasks beyond linear models.
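The Friedman benchmarks are deterministic formulas with known closed forms; for reference, a noise-free Friedman #2 generator (following the standard definition also used by scikit-learn's make_friedman2) looks like this:

```python
# Friedman #2 target function (noise-free) with the standard input ranges.
import numpy as np

def friedman2(x1, x2, x3, x4):
    # y = sqrt(x1^2 + (x2*x3 - 1/(x2*x4))^2)
    return np.sqrt(x1 ** 2 + (x2 * x3 - 1.0 / (x2 * x4)) ** 2)

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 100)
x2 = rng.uniform(40 * np.pi, 560 * np.pi)
x3 = rng.uniform(0, 1)
x4 = rng.uniform(1, 11)
print(friedman2(x1, x2, x3, x4))
```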

Scaling with In-Context Exemplars

An analysis of how LLM performance scales with the number of provided in-context examples revealed sub-linear regret growth for models such as Claude 3 and GPT-4. Sub-linear regret means that, as more examples arrive, the models' average prediction error approaches that of the best fixed strategy chosen in hindsight, indicating efficient adaptation to the provided data.
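As a rough sketch of how such a regret curve can be computed (a generic online-learning formulation; the paper's exact loss and comparator may differ), consider:

```python
# At each round t, the predictor sees t in-context examples and predicts the
# next output; regret accumulates the gap between its loss and that of a
# comparator (here, the best fixed linear model in hindsight). The predictor
# below is a trivial placeholder standing in for an LLM.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

def predictor(X_seen, y_seen, x_next):
    # Stand-in for the LLM: any function mapping in-context examples to a
    # prediction. Here, the running mean of the seen outputs.
    return float(np.mean(y_seen))

comparator = LinearRegression().fit(X, y)  # best fixed model in hindsight
total, cumulative_regret = 0.0, []
for t in range(1, len(y)):
    pred = predictor(X[:t], y[:t], X[t])
    total += abs(y[t] - pred) - abs(y[t] - comparator.predict(X[t:t + 1])[0])
    cumulative_regret.append(total)
# Sub-linear regret means cumulative_regret grows slower than linearly in t.
```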

Implications and Future Directions

The findings highlight the potential of LLMs to serve as robust regression tools, capable of handling both linear and non-linear tasks effectively from in-context examples alone. The paper challenges the conventional separation between traditional machine learning models and LLMs, suggesting that, given adequate examples, LLMs can act as competitive regression methods in their own right.

These insights open pathways for integrating LLMs into broader analytical frameworks, where their ability to generalize from examples can complement traditional statistical modeling techniques. Future research may explore refining the in-context learning abilities of LLMs through tailored pre-training or fine-tuning on synthetic datasets or specific regression tasks.

Conclusion

LLMs demonstrate an impressive ability to perform regression tasks when given appropriate in-context examples. This capability extends across both linear and non-linear problems, with performance often matching or surpassing that of traditional supervised methods. The way performance scales with the number of in-context examples suggests that the models' underlying mechanisms support sophisticated pattern recognition and extrapolation. As the field progresses, understanding and leveraging the in-context learning abilities of LLMs may open avenues for enhanced analytical models and applications.

Authors (4)
  1. Robert Vacareanu (12 papers)
  2. Vlad-Andrei Negru (1 paper)
  3. Vasile Suciu (1 paper)
  4. Mihai Surdeanu (53 papers)
Citations (16)