- The paper demonstrates that a 10x increase in LLM training compute reduces translation time by 12.3%, extending compute scaling laws to economic productivity.
- The paper finds that translation quality improves by 0.25 points on a 7-point scale with each 10x compute increase, indicating that speed gains do not come at the cost of accuracy.
- The paper reveals that enhanced LLM compute leads to a 16.1% increase in earnings per minute, potentially reducing skill-based wage disparities among translators.
 
 
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation
Introduction
The paper "Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation" by Ali Merali addresses a crucial question: how does the exponential increase in model training compute for LLMs translate into performance improvements, specifically in economic terms? While existing literature has elucidated the relationship between compute and model perplexity through scaling laws, the ramifications of this on practical and economic dimensions remain underexplored. This paper aims to fill this gap by examining the impact of LLM scaling on professional translators' productivity and quality of work.
Experiment Design
The paper employs a randomized controlled trial (RCT) involving 300 professional translators working in three languages (Spanish, Hindi, and Arabic). Participants translated texts with different levels of AI assistance: the experiment included a control group and treatment groups assigned LLMs trained with varying amounts of compute. The primary outcome measures were task completion time, translation quality, and earnings per minute, capturing both the productivity and the quality dimensions of the work.
Key Results
Productivity Gains
The paper finds that a 10x increase in model training compute reduces task completion time by 12.3% (p=0.001). Given the roughly 70x compute gap between successive generations of GPT models, this corresponds to a 22.7% reduction in time per "GPT-jump." The result is statistically significant and robust across model types and compute scales.
Quality Improvements
Notably, translation quality, measured on a 7-point scale, improved by 0.25 points (a 0.18 standard deviation increase) for every 10x increase in model compute (p<0.001), allaying the concern that faster task completion might compromise quality.
Earnings Per Minute
The paper also examines the economic impact of these productivity gains. Translators' earnings per minute, inclusive of bonus payments for high-quality work, increased by 16.1% for every 10x increase in model training compute (p=0.001). For each "GPT-jump," this translates to a 29.7% increase in earnings per minute.
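The per-"GPT-jump" figures follow from treating the estimated effects as linear in log10(compute): the per-10x coefficient is scaled by log10(70) ≈ 1.845 to cover the ~70x gap between GPT generations. A minimal sketch of this arithmetic (the function name is illustrative, not from the paper):

```python
import math

def per_jump_effect(per_10x_pct, compute_ratio=70.0):
    """Extrapolate a per-10x effect (in percent) to a larger compute
    ratio, assuming the effect is linear in log10(compute)."""
    return per_10x_pct * math.log10(compute_ratio)

# Reproduces the paper's headline "GPT-jump" figures:
time_saved = per_jump_effect(12.3)     # -> ~22.7% reduction in task time
earnings_gain = per_jump_effect(16.1)  # -> ~29.7% increase in earnings/min
```

Note that a linear extrapolation in log-compute slightly overstates the compounded effect for time reductions, but it matches the figures the paper reports.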
Skill-Based Disparities
The paper performs a heterogeneity analysis, revealing that the gains from LLM scaling are unevenly distributed across different skill levels. Lower-skilled translators experienced a 21.1% reduction in task completion time for every 10x compute increase (p=0.017), compared to a 4.9% reduction for higher-skilled translators. This finding suggests that LLMs may contribute to reducing skill-based wage inequalities.
Implications and Future Directions
The empirical evidence provided suggests that continual advancements in LLMs could lead to significant productivity improvements and potential economic benefits, with pronounced gains for lower-skilled workers. This holds significant implications for labor economics, particularly around the discourse on technological growth and its impact on wage inequalities.
However, this paper is not without limitations. It focuses exclusively on translation tasks and employs a specific range of model training computes. Future research should explore whether these economic scaling laws apply to other professions and broader computational ranges. Additionally, the experimental tasks were relatively short, warranting further studies involving more complex and longer tasks.
Conclusion
This paper provides compelling experimental evidence that scaling LLMs leads to substantial improvements in both productivity and quality in professional translation tasks. The findings underscore the economic potential of future LLM advancements, particularly in reducing skill-based disparities. Nevertheless, further research is needed to generalize these scaling laws across different domains and larger computational scales.