- The paper demonstrates that a 10x increase in LLM training compute reduces translation time by 12.3%, extending compute scaling laws to economic productivity.
- The paper finds that translation quality improves by 0.25 points on a 7-point scale with each 10x compute increase, indicating that speed gains do not come at the cost of accuracy.
- The paper reveals that enhanced LLM compute leads to a 16.1% increase in earnings per minute, potentially reducing skill-based wage disparities among translators.
 
 
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation
Introduction
The paper "Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation" by Ali Merali addresses a crucial question: how does the exponential increase in model training compute for LLMs translate into performance improvements, specifically in economic terms? While existing literature has elucidated the relationship between compute and model perplexity through scaling laws, the ramifications of this on practical and economic dimensions remain underexplored. This paper aims to fill this gap by examining the impact of LLM scaling on professional translators' productivity and quality of work.
Experiment Design
The paper employs a randomized controlled trial (RCT) involving 300 professional translators working in three languages (Spanish, Hindi, and Arabic). Participants translated texts with different levels of AI assistance: the experiment included a control group and treatment groups assigned LLMs trained with varying amounts of compute. The primary outcome measures were task completion time, translation quality, and earnings per minute, capturing both the productivity and the quality dimensions of the work.
Key Results
Productivity Gains
The paper finds that a 10x increase in model training compute reduces task completion time by 12.3% (p=0.001). Given the roughly 70x compute gap between successive generations of GPT models, this corresponds to a 22.7% reduction in time per "GPT-jump." The result is statistically significant and robust across model types and compute scales.
Quality Improvements
Notably, translation quality, measured on a 7-point scale, improved by 0.25 points (a 0.18 standard deviation increase) for every 10x increase in model compute (p<0.001), allaying the concern that faster task completion might compromise quality.
Earnings Per Minute
The paper also examines the economic impact of these productivity gains. Translators' earnings per minute, inclusive of bonus payments for high-quality work, increased by 16.1% for every 10x increase in model training compute (p=0.001). For each "GPT-jump," this translates to a 29.7% increase in earnings per minute.
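The per-"GPT-jump" figures follow from treating the estimated effects as linear in log10(compute): the per-10x coefficient is scaled by log10(70) ≈ 1.845 to cover the ~70x gap between GPT generations. A minimal sketch of this arithmetic (the function name is illustrative, not from the paper):

```python
import math

def per_jump_effect(per_10x_pct, compute_ratio=70.0):
    """Extrapolate a per-10x effect (in percent) to a larger compute
    ratio, assuming the effect is linear in log10(compute)."""
    return per_10x_pct * math.log10(compute_ratio)

# Reproduces the paper's headline "GPT-jump" figures:
time_saved = per_jump_effect(12.3)     # -> ~22.7% reduction in task time
earnings_gain = per_jump_effect(16.1)  # -> ~29.7% increase in earnings/min
```

Note that a linear extrapolation in log-compute slightly overstates the compounded effect for time reductions, but it matches the figures the paper reports.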
Skill-Based Disparities
The paper performs a heterogeneity analysis, revealing that the gains from LLM scaling are unevenly distributed across different skill levels. Lower-skilled translators experienced a 21.1% reduction in task completion time for every 10x compute increase (p=0.017), compared to a 4.9% reduction for higher-skilled translators. This finding suggests that LLMs may contribute to reducing skill-based wage inequalities.
Implications and Future Directions
The empirical evidence provided suggests that continual advancements in LLMs could lead to significant productivity improvements and potential economic benefits, with pronounced gains for lower-skilled workers. This holds significant implications for labor economics, particularly around the discourse on technological growth and its impact on wage inequalities.
However, this paper is not without limitations. It focuses exclusively on translation tasks and employs a specific range of model training computes. Future research should explore whether these economic scaling laws apply to other professions and broader computational ranges. Additionally, the experimental tasks were relatively short, warranting further studies involving more complex and longer tasks.
Conclusion
This paper provides compelling experimental evidence that scaling LLMs leads to substantial improvements in both productivity and quality in professional translation tasks. The findings underscore the economic potential of future LLM advancements, particularly in reducing skill-based disparities. Nevertheless, further research is needed to generalize these scaling laws across different domains and larger computational scales.