Generalization of Economic Scaling Laws Across Domains and Larger Compute Scales

Determine whether the empirical scaling relationships between LLM training compute and human performance outcomes (task completion time, graded quality, and earnings per minute), observed in this randomized controlled trial of LLM-assisted professional translation, generalize to other task domains, and whether they persist at training compute scales beyond the roughly two orders of magnitude studied here.

Background

The paper presents an online randomized controlled trial with 300 professional translators performing 1,800 tasks, evaluating 13 LLMs spanning just over two orders of magnitude in training compute. It documents empirical scaling relationships linking increased training compute to improvements in task speed, graded quality, and earnings per minute.

However, the experiment focuses on a single professional domain (translation) and short tasks, and the compute range is limited relative to current and projected frontier models. The authors explicitly note uncertainty about whether these empirical scaling laws extend to other domains or hold at larger compute scales, identifying this as a key question requiring further research.
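
To make the extrapolation concern concrete, the sketch below fits a straight line in log10(compute) to a few points and then evaluates it both inside and far outside the fitted range. Everything in it is an illustrative assumption: the log-linear functional form, the function names `fit_log_linear` and `predict`, and the numbers, which are synthetic placeholders and not data from the paper.

```python
import math


def fit_log_linear(compute, outcome):
    """Least-squares fit of outcome = a + b * log10(compute).

    The log-linear form is an assumption made for illustration only.
    """
    xs = [math.log10(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(outcome) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, outcome))
    b = sxy / sxx              # change in outcome per 10x increase in compute
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, compute):
    """Evaluate the fitted line at a given training compute value."""
    return a + b * math.log10(compute)


# Synthetic placeholder values (hypothetical, NOT taken from the paper):
# training compute in FLOPs and mean task completion time in minutes.
compute = [1e22, 1e23, 1e24]
minutes = [10.0, 9.0, 8.0]

a, b = fit_log_linear(compute, minutes)

# Inside the fitted range the line is supported by the observations.
in_range = predict(a, b, 1e23)

# Two orders of magnitude beyond the range, the prediction rests entirely
# on the assumed functional form continuing to hold.
extrapolated = predict(a, b, 1e26)
```

The fit cannot distinguish between a trend that continues, flattens, or reverses beyond the largest observed compute value. The same applies across domains: a slope estimated on translation tasks carries no information about the slope for a different profession.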

References

Whether these economic scaling laws generalize to other domains and for greater model training compute sizes is a question for further research.

— Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation (2409.02391 - Merali, 4 Sep 2024) in Section 5 (Discussion), final paragraph
