An Analysis of "Predicting Emergent Abilities with Infinite Resolution Evaluation"
The paper "Predicting Emergent Abilities with Infinite Resolution Evaluation" addresses the complex issue of understanding and predicting the scaling properties of LLMs. As LLMs play an increasingly pivotal role in AI research, their development requires a comprehensive understanding of how their performance scales with size and data input. Traditional scaling laws mainly focus on model loss, yet they prove inadequate for predicting task performance, giving rise to the challenge of predicting emergent abilities in LLMs.
The authors introduce a novel evaluation technique, PassUntil, designed to measure improvements in task performance at theoretically infinite resolution. The strategy samples massively during the model's decoding phase, continuing until a completion passes the task, so that even very small success probabilities, which conventional pass-rate evaluations round down to zero, become measurable. PassUntil thereby enables a quantitative exploration of a scaling law governing task performance, making that performance far more predictable.
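To make the idea concrete, here is a minimal sketch of a sample-until-pass estimator. The `generate_and_check` callback is a hypothetical stand-in for one decode-and-verify attempt, and the instances-per-sample aggregate is just one simple estimator under these assumptions, not necessarily the paper's exact formulation.

```python
import random
from typing import Callable, Sequence


def samples_until_pass(generate_and_check: Callable[[], bool],
                       max_samples: int = 1_000_000) -> int:
    """Decode completions until one passes; return how many samples were needed.

    `generate_and_check` stands in for one decode-and-verify attempt on a task
    instance. If nothing passes within the budget, max_samples is returned.
    """
    for n in range(1, max_samples + 1):
        if generate_and_check():
            return n
    return max_samples


def estimate_pass_rate(instances: Sequence[Callable[[], bool]],
                       max_samples: int = 1_000_000) -> float:
    """A simple aggregate estimate: instances solved per completion sampled.

    Because sampling only stops at the first success on each instance, even
    tiny per-sample success probabilities leave a measurable trace.
    """
    counts = [samples_until_pass(check, max_samples) for check in instances]
    return len(counts) / sum(counts)


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-in for a weak model on a hard task: each sampled completion
    # passes with probability 0.002.
    true_p = 0.002
    toy_instances = [(lambda: random.random() < true_p) for _ in range(50)]
    print(estimate_pass_rate(toy_instances))  # roughly 0.002
```

The point of the sketch is only that stopping at the first success concentrates sampling effort on hard instances, which is what lets such an evaluation resolve performance levels that would otherwise read as zero.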
A major contribution of the paper is the establishment of a previously unrecognized strict task scaling law that makes task performance predictable from the outset of training. Fitting this law to a family of smaller models, the authors predict the code-generation performance of a 2.4B-parameter model before its training begins, with only 0.05% deviation from the value eventually measured. This is a significant methodological step and supports the broader hypothesis that scaling behavior is predictable, echoing similar observations in the GPT-4 technical report.
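As a rough illustration of how such an extrapolation could work, the snippet below fits a power-law-style parameterization, -log(PU) ≈ a·N^(-b), to PassUntil scores from smaller models and extrapolates to a larger one. The functional form, model sizes, and scores here are illustrative assumptions rather than the paper's exact law or data.

```python
import numpy as np


def fit_task_scaling_law(model_sizes, passuntil_scores):
    """Fit -log(PU) ~ a * N**(-b), i.e. log(-log PU) linear in log N."""
    x = np.log(np.asarray(model_sizes, dtype=float))
    y = np.log(-np.log(np.asarray(passuntil_scores, dtype=float)))
    slope, intercept = np.polyfit(x, y, 1)
    return float(np.exp(intercept)), float(-slope)  # a, b


def predict_passuntil(a: float, b: float, model_size: float) -> float:
    """Extrapolate the fitted curve to a new model size."""
    return float(np.exp(-a * model_size ** (-b)))


if __name__ == "__main__":
    # Hypothetical PassUntil scores from four small models (sizes in parameters).
    sizes = [0.03e9, 0.1e9, 0.3e9, 1.0e9]
    scores = [1e-4, 8e-4, 4e-3, 1.5e-2]
    a, b = fit_task_scaling_law(sizes, scores)
    print(f"a={a:.3g}, b={b:.3g}")
    print("predicted PU at 2.4B params:", predict_passuntil(a, b, 2.4e9))
```

Taking the double logarithm linearizes the assumed curve, so an ordinary least-squares fit on a few small models is enough to produce a forecast for a larger one; the quality of that forecast is exactly what the paper's 0.05% deviation result speaks to.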
Moreover, the paper identifies a phenomenon of accelerated emergence: for some tasks, performance improves with scale faster than the fitted task scaling law predicts. The authors propose a "multiple circuits hypothesis" as a possible explanation, suggesting that the emergent capability is supported by several underlying circuits within the model. They back the hypothesis with quantitative analysis and rule out alternative explanations such as multi-step reasoning.
The implications of this research are significant, both practically and theoretically. For practitioners, the PassUntil strategy may allow more targeted model development, improving resource allocation and reducing the cost of training models that would not yield proportional performance gains. Theoretically, the findings challenge existing notions of emergent abilities in LLMs by providing a method to predict behaviors previously regarded as unpredictable.
The work builds upon the scientific understanding of LLMs and opens new pathways for exploring the scaling laws of AI technologies. Future developments in AI may leverage this understanding to further refine model predictability, potentially leading to more robust and reliable AI systems.
While the paper offers a clear advance in the field, the generalization of its findings across different model architectures and domains remains to be explored. The paper also hints at the need for a deeper investigation of the mechanisms underlying emergent abilities, since the proposed explanation is still predominantly empirical.
In conclusion, "Predicting Emergent Abilities with Infinite Resolution Evaluation" lays a cornerstone for future research into LLM scalability, offering methodological innovations and a theoretical framing that could have profound implications for AI research and application. Further work should refine these tools and extend their applicability across model families and tasks, which could eventually lead to a more predictable and controlled expansion of AI capabilities.