Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 54 tok/s
Gemini 2.5 Pro 51 tok/s Pro
GPT-5 Medium 40 tok/s Pro
GPT-5 High 35 tok/s Pro
GPT-4o 127 tok/s Pro
Kimi K2 216 tok/s Pro
GPT OSS 120B 459 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

SMI: An Information-Theoretic Metric for Predicting Model Knowledge Solely from Pre-Training Signals (2502.04066v3)

Published 6 Feb 2025 in cs.CL and cs.AI

Abstract: The GPT-4 technical report highlights the possibility of predicting model performance on downstream tasks using only pre-training signals, though detailed methodologies are absent. Such predictive capabilities are essential for resource-efficient pre-training and the construction of task-aligned datasets. In this paper, we aim to predict performance in closed-book question answering (QA), a vital downstream task indicative of a model's internal knowledge. We address three primary challenges: (1) limited access to and understanding of pre-training corpora, (2) limitations of current evaluation methods for pre-trained models, and (3) limitations of frequency-based metrics in predicting model performance. In response to these challenges, we conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained LLMs. Subsequently, we develop a multi-template QA evaluation framework incorporating paraphrased question variants. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics, model size, and QA accuracy, without requiring any additional training. The experimental results demonstrate that SMI outperforms co-occurrence-based baselines, achieving $R2$ > 0.75 on models with over one billion parameters. Theoretical analysis further reveals the marginal benefits of scaling model size and optimizing data, indicating that the upper limit of specific QA task accuracy is approximately 80%. Our project is available at https://github.com/yuhui1038/SMI.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.