Explaining Context Length Scaling and Bounds for Language Models (2502.01481v3)

Published 3 Feb 2025 in cs.LG and cs.CL

Abstract: Long Context LLMs have drawn great attention in the past few years. There has been work discussing the impact of long context on LLM performance: some find that long irrelevant context can harm performance, while others experimentally summarize the loss reduction from relevant long context as Scaling Laws. This calls for a more thorough understanding of how long context impacts Language Modeling. In this work, we (1) propose a clean and effective theoretical framework for explaining the impact of context length on Language Modeling, from an Intrinsic Space perspective; and (2) conduct experiments on natural language and synthetic data, validating our proposed theoretical assumptions and deductions. Our theoretical framework provides practical insights, such as establishing that training dataset size dictates an optimal context length and bounds context-length scaling in certain cases. We hope our work may inspire new long context LLMs, as well as future work studying Physics for LLMs. Code for our experiments is available at: https://github.com/JingzheShi/NLPCtlScalingAndBounds.
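
The abstract refers to scaling laws for loss reduction with longer relevant context and to bounds on that scaling. As a purely illustrative sketch (not the paper's formulation), the snippet below fits a generic bounded power law, L(n) = L_inf + a * n^(-b), to hypothetical (context length, loss) measurements; the functional form, the data, and the estimated saturation point are assumptions for demonstration only.

```python
# Illustrative only: fit a generic bounded scaling law of validation loss vs.
# context length. The functional form and data are assumptions, NOT results
# or formulas taken from the paper.
import numpy as np
from scipy.optimize import curve_fit

def bounded_power_law(n, L_inf, a, b):
    """Loss as a function of context length n, saturating at the floor L_inf."""
    return L_inf + a * np.power(n, -b)

# Hypothetical (context_length, loss) measurements for demonstration.
ctx = np.array([128, 256, 512, 1024, 2048, 4096, 8192], dtype=float)
loss = np.array([3.10, 2.95, 2.86, 2.81, 2.78, 2.77, 2.765])

params, _ = curve_fit(bounded_power_law, ctx, loss, p0=[2.7, 2.0, 0.5])
L_inf, a, b = params
print(f"fitted floor L_inf={L_inf:.3f}, coefficient a={a:.3f}, exponent b={b:.3f}")

# The fitted floor L_inf estimates the irreducible loss; where the curve
# flattens indicates the context length beyond which further context adds
# little, which is the kind of bound the paper's framework aims to explain.
```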
