The Information of Large Language Model Geometry (2402.03471v1)

Published 1 Feb 2024 in cs.LG, cs.AI, cs.CL, cs.IT, and math.IT

Abstract: This paper investigates the information encoded in the embeddings of LLMs. We conduct simulations to analyze the representation entropy and discover a power-law relationship with model size. Building upon this observation, we propose a theory based on (conditional) entropy to elucidate the scaling law phenomenon. Furthermore, we delve into the auto-regressive structure of LLMs and examine the relationship between the last token and previous context tokens using information theory and regression techniques. Specifically, we establish a theoretical connection between the information gain of new tokens and ridge regression. Additionally, we explore the effectiveness of Lasso regression in selecting meaningful tokens, which sometimes outperforms the closely related attention weights. Finally, we conduct controlled experiments and find that information is distributed across tokens, rather than being concentrated in specific "meaningful" tokens alone.
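The abstract points to two concrete computations: a representation entropy of token embeddings that grows as a power law in model size, and sparse (Lasso) regression of the last token's representation on its context tokens. The Python sketch below is a minimal illustration of both, not the authors' exact procedure: it assumes a von Neumann-style entropy of the normalized Gram matrix of embeddings (in the spirit of the matrix-entropy work the paper cites), uses randomly generated stand-in hidden states, hypothetical model sizes and entropy values for the power-law fit, and an arbitrary Lasso penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical embeddings: rows are token representations from one LLM layer.
rng = np.random.default_rng(0)
H = rng.normal(size=(512, 768))  # (num_tokens, hidden_dim), stand-in for real hidden states

def matrix_entropy(H: np.ndarray) -> float:
    """Von Neumann-style entropy of the normalized Gram matrix of token embeddings."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)   # unit-normalize each token vector
    K = Hn @ Hn.T / Hn.shape[0]                         # normalized Gram matrix (trace = 1)
    eig = np.linalg.eigvalsh(K)
    eig = eig[eig > 1e-12]                              # drop numerically zero eigenvalues
    return float(-(eig * np.log(eig)).sum())

print(f"representation entropy of stand-in embeddings: {matrix_entropy(H):.3f}")

# Power-law check: fit log(entropy) against log(model size) across several models.
model_sizes = np.array([70e6, 160e6, 410e6, 1.4e9])    # hypothetical parameter counts
entropies = np.array([3.1, 3.6, 4.0, 4.5])             # hypothetical measured entropies
slope, intercept = np.polyfit(np.log(model_sizes), np.log(entropies), 1)
print(f"fitted power-law exponent ≈ {slope:.3f}")

# Token selection: regress the last token's embedding on the context tokens' embeddings
# with an L1 penalty; nonzero coefficients mark "selected" context positions.
context, last = H[:-1], H[-1]                           # (n-1, d) context, (d,) last token
lasso = Lasso(alpha=0.05, fit_intercept=False).fit(context.T, last)
selected = np.nonzero(lasso.coef_)[0]
print("context positions selected by Lasso:", selected)
```

In this reading, the nonzero Lasso coefficients play the role of the "meaningful" context tokens that the paper compares against attention weights; the ridge-regression connection mentioned in the abstract would replace the L1 penalty with an L2 one.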
