Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space (2203.14680v3)

Published 28 Mar 2022 in cs.CL

Abstract: Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive update to that distribution. Then, we analyze the FFN updates in the vocabulary space, showing that each update can be decomposed to sub-updates corresponding to single FFN parameter vectors, each promoting concepts that are often human-interpretable. We then leverage these findings for controlling LM predictions, where we reduce the toxicity of GPT2 by almost 50%, and for improving computation efficiency with a simple early exit rule, saving 20% of computation on average.


Analysis of Transformer Feed-Forward Layers and Vocabulary Space Predictions

The paper "Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space" examines how transformer-based language models (LMs) form their predictions, focusing on the feed-forward network (FFN) layers. Transformers have become the backbone of modern NLP, yet how they construct predictions internally remains poorly understood. The paper aims to clarify the role of FFN layers, which it frames as active contributors to the model's predictions.

The authors treat the token representation at any point in the model as a changing distribution over the vocabulary. Each FFN layer refines this distribution through an additive update. The authors study this update by decomposing the FFN layer output into individual sub-updates, each corresponding to a single FFN parameter vector. These sub-updates often align with interpretable concepts in the vocabulary.
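The decomposition described above can be sketched with a toy FFN. This is a minimal illustration under stated assumptions, not the paper's code: the matrix names `K`, `V`, and `E`, the toy dimensions, the use of ReLU as the activation, and the omission of bias terms and layer normalization are all choices made here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, vocab = 8, 16, 20        # toy sizes, chosen only for illustration

K = rng.normal(size=(d_ff, d_model))    # first FFN matrix ("key" vectors, assumed name)
V = rng.normal(size=(d_ff, d_model))    # second FFN matrix ("value" vectors, assumed name)
E = rng.normal(size=(vocab, d_model))   # output embedding matrix (assumed name)

def ffn_sub_updates(x):
    """Return one sub-update per FFN parameter vector: coefficient_i * V[i]."""
    m = np.maximum(K @ x, 0.0)          # activation coefficients (ReLU is an assumption)
    return m[:, None] * V               # shape (d_ff, d_model)

x = rng.normal(size=d_model)            # a stand-in token representation
sub_updates = ffn_sub_updates(x)

# The full FFN output is exactly the sum of its sub-updates.
ffn_out = sub_updates.sum(axis=0)
assert np.allclose(ffn_out, np.maximum(K @ x, 0.0) @ V)

# Projecting to the vocabulary is linear, so the change in logits
# also splits into one contribution per sub-update.
logit_update = E @ ffn_out
per_sub_update_logits = sub_updates @ E.T   # shape (d_ff, vocab)
assert np.allclose(logit_update, per_sub_update_logits.sum(axis=0))

def top_tokens(i, k=5):
    """Indices of the k vocabulary items most promoted by value vector i."""
    return np.argsort(E @ V[i])[::-1][:k]
```

In a real model, inspecting the tokens returned by something like `top_tokens` for each value vector is one way to check whether that vector promotes a recognizable concept.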

A salient finding of this research is that FFN sub-updates frequently encode human-understandable concepts such as "breakfast" or "pronouns." The researchers present empirical evidence that supports these interpretations and shows how individual sub-updates shape the model's predictions. By increasing the weights of selected non-toxic sub-updates, they reduce GPT2's toxic language output by almost 50%, a practical consequence of this fine-grained understanding.
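The kind of intervention described above can be illustrated on the same toy setup. This is a hedged sketch, not the authors' procedure: the function `ffn`, the parameters `boost_ids` and `delta`, and the contrived choice of making value vector 0 equal to the embedding of token 3 are all hypothetical and exist only to make the effect easy to verify.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, vocab = 8, 16, 20        # toy sizes, chosen only for illustration

K = rng.normal(size=(d_ff, d_model))    # first FFN matrix (assumed name)
V = rng.normal(size=(d_ff, d_model))    # second FFN matrix (assumed name)
E = rng.normal(size=(vocab, d_model))   # output embedding matrix (assumed name)

SAFE_VECTOR, SAFE_TOKEN = 0, 3          # hypothetical "non-toxic" vector and token
V[SAFE_VECTOR] = E[SAFE_TOKEN]          # contrived: vector 0 now promotes token 3

def ffn(x, boost_ids=(), delta=0.0):
    """Toy FFN whose selected activation coefficients are increased by delta."""
    m = np.maximum(K @ x, 0.0)                        # ReLU is an assumption
    m[np.asarray(boost_ids, dtype=int)] += delta      # raise chosen sub-update weights
    return m @ V

x = rng.normal(size=d_model)
baseline_logits = E @ ffn(x)
steered_logits = E @ ffn(x, boost_ids=[SAFE_VECTOR], delta=3.0)

# Boosting the coefficient adds delta * V[SAFE_VECTOR] to the output, so the
# logit of the promoted token rises by delta * (E[SAFE_TOKEN] . V[SAFE_VECTOR]).
gain = steered_logits[SAFE_TOKEN] - baseline_logits[SAFE_TOKEN]
```

Because the update is additive, the effect of the boost on every logit can be computed in closed form, which is what makes this style of steering easy to reason about.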

Moreover, the paper proposes an efficiency improvement based on a simple early exit rule. By detecting when the model's prediction has effectively been settled at an intermediate layer, the remaining computation can be skipped, saving about 20% of computation on average. Efficient use of computational resources matters given the growing scale of models and their real-world applications.
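The general idea of early exit can be sketched as follows. Note that this uses a plain confidence threshold on intermediate predictions as a stand-in; it is not the exit rule from the paper, whose details are not given here. The function name `early_exit_layer`, the threshold value, and the hand-written per-layer logits are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_layer(per_layer_logits, threshold=0.9):
    """Index of the first layer whose top-1 probability reaches the threshold.

    Falls back to the last layer if no layer is confident enough.
    """
    for i, logits in enumerate(per_layer_logits):
        if softmax(logits).max() >= threshold:
            return i
    return len(per_layer_logits) - 1

# Hand-written logits for a 4-layer toy model over a 3-token vocabulary.
per_layer_logits = np.array([
    [0.0, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 6.0, 0.0],
])

exit_at = early_exit_layer(per_layer_logits)
layers_skipped = len(per_layer_logits) - 1 - exit_at
```

In this toy case the third layer is already confident, so the final layer would be skipped; the actual savings in a real model depend on the exit rule and the data.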

In essence, this work provides a detailed analysis of FFN operations in transformer-based LMs, asserting their significance in shaping prediction outputs by promoting specific vocabulary concepts. Such insights enable more informed interventions for refining model behavior, offering practical levers for reducing undesirable outputs and enhancing computation efficiency.

The paper's implications extend to both the theoretical understanding of transformer architectures and practical work on AI safety and resource efficiency. Analyzing transformer predictions at the level of FFN sub-updates invites future research on finer model control without compromising performance. The work is thus a meaningful contribution toward demystifying transformer LMs and using their capabilities more judiciously.


Authors (4)

Mor Geva (58 papers)
Avi Caciularu (46 papers)
Kevin Ro Wang (1 paper)
Yoav Goldberg (142 papers)

Citations (271)
