Understanding Attention: In Minds and Machines (2012.02659v1)

Published 4 Dec 2020 in cs.AI, cs.LG, and cs.NE

Abstract: Attention is a complex and broad concept, studied across multiple disciplines spanning artificial intelligence, cognitive science, psychology, neuroscience, and related fields. Although many of the ideas regarding attention do not significantly overlap among these fields, there is a common theme of adaptive control of limited resources. In this work, we review the concept and variants of attention in artificial neural networks (ANNs). We also discuss the origin of attention from the neuroscience point of view parallel to that of ANNs. Instead of having seemingly disconnected dialogues between varied disciplines, we suggest grounding the ideas on common conceptual frameworks for a systematic analysis of attention and towards possible unification of ideas in AI and Neuroscience.

Summary

  • The paper establishes a unified framework bridging attention mechanisms in AI and neuroscience using Marr's Levels of Analysis.
  • It details diverse attention variants in neural networks, including soft versus hard and local versus global attention, highlighting their computational benefits.
  • It links brain-based attention to language translation processes, aligning neural computation with declarative and procedural memory models.

Understanding the Role of Attention in Neural Networks and the Human Brain

Introduction

Attention mechanisms have become a cornerstone of modern artificial neural networks (ANNs), particularly in tasks such as machine translation, question answering, and image captioning. Attention is not a new concept, however; it has long been a central topic in neuroscience and psychology. This paper examines how attention is understood and implemented in both ANNs and the human brain, and advocates for a more unified approach to studying this topic across these varied disciplines.

Attention in Artificial Neural Networks

Traditional Encoder-Decoder for Language Translation

Before attention mechanisms were introduced, the encoder-decoder framework was popular for tasks like language translation. Here’s a brief overview:

  1. Encoder: Reads the input sequence and creates a fixed representation.
  2. Decoder: Generates the output sequence from this representation.

While this setup works, it struggles with longer sentences because it must compress all of the relevant information into a single fixed-length context vector.
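
To make this concrete, here is a minimal Python/PyTorch sketch of such an encoder-decoder; the module names, the GRU choice, and the dimensions are illustrative assumptions rather than the paper's implementation.

    # Minimal encoder-decoder sketch (names and hyperparameters are arbitrary assumptions).
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

        def forward(self, src):                      # src: (batch, src_len) token ids
            _, h = self.rnn(self.embed(src))
            return h                                 # one fixed-size summary: (1, batch, hid_dim)

    class Decoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, tgt, h):                   # every output step sees only the fixed vector h
            y, _ = self.rnn(self.embed(tgt), h)
            return self.out(y)                       # (batch, tgt_len, vocab_size)

    # Toy usage: encode a source batch, then decode conditioned on the single context vector.
    enc, dec = Encoder(vocab_size=1000), Decoder(vocab_size=1000)
    src = torch.randint(0, 1000, (2, 7))
    tgt = torch.randint(0, 1000, (2, 5))             # teacher-forced target inputs
    logits = dec(tgt, enc(src))

The bottleneck described above is visible here: the decoder receives only the final hidden state, no matter how long the source sentence is.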

Basic Attention Mechanism

The game changed when the attention mechanism was proposed by Bahdanau et al. Instead of relying on a single fixed context vector, the attention mechanism generates a distinct context for each target word dynamically. This context is computed as a weighted sum of the encoder's hidden states, where the weights represent the relevance of each hidden state to the current target word.

Key Equations
  • Alignment Model: scores the relevance of the j-th input position to the i-th output position.

    e_ij = a(s_{i-1}, h_j)

  • Normalized Scores: the scores are normalized with a softmax to produce the attention weights.

    α_ij = exp(e_ij) / Σ_k exp(e_ik)

  • Context Vector: the context vector for the i-th output token is a weighted sum of the encoder hidden states.

    c_i = Σ_j α_ij · h_j
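
A short Python/PyTorch sketch of how these equations translate into code for a single decoder step; the additive scoring function and tensor shapes are assumptions in the spirit of Bahdanau et al., not taken from the paper.

    # Additive (Bahdanau-style) attention for one decoder step.
    import torch
    import torch.nn.functional as F

    def attention_step(s_prev, h_enc, W_s, W_h, v):
        # e_ij = a(s_{i-1}, h_j): score each encoder hidden state against the previous decoder state.
        e = torch.tanh(h_enc @ W_h + s_prev @ W_s) @ v          # (src_len,)
        # α_ij: softmax over the source positions.
        alpha = F.softmax(e, dim=0)                              # (src_len,)
        # c_i: weighted sum of the encoder hidden states.
        c = (alpha.unsqueeze(1) * h_enc).sum(dim=0)              # (hid_dim,)
        return c, alpha

    # Toy usage with random parameters.
    hid_dim, src_len = 8, 5
    h_enc = torch.randn(src_len, hid_dim)                        # encoder hidden states h_j
    s_prev = torch.randn(hid_dim)                                # previous decoder state s_{i-1}
    W_s, W_h, v = torch.randn(hid_dim, hid_dim), torch.randn(hid_dim, hid_dim), torch.randn(hid_dim)
    context, weights = attention_step(s_prev, h_enc, W_s, W_h, v)

Unlike the fixed-vector setup above, a fresh context c_i is computed for every output token, weighting the encoder states by their relevance to the current decoding step.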

Attention Variants

Attention mechanisms have evolved into several variants:

  1. Soft vs Hard Attention: Soft attention distributes weights over the entire input, whereas hard attention focuses on a single part.
  2. Local vs Global Attention: Local attention restricts the focus to a smaller context window, while global attention considers the entire input.
  3. Self-Attention: Self-attention relates different positions of the same input sequence to learn dependencies (see the sketch after this list).
  4. Hierarchical Attention: Uses a two-level attention mechanism to handle hierarchical data structures, useful in document classification and summarization.
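
To make the self-attention variant concrete, the following is a minimal single-head scaled dot-product self-attention over one sequence, in the style popularized by the Transformer; the projection matrices and dimensions are illustrative assumptions, not the paper's formulation.

    # Minimal single-head scaled dot-product self-attention over one sequence.
    import math
    import torch
    import torch.nn.functional as F

    def self_attention(x, W_q, W_k, W_v):
        # x: (seq_len, d_model); every position attends over every position of the same sequence.
        q, k, v = x @ W_q, x @ W_k, x @ W_v
        scores = q @ k.T / math.sqrt(k.shape[-1])    # pairwise relevance between positions
        weights = F.softmax(scores, dim=-1)          # each row is a distribution over the sequence
        return weights @ v                           # context-mixed representation per position

    # Toy usage.
    d_model = 16
    x = torch.randn(10, d_model)                     # a sequence of 10 positions
    W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
    out = self_attention(x, W_q, W_k, W_v)           # (10, d_model)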

Language Translation in the Human Brain

Studies of early bilinguals show that translating sentences engages heightened semantic and syntactic processing, which aligns well with the idea of context vectors in attention mechanisms. The Declarative/Procedural (DP) model from neuroscience also maps closely onto how ANNs handle attention and language translation:

  • Declarative Memory: Stores facts and knowledge, similar to the context vector learning words and concepts.
  • Procedural Memory: Handles skill learning, much like the sequence-to-sequence model in ANNs learning syntax.

Attention in Neuroscience

Overview of Attention

In neuroscience, the brain’s attention mechanism can be thought of as the adaptive control of limited resources. William James famously said, “Everyone knows what attention is,” but formalizing it scientifically has proven to be challenging.

Origin of Attention in the Brain

Attention mechanisms in the brain are complex and primarily controlled by the brainstem. However, their effects are widespread, influencing various cortical regions involved in sensory processing, emotional regulation, and executive functions.

Types of Attention

  1. External vs. Internal Attention
    • External Attention: Focuses on processing incoming sensory information.
    • Internal Attention: Involves managing internal resources such as long-term and working memory.
  2. Spatial vs. Feature Attention
    • Spatial Attention: Allocates resources to specific locations in the sensory space.
    • Feature Attention: Focuses on specific attributes like color or pitch, regardless of their spatial location.

Towards Unifying Attention in AI and Neuroscience

Despite the differences between biological and artificial neural networks, both aim to efficiently manage computational resources. The paper suggests using Marr's Levels of Analysis to create a unified conceptual framework for studying attention in both fields.

Marr’s Levels of Analysis

  1. Computational Level: Defines the goal (e.g., efficiently determining salient input signals).
  2. Representation Level: Proposes possible representations and algorithms (e.g., differentiable programming, memory-augmented networks).
  3. Implementation Level: Discusses actual implementations, whether in biological neural circuits or artificial hardware like GPUs and TPUs.

Conclusion

This paper offers valuable insights into how attention mechanisms work in both artificial and biological systems. By grounding research in common conceptual frameworks, it encourages a more collaborative and systematic approach to understanding attention across disciplines. Adopting such frameworks could help bridge gaps and highlight potential areas for cross-disciplinary innovation.
