DARQN: Deep Attention Recurrent Q-Network

Updated 11 May 2026
  • DARQN is a reinforcement learning model that integrates soft and hard attention mechanisms with LSTM-based memory to focus on salient visual features.
  • The architecture combines CNN-based feature extraction with an attention module that produces context vectors, guiding the recurrent Q-value estimation.
  • Empirical results on Atari games demonstrate enhanced training efficiency, reduced parameterization, and improved interpretability compared to traditional DQN models.

The Deep Attention Recurrent Q-Network (DARQN) is an extension of the Deep Q-Network (DQN) architecture that incorporates "soft" and "hard" attention mechanisms within a recurrent reinforcement learning framework. Designed to operate on high-dimensional visual input, DARQN enables agents to dynamically focus on salient regions of an environment, providing both performance gains and interpretable decision-making artifacts. By coupling convolutional feature extraction, differentiable attention, and recurrent memory (LSTM), DARQN introduces a structured filter over input observations, yielding improvements in training efficiency, model compactness, and policy visualization (Sorokin et al., 2015).

1. Architectural Overview

DARQN processes each input time step through a sequence of modules: a convolutional neural network (CNN), an attention mechanism (soft or hard), and a recurrent Q-learning head. The typical flow is as follows:

  • Input: Each raw $84 \times 84$ grayscale frame $s_t$ is passed into a CNN, producing $D$ feature maps of spatial size $m \times m$ (commonly $D = 256$, $m = 7$).
  • Feature Extraction: The output maps are reshaped into $L = m^2$ feature vectors $v_t = \{v_t^1, \dots, v_t^L\}$ with $v_t^i \in \mathbb{R}^D$.
  • Attention: The attention module generates a context vector $c_t \in \mathbb{R}^D$:
    • Soft attention computes a weighted sum $c_t = \sum_{i=1}^{L} \alpha_t^i v_t^i$, where $\alpha_t^i$ are focus weights.
    • Hard attention samples a single location $i_t$ and sets $c_t = v_t^{i_t}$.
  • Glimpse Generation and Recurrence: A fully-connected layer maps $c_t$ to a glimpse vector $g_t$, which is processed by an LSTM: $h_t = \mathrm{LSTM}(g_t, h_{t-1})$.
  • Q-Value Estimation: The hidden state $h_t$ is used to predict action values via a linear head: $Q(s_t, a) = w_a^\top h_t + b_a$, for all $a \in \mathcal{A}$.

This structure enables the recurrent module to utilize a sequence of attended glimpses, maintaining temporal memory throughout the agent's trajectory.
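The module sequence described in this section can be sketched as a single forward step in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the hidden size `H`, the action count `A`, the parameter names, the random initialization, the omitted scorer biases, and the `tanh` on the glimpse layer are all hypothetical, and random numbers stand in for the CNN output.

```python
import numpy as np

rng = np.random.default_rng(0)
D, m = 256, 7          # feature depth and spatial size quoted in the text
H, A = 256, 4          # LSTM size and number of actions: assumed values
L = m * m

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def init(*shape):
    # Small random weights; a placeholder for trained parameters.
    return rng.normal(0.0, 0.05, size=shape)

params = {
    "W_v": init(D, H), "W_h": init(H, H), "w": init(H),            # attention scorer
    "W_g": init(D, H), "b_g": np.zeros(H),                         # glimpse layer
    "W_x": init(H, 4 * H), "U": init(H, 4 * H), "b": np.zeros(4 * H),  # LSTM
    "W_q": init(H, A), "b_q": np.zeros(A),                         # linear Q head
}

def darqn_step(feature_maps, h_prev, cell_prev, p):
    """One time step: CNN features -> soft attention -> glimpse -> LSTM -> Q-values."""
    v = feature_maps.reshape(D, L).T                       # (L, D), one vector per location
    scores = np.tanh(v @ p["W_v"] + h_prev @ p["W_h"]) @ p["w"]   # (L,)
    alpha = softmax(scores)                                # focus weights, sum to 1
    context = alpha @ v                                    # (D,) context vector
    glimpse = np.tanh(context @ p["W_g"] + p["b_g"])       # nonlinearity is assumed
    gates = glimpse @ p["W_x"] + h_prev @ p["U"] + p["b"]
    i, f, o, g = np.split(gates, 4)
    cell = sigmoid(f) * cell_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(cell)
    q = h @ p["W_q"] + p["b_q"]                            # one Q-value per action
    return q, alpha, h, cell

feature_maps = rng.normal(size=(D, m, m))                  # stand-in for the CNN output
q, alpha, h, cell = darqn_step(feature_maps, np.zeros(H), np.zeros(H), params)
print(q.shape, alpha.shape)
```

Carrying `h` and `cell` from one call to the next is what lets the recurrent module accumulate a sequence of attended glimpses.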

2. Attention Mechanisms: Soft and Hard

DARQN supports two distinct attention paradigms:

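  • Soft Attention: Fully differentiable. Attention weights $\alpha_t^i$ are derived by computing unnormalized attention scores:

$$e_t^i = w^\top \tanh(W_v v_t^i + W_h h_{t-1} + b)$$

and normalizing via softmax:

$$\alpha_t^i = \frac{\exp(e_t^i)}{\sum_{j=1}^{L} \exp(e_t^j)}$$

$c_t = \sum_{i=1}^{L} \alpha_t^i v_t^i$ combines spatial features into a context vector passed to the LSTM.

  • Hard Attention: Involves stochastic sampling. At each step, a location $i_t$ is drawn from a learned policy $\pi_g(i_t \mid v_t, h_{t-1})$. The context is set as $c_t = v_t^{i_t}$. The policy parameters $\theta_g$ are trained by a REINFORCE-style gradient:

$$\Delta\theta_g \propto \nabla_{\theta_g} \log \pi_g(i_t \mid v_t, h_{t-1}) \, (R_t - b_t)$$

where $R_t$ is the return following step $t$ and $b_t$ is a learned baseline.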

Both approaches reduce the effective dimensionality seen by the recurrent module, enabling more focused memory updates and enhancing interpretability.
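The contrast between the two paradigms can be made concrete with a small NumPy sketch. It assumes the location policy is a softmax over per-location logits and uses random stand-ins for the features and logits; the return and baseline values are hypothetical numbers chosen only to show the sign of the update.

```python
import numpy as np

rng = np.random.default_rng(1)
L, D = 49, 256

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

v = rng.normal(size=(L, D))     # stand-in location features v_t^1 .. v_t^L
logits = rng.normal(size=L)     # stand-in output of the attention network

# Soft attention: deterministic weighted average over all locations.
alpha = softmax(logits)
c_soft = alpha @ v

# Hard attention: sample one location and copy its feature vector.
pi = softmax(logits)
i_t = rng.choice(L, p=pi)
c_hard = v[i_t]

# REINFORCE: d log pi(i_t) / d logits = onehot(i_t) - pi,
# scaled by (return - baseline) to reduce variance.
ret, baseline = 1.0, 0.4        # hypothetical values
grad_logits = (np.eye(L)[i_t] - pi) * (ret - baseline)

# A gradient-ascent step raises the probability of the sampled location
# when the return exceeded the baseline.
new_pi = softmax(logits + 0.5 * grad_logits)
print(pi[i_t], new_pi[i_t])
```

In a full model the logit gradient would be backpropagated into the attention network's weights; the sketch stops at the logits to keep the score-function estimator visible.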

3. Training Objective and Gradient Flow

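The DARQN agent is trained using the Bellman squared error objective:

$$\mathcal{L}(\theta) = \mathbb{E}\Big[\big(r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-) - Q(s_t, a_t; \theta)\big)^2\Big]$$

where $\theta^-$ denotes the parameters of a target network, which provides stable Q-learning targets.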
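As a worked instance of the Bellman squared error, the following sketch computes targets from a target network's next-state Q-values and the resulting loss on a two-transition batch. The function names are hypothetical, and masking the bootstrap term on terminal transitions (`dones`) is a common convention assumed here rather than something the text specifies.

```python
import numpy as np

def td_targets(rewards, q_next_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

def bellman_loss(q_online, actions, targets):
    """Mean squared difference between targets and Q(s, a) for the actions taken."""
    q_taken = q_online[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)

q_online = np.array([[1.0, 2.0], [0.5, 0.0]])        # Q(s, .) from the online network
q_next_target = np.array([[0.0, 3.0], [1.0, 1.0]])   # Q(s', .) from the target network
rewards = np.array([1.0, -1.0])
dones = np.array([0.0, 1.0])                         # second transition is terminal
actions = np.array([1, 0])

y = td_targets(rewards, q_next_target, dones, gamma=0.9)
loss = bellman_loss(q_online, actions, y)
print(y, loss)
```

With these numbers the targets are 3.7 and -1.0, the errors against the taken-action values 2.0 and 0.5 are 1.7 and -1.5, and the mean squared error is 2.57.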

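For soft attention, gradients propagate through the attention weights $\alpha_t^i$ and the unnormalized scores $e_t^i$:

$$\frac{\partial \mathcal{L}}{\partial e_t^i} = \sum_{j=1}^{L} \frac{\partial \mathcal{L}}{\partial \alpha_t^j} \, \frac{\partial \alpha_t^j}{\partial e_t^i}$$

where $\frac{\partial \alpha_t^j}{\partial e_t^i} = \alpha_t^j (\delta_{ij} - \alpha_t^i)$ is the softmax Jacobian.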
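The gradient path through soft attention can be checked numerically. Writing the weights as alpha = softmax(e) and the context as c = sum_i alpha_i v_i, the chain rule through the softmax collapses to dLoss/de_i = alpha_i * (dLoss/dc) . (v_i - c). The sketch below compares that closed form with central finite differences; the linear surrogate loss and the small sizes are assumptions made only for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
L, D = 5, 3
v = rng.normal(size=(L, D))    # location features
e = rng.normal(size=L)         # unnormalized attention scores
g = rng.normal(size=D)         # stand-in for dLoss/dc arriving from the LSTM

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def surrogate_loss(scores):
    # Linear surrogate: Loss = g . c, so dLoss/dc equals g exactly.
    return g @ (softmax(scores) @ v)

alpha = softmax(e)
c = alpha @ v
analytic = alpha * ((v - c) @ g)          # closed-form gradient w.r.t. each score

eps = 1e-6
numeric = np.array([
    (surrogate_loss(e + eps * np.eye(L)[i]) - surrogate_loss(e - eps * np.eye(L)[i])) / (2 * eps)
    for i in range(L)
])
print(np.max(np.abs(analytic - numeric)))
```

The gradient components sum to zero, reflecting that adding a constant to every score leaves the softmax weights unchanged.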

Hard attention uses the REINFORCE estimator and a baseline for variance reduction. The gradients for both attention and LSTM modules are propagated by standard backpropagation through time (BPTT); all CNN, attention, and recurrent weights are updated accordingly.

4. Empirical Performance and Evaluation

DARQN was empirically evaluated on five Atari 2600 games (Breakout, Seaquest, Space Invaders, Tutankham, and Gopher), in direct comparison with the original DQN and the recurrent DRQN baseline. Results demonstrate that despite using approximately half the parameters of DQN, the soft-attention variant equaled or surpassed DQN in three out of five games. Notably:

| Game | DQN | DRQN | DARQN (soft) |
|---|---|---|---|
| Seaquest | 1,284 | 1,421 | 7,263 |
| Space Invaders | 916 | 571 | 650 |
| Gopher | 1,976 | 3,512 | 5,356 |

Examples include Seaquest, where soft-attention DARQN achieved 7,263 versus DQN's 1,284, and Gopher, where DARQN scored 5,356 versus DQN's 1,976 (Sorokin et al., 2015).
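To put the table on a common scale, the scores can be expressed as ratios to the DQN baseline. The snippet only restates the numbers reported above; the dictionary layout is an illustrative choice.

```python
scores = {
    "Seaquest":       {"DQN": 1284, "DRQN": 1421, "DARQN (soft)": 7263},
    "Space Invaders": {"DQN": 916,  "DRQN": 571,  "DARQN (soft)": 650},
    "Gopher":         {"DQN": 1976, "DRQN": 3512, "DARQN (soft)": 5356},
}

# Ratio of the soft-attention DARQN score to the DQN score for each game.
ratios = {game: row["DARQN (soft)"] / row["DQN"] for game, row in scores.items()}

for game, r in ratios.items():
    print(f"{game}: {r:.2f}x DQN")
```

The ratios come to roughly 5.66 for Seaquest, 0.71 for Space Invaders, and 2.71 for Gopher, so the gain is large where it occurs but is not uniform across games.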

A plausible implication is that attentive filtering contributes to greater sample efficiency, reduced parameterization, and improved generalization on certain environments.

5. Interpretability and Visualization

An essential component of DARQN is its ability to provide interpretable attention maps. For each input frame, attention weights (soft) or selection indices (hard) can be overlaid, revealing the agent's spatial focus at each time step. In Breakout, the attention heatmap consistently tracks the ball, while in Seaquest, focus transitions from the oxygen gauge to the submarine, reflecting task-relevant priorities.
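One simple way to produce such an overlay is to upsample the $7 \times 7$ attention map to the frame resolution and blend it with the frame. The sketch below assumes a uniform tiling of the frame into $12 \times 12$ pixel cells, which ignores the actual receptive fields of the CNN; the function name and the `strength` parameter are hypothetical.

```python
import numpy as np

def attention_overlay(frame, alpha, m=7, strength=0.6):
    """Blend a grayscale frame in [0, 1] with an upsampled m x m attention map."""
    h, w = frame.shape
    amap = alpha.reshape(m, m)
    amap = amap / amap.max()                          # scale so the peak weight is 1
    up = np.kron(amap, np.ones((h // m, w // m)))     # nearest-neighbour upsampling
    return (1.0 - strength) * frame + strength * up

rng = np.random.default_rng(3)
frame = rng.random((84, 84))                          # stand-in for a game frame

# Hard attention yields a one-hot map; soft attention would give a dense one.
alpha = np.zeros(49)
alpha[24] = 1.0                                       # centre cell of the 7 x 7 grid
overlay = attention_overlay(frame, alpha)
print(overlay.shape)
```

The same function accepts soft attention weights directly, since they already form a length-49 vector that sums to one.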

The visualizations facilitate online monitoring of agent behavior and support debugging or failure analysis. For instance, the hard-attention variant occasionally fixates on irrelevant sprites and fails to resurface, a behavior clearly diagnosed via attention overlays (Sorokin et al., 2015).

6. Significance and Connections

DARQN injects a differentiable filter gate into the deep Q-learning framework, creating a bridge between attention models and reinforcement learning. By focusing learning and memory resources on task-relevant visual regions, DARQN advances the study of interpretable agents and points toward architectures with lower complexity and improved sample efficiency. The modular attention-LSTM architecture foreshadows trends in integrating differentiable attention across temporal domains, with implications for both reinforcement learning and broader sequence modeling tasks (Sorokin et al., 2015).

