
Neuroevolution of Self-Interpretable Agents (2003.08165v2)

Published 18 Mar 2020 in cs.NE, cs.CV, and cs.LG

Abstract: Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has similar properties as indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task critical visual hints, they are able to generalize to environments where task irrelevant elements are modified while conventional methods fail. Videos of our results and source code available at https://attentionagent.github.io/

Authors (3)
  1. Yujin Tang (31 papers)
  2. Duong Nguyen (31 papers)
  3. David Ha (30 papers)
Citations (105)

Summary

Neuroevolution of Self-Interpretable Agents

The paper "Neuroevolution of Self-Interpretable Agents" by Yujin Tang, Duong Nguyen, and David Ha, investigates the integration of self-attention mechanisms with neuroevolution strategies to develop reinforcement learning (RL) agents exhibiting both efficacy in performance and interpretability in decision-making processes. The overarching theme revolves around addressing the perceptual phenomenon of inattentional blindness, wherein selective attention allows artificial agents to disregard irrelevant details while focusing on task-critical information.

Overview

The paper introduces a self-attention bottleneck for vision-based RL that restricts the agent's focus to a small subset of its visual input. This selectivity makes the network's focus directly observable in pixel space, and hence interpretable. The approach also aligns with indirect encoding strategies, in which a large implicit weight matrix is generated from a small number of key-query parameters. The authors demonstrate that these agents solve challenging vision-based tasks with roughly 1000x fewer parameters than existing methods.
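
The mechanism can be sketched as follows: the input frame is sliced into patches, a small pair of key and query projections produces a patch-to-patch attention matrix, and the patches receiving the most attention "votes" are the only ones passed on to the controller. The sketch below is a minimal illustration of this idea; the patch size, projection dimension, and top-k value are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def extract_patches(frame, patch_size=7, stride=4):
    """Slice an H x W x C frame into flattened patches of shape (N, patch_size**2 * C)."""
    H, W, _ = frame.shape
    patches = []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patches.append(frame[y:y + patch_size, x:x + patch_size].ravel())
    return np.stack(patches)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def top_k_patches(frame, W_q, W_k, k=10):
    """Return indices of the k patches the agent attends to."""
    X = extract_patches(frame)                         # (N, D) flattened patches
    d = W_k.shape[1]
    A = softmax((X @ W_q) @ (X @ W_k).T / np.sqrt(d))  # (N, N) implicit attention matrix
    votes = A.sum(axis=0)                               # per-patch importance
    return np.argsort(votes)[-k:]                       # indices of the top-k patches

# Example usage with random projections on a 96x96 RGB frame (CarRacing-like resolution).
frame = np.random.rand(96, 96, 3)
D = 7 * 7 * 3
W_q, W_k = np.random.randn(D, 4), np.random.randn(D, 4)
print(top_k_patches(frame, W_q, W_k, k=10))
```

Only the trainable key/query projections (and a small downstream controller) carry explicit parameters; the N x N attention matrix is generated on the fly from them, which is the sense in which the scheme resembles indirect encoding.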

Training the self-attention architecture with neuroevolution offers flexibility, particularly for integrating discrete, non-differentiable modules, such as the hard top-k patch selection, that can improve the agent but are incompatible with gradient-based learning. The paper evaluates the approach on challenging environments, notably CarRacing and DoomTakeCover, and reports competitive results with very few parameters.
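
Because the optimizer only needs episode returns, not gradients, the non-differentiable patch selection poses no obstacle. The sketch below uses a deliberately simple population-based evolution strategy in place of the stronger optimizers (such as CMA-ES) typically used for this kind of training; the fitness function and parameter layout are illustrative assumptions.

```python
import numpy as np

def evolve(fitness_fn, num_params, generations=100, pop_size=64,
           elite_frac=0.25, sigma=0.1, seed=0):
    """Simple (mu, lambda)-style evolution strategy over a flat parameter vector."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(num_params)                  # mean of the search distribution
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(generations):
        pop = mu + sigma * rng.standard_normal((pop_size, num_params))
        fitness = np.array([fitness_fn(p) for p in pop])
        elite = pop[np.argsort(fitness)[-n_elite:]]  # keep candidates with highest return
        mu = elite.mean(axis=0)                      # move the mean toward the elites
    return mu
```

In this setting, fitness_fn would unpack the flat parameter vector into the key/query projections and the small controller, run one or more CarRacing or DoomTakeCover episodes, and return the cumulative reward; averaging over several rollouts reduces the variance of the fitness estimate.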

Significance of Results

The empirical findings underscore the agents' efficiency. With fewer than 4,000 parameters, the self-attention agents achieve competitive scores on the CarRacing and VizDoom tasks. They also generalize to environments in which task-irrelevant visual elements are modified, such as color alterations and object placements unrelated to the task goal, outperforming baseline deep reinforcement learners in these modified scenarios.

Furthermore, the paper argues that because the agents attend exclusively to pivotal visual cues, they generalize robustly to modified environments. This interpretability and efficiency derive largely from the indirect encoding of the model's weights: a small number of explicit parameters generates the large implicit attention matrix, combining parameter efficiency with attention behavior that remains consistent across observations over time.

Implications and Future Directions

Integrating self-attention with neuroevolution extends the range of tasks such agents can tackle beyond the typical constraints of high-dimensional RL problems. The methodology helps bridge the gap between theoretical neural network design and practical deployment, paving the way for more adaptive and generalizable models suited to real-world applications. It also renews interest in indirect encoding methods, suggesting new pathways toward efficient and verifiable model structures.

Future developments might focus on refining the attention mechanism or exploring alternative indirect encoding frameworks to enhance adaptability and resilience, especially under substantial changes to the task or its environmental cues. This direction can further probe the balance between sparse attention and information-rich perception, potentially opening new paradigms for training and deploying such models.

In conclusion, the paper marks a meaningful advance in neural architectures for RL agents, delivering substantial gains in parameter efficiency while improving interpretability, which remains crucial for safety and security in practical applications.
