An Exploration of the Lottery Ticket Hypothesis in Multiple Domains
The paper "Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP" presents a comprehensive investigation into the applicability of the Lottery Ticket Hypothesis (LTH) beyond its original setting of supervised learning on natural image tasks. The work extends the hypothesis to natural language processing (NLP) and reinforcement learning (RL), probing its viability across markedly different neural network architectures and learning paradigms.
Overview
The Lottery Ticket Hypothesis posits that highly over-parameterized neural networks contain smaller, sparse subnetworks, or "winning tickets," that, when trained in isolation from their original initialization, can match or exceed the accuracy of the full model while using far fewer parameters. This concept, initially demonstrated on supervised image-classification tasks, has significant implications for network pruning and efficient model training. However, its domain-general nature and applicability to NLP and RL were unexplored before this paper.
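The basic winning-ticket procedure can be sketched on a toy problem. The following NumPy example (purely illustrative; the task, model, and hyperparameters here are not from the paper) trains a small dense linear model, prunes its lowest-magnitude weights, rewinds the survivors to their original initialization, and retrains only the sparse subnetwork:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task with a sparse ground truth: y = X @ w_true.
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:5] = rng.normal(size=5)
y = X @ w_true

def train(w, mask, steps=500, lr=0.05):
    """Gradient descent on MSE; the binary mask keeps pruned weights at zero."""
    w = w * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask
    return w

w_init = rng.normal(scale=0.1, size=20)   # random initialization, saved for rewinding
mask = np.ones(20)

# 1) Train the dense model to convergence.
w_dense = train(w_init, mask)

# 2) Prune: keep only the 25% of weights with largest trained magnitude.
k = int(0.25 * w_dense.size)
threshold = np.sort(np.abs(w_dense))[-k]
mask = (np.abs(w_dense) >= threshold).astype(float)

# 3) Rewind the surviving weights to their ORIGINAL init and retrain the subnetwork.
w_ticket = train(w_init, mask)

dense_loss = np.mean((X @ w_dense - y) ** 2)
ticket_loss = np.mean((X @ w_ticket - y) ** 2)
print(f"dense loss={dense_loss:.4f}  ticket loss={ticket_loss:.4f}  sparsity={1 - mask.mean():.2f}")
```

On this toy task the 75%-sparse "ticket" recovers essentially the same solution as the dense model, which is the qualitative behavior the hypothesis predicts.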
Key Findings and Methodology
In their experiments, the authors focus on two distinct machine learning realms:
- NLP: The paper evaluates the hypothesis on language modeling using LSTMs and on machine translation using Transformers. Notably, for Wikitext-2 language modeling and WMT'14 English-German translation, winning tickets consistently outperformed randomly reinitialized subnetworks. This finding held both for recurrent LSTMs and for complex Transformer architectures, with Transformers achieving comparable performance at reduced scale, particularly the Transformer Big model trained on machine translation. The researchers reported BLEU scores maintaining 99% of the baseline performance with significantly fewer parameters, showcasing the practical value of the hypothesis for scaling down models without severe performance penalties.
- Reinforcement Learning (RL): The exploration extended to various RL environments, from classic control tasks to more elaborate Atari games. Here the results varied: winning tickets appeared in several environments, such as CartPole and Seaquest, whereas others, like Krull, proved surprisingly robust to parameter pruning, hinting at substantial over-parameterization. The variable outcomes on Atari suggest a nuanced interaction between network architecture, task complexity, and the effectiveness of winning tickets, underlining the need for further investigation in more diverse RL settings.
Implications and Speculative Insights
The evidence presented indicates that the lottery ticket phenomenon is not confined to image-based supervised learning but is a broader characteristic of deep neural network training. The identification of winning tickets in domains as different as NLP and RL could lead to significant improvements in model training regimens, including greater efficiency, faster convergence, and better resource utilization, which is particularly relevant in resource-intensive settings such as training large-scale Transformers.
The paper also highlights how iterative pruning and late rewinding, techniques central to the original LTH demonstrations, are vital to harnessing the full potential of winning tickets in both NLP and RL. The experiments indicate that these techniques must be adapted to the specific architecture and task to realize their full benefit.
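The two techniques combine naturally: rather than pruning once, weights are removed over several rounds, and after each round the survivors are rewound not to their initialization but to a snapshot taken early in the original training run. A minimal sketch on a toy least-squares problem (the task, snapshot step, pruning rate, and round count are all illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sparse-regression task, used only to illustrate the schedule.
X = rng.normal(size=(200, 30))
w_true = np.zeros(30)
w_true[:6] = rng.normal(size=6)
y = X @ w_true

def train(w, mask, steps=400, lr=0.05, snapshot_at=None):
    """Masked gradient descent; optionally snapshot weights early in training."""
    w = w.copy() * mask
    snap = None
    for t in range(steps):
        if t == snapshot_at:
            snap = w.copy()          # the "late rewind" point
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad * mask
    return w, snap

mask = np.ones(30)
w0 = rng.normal(scale=0.1, size=30)

# First round: train dense, snapshotting the weights a few steps in.
w, w_rewind = train(w0, mask, snapshot_at=20)

prune_frac = 0.5                     # remove half of the survivors each round
for _ in range(2):                   # iterative pruning rounds
    survivors = np.abs(w[mask == 1])
    cutoff = np.quantile(survivors, prune_frac)
    mask = mask * (np.abs(w) >= cutoff)
    # Rewind to the early snapshot (not all the way to w0) and retrain.
    w, _ = train(w_rewind, mask)

loss = np.mean((X @ w - y) ** 2)
print(f"final sparsity={1 - mask.mean():.2f}  loss={loss:.4f}")
```

Pruning gradually gives low-magnitude weights a chance to become important in later rounds, and rewinding to an early-training snapshot rather than the raw initialization is what the LTH literature found necessary for winning tickets in larger networks.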
Conclusion
The research confirms the utility of the Lottery Ticket Hypothesis well beyond its original applications. By demonstrating the hypothesis's validity in NLP and RL, the paper adds a valuable perspective to our understanding of neural network initialization and optimization. It underscores the need for adaptive pruning strategies and points to future work on deploying sparse subnetworks, fostering more efficient and scalable deep learning practice across varied computational tasks. The findings carry implications not only for academic understanding but also for industrial model deployment and energy-efficient AI systems.