
A Survey of Lottery Ticket Hypothesis (2403.04861v2)

Published 7 Mar 2024 in cs.LG and cs.NE

Abstract: The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a highly sparse subnetwork (i.e., winning tickets) that can achieve even better performance than the original model when trained in isolation. While LTH has been proven both empirically and theoretically in many works, some open issues, such as efficiency and scalability, remain to be addressed. Also, the lack of open-source frameworks and agreed-upon experimental settings poses a challenge to future research on LTH. We, for the first time, examine previous research and studies on LTH from different perspectives. We also discuss issues in existing works and list potential directions for further exploration. This survey aims to provide an in-depth look at the state of LTH and to develop a duly maintained platform for conducting experiments and comparing against the most up-to-date baselines.

Survey of the Lottery Ticket Hypothesis: Insights and Applications

Introduction to Lottery Ticket Hypothesis (LTH)

The Lottery Ticket Hypothesis (LTH) posits that within large, dense neural network models there exist smaller, sparse subnetworks, termed "winning tickets", that can achieve comparable or better performance than the original network when trained in isolation. Pioneered by Frankle and Carbin, the hypothesis challenges conventional perceptions of network pruning and offers a promising direction for improving model efficiency. This paper presents a comprehensive survey of LTH, shedding light on its theoretical underpinnings, its extension to special models, and the key factors influencing winning ticket identification. Furthermore, it reviews algorithmic advancements aimed at making LTH more practical and examines its intersection with broader subjects such as robustness, fairness, and federated learning.
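To make the train-prune-rewind loop concrete, below is a minimal sketch of iterative magnitude pruning (IMP) with weight rewinding, the procedure Frankle and Carbin used to find winning tickets. The toy MLP, random data, pruning fraction, and training schedule are illustrative assumptions, not the survey's experimental setup.

```python
# Minimal sketch of iterative magnitude pruning with rewinding (hypothetical setup).
import copy
import torch
import torch.nn as nn

def train(model, data, target, masks, steps=100, lr=0.1):
    """Train while keeping pruned weights frozen at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(data), target)
        loss.backward()
        opt.step()
        with torch.no_grad():  # re-apply masks so pruned weights stay zero
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])

def prune_by_magnitude(model, masks, fraction=0.2):
    """Globally prune the smallest-magnitude surviving weights."""
    scores = torch.cat([p.detach()[masks[n].bool()].abs().flatten()
                        for n, p in model.named_parameters() if n in masks])
    threshold = torch.quantile(scores, fraction)
    for n, p in model.named_parameters():
        if n in masks:
            masks[n] = (p.detach().abs() > threshold).float() * masks[n]
    return masks

# Toy setup: a small MLP on random data, purely for illustration.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
data, target = torch.randn(256, 20), torch.randint(0, 2, (256,))
init_state = copy.deepcopy(model.state_dict())            # theta_0 for rewinding
masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
         if p.dim() > 1}                                   # prune weight matrices only

for _ in range(3):                                         # iterative prune-rewind rounds
    train(model, data, target, masks)
    masks = prune_by_magnitude(model, masks, fraction=0.2)
    model.load_state_dict(init_state)                      # rewind to initialization
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])                           # winning-ticket initialization
```

The surviving mask applied to the original initialization is the candidate winning ticket; retraining it in isolation tests the hypothesis.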

Theoretical Foundations of LTH

Remarkable strides have been made in providing theoretical evidence supporting LTH's claims. Research demonstrates that given a sufficiently over-parameterized network, there exists a subnetwork capable of replicating the full network's performance. This has been extended to demonstrating the existence of strong lottery tickets—subnetworks that exhibit high performance without the necessity for training. The theoretical exploration also encompasses convolutional neural networks (CNNs) and generalizes to other architectures, such as Transformers and GNNs, providing a robust theoretical basis for LTH across a variety of network architectures.
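As a concrete illustration of these existence results, the following is an informal, paraphrased statement in the spirit of the subset-sum construction (reference 87); constants, weight-boundedness conditions, and the exact probability argument are omitted.

```latex
% Informal paraphrase of a strong-LTH existence result; not a verbatim theorem.
\begin{theorem}[Strong LTH, informal]
Let $f(x;\theta)$ be a target ReLU network of depth $\ell$ and width $d$ with suitably
bounded weights. A randomly initialized ReLU network $g(x;\theta_0)$ of depth $2\ell$
and width $O\!\big(d \log \tfrac{d\ell}{\min(\epsilon,\delta)}\big)$ contains, with
probability at least $1-\delta$, a subnetwork $g(x; m \odot \theta_0)$, i.e., a binary
mask $m$ applied to the untrained weights, such that
\[
\sup_{\|x\| \le 1} \big\| f(x;\theta) - g(x; m \odot \theta_0) \big\| \le \epsilon .
\]
\end{theorem}
```

The key message is that only logarithmic over-parameterization is needed for pruning alone, with no training, to recover any sufficiently small target network.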

Special Models: Extending LTH Beyond Conventional Architectures

The application of LTH extends beyond traditional dense networks to specialized models such as Graph Neural Networks (GNNs), Transformers, and Generative Models. Each of these models presents unique challenges and considerations for applying LTH, from addressing graph structure sparsity in GNNs to identifying transferable subnetworks in pre-trained transformers and generative models. The adaptability of LTH to these special cases underscores its broad applicability and potential impact across different domains of AI research.
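To illustrate how LTH is adapted to graphs, the sketch below follows the spirit of the unified graph lottery ticket formulation (reference 16), which jointly sparsifies the adjacency matrix and the GNN weights via two learnable masks. The layer, shapes, and pruning criterion are simplified placeholders rather than the cited method's exact algorithm.

```python
# Hypothetical sketch: jointly masking graph structure and weights in a GCN-style layer.
import torch
import torch.nn as nn

class MaskedGCNLayer(nn.Module):
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_dim, out_dim) * 0.1)
        # Trainable soft masks over the graph structure and over the weights.
        self.adj_mask = nn.Parameter(torch.ones(num_nodes, num_nodes))
        self.w_mask = nn.Parameter(torch.ones(in_dim, out_dim))

    def forward(self, adj, x):
        sparse_adj = adj * torch.sigmoid(self.adj_mask)   # soft mask, hardened by pruning
        sparse_w = self.weight * torch.sigmoid(self.w_mask)
        return torch.relu(sparse_adj @ x @ sparse_w)

def prune_masks(layer, adj_sparsity=0.1, w_sparsity=0.2):
    """Zero out the lowest-scoring fraction of each mask (magnitude criterion)."""
    with torch.no_grad():
        for mask, frac in [(layer.adj_mask, adj_sparsity), (layer.w_mask, w_sparsity)]:
            k = int(frac * mask.numel())
            if k > 0:
                threshold = mask.flatten().kthvalue(k).values
                mask[mask <= threshold] = float("-inf")   # sigmoid(-inf) = 0: pruned

# Toy usage on a random graph.
n, d = 8, 16
adj = (torch.rand(n, n) < 0.3).float()
x = torch.randn(n, d)
layer = MaskedGCNLayer(n, d, 4)
out = layer(adj, x)           # forward pass with soft masks
prune_masks(layer)            # harden a fraction of each mask to zero
```

Pruning edges alongside weights is what distinguishes graph lottery tickets from the standard weight-only setting.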

Key Insights from Experimental Investigations

Empirical studies have elucidated several key insights regarding LTH, such as how extensively a network can be pruned without compromising model accuracy and the roles of specific factors such as zeros, signs, and the supermask. The concept of early-bird tickets shows that winning tickets can be identified early in the training process, significantly reducing computational cost. Furthermore, comparisons between layer-wise and global pruning strategies offer a nuanced understanding of how sparsity is distributed across layers and how that distribution affects model performance.
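The distinction between global and layer-wise pruning can be made concrete with a short sketch; the layer shapes and the 80% sparsity target below are illustrative assumptions only.

```python
# Hypothetical sketch contrasting global vs. layer-wise magnitude pruning.
import torch

layers = {"fc1": torch.randn(256, 784), "fc2": torch.randn(10, 256)}
sparsity = 0.8

# Global pruning: one threshold across all layers, so sparsity distributes
# unevenly (larger layers typically end up sparser).
all_weights = torch.cat([w.abs().flatten() for w in layers.values()])
global_thresh = torch.quantile(all_weights, sparsity)
global_masks = {n: (w.abs() > global_thresh).float() for n, w in layers.items()}

# Layer-wise pruning: each layer gets its own threshold, enforcing the same
# sparsity everywhere regardless of layer size.
layer_masks = {n: (w.abs() > torch.quantile(w.abs().flatten(), sparsity)).float()
               for n, w in layers.items()}

for n in layers:
    print(n, "global keep:", round(global_masks[n].mean().item(), 3),
          "layer-wise keep:", round(layer_masks[n].mean().item(), 3))
```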

Algorithmic Advancements for LTH

Innovation in algorithms has been pivotal in addressing the practical challenges associated with LTH, particularly regarding efficiency and the cost of iterative retraining. Approaches such as Continuous Sparsification, Dual Lottery Ticket Hypothesis (DLTH), and structured pruning algorithms aim to streamline the process of identifying winning tickets. These advancements not only reduce the computational burden but also enhance the flexibility and applicability of LTH in real-world scenarios.
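As an illustration of one such direction, the sketch below captures the core idea behind continuous sparsification: relaxing the hard binary mask to a sigmoid with an annealed temperature so that the mask is learned jointly with the weights in a single training run instead of through repeated prune-retrain rounds. The penalty weight, schedule, and toy task are assumptions for illustration, not the published algorithm's settings.

```python
# Hypothetical sketch of a continuous-sparsification-style soft mask.
import torch
import torch.nn as nn

class SoftMaskedLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.05)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.s = nn.Parameter(torch.zeros(out_dim, in_dim))   # mask logits

    def forward(self, x, beta):
        mask = torch.sigmoid(beta * self.s)        # approaches {0,1} as beta grows
        return nn.functional.linear(x, self.weight * mask, self.bias)

layer = SoftMaskedLinear(20, 2)
opt = torch.optim.SGD(layer.parameters(), lr=0.05)
data, target = torch.randn(128, 20), torch.randint(0, 2, (128,))
loss_fn = nn.CrossEntropyLoss()

beta = 1.0
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(layer(data, beta), target)
    loss = loss + 1e-3 * torch.sigmoid(beta * layer.s).sum()  # sparsity penalty
    loss.backward()
    opt.step()
    beta *= 1.1                                    # anneal temperature toward a hard mask

final_mask = (layer.s > 0).float()                 # binarize at the end
print("kept fraction:", final_mask.mean().item())
```

The appeal of this family of methods is that the expensive iterative retraining of classic IMP is replaced by a single differentiable search for the mask.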

Intersection with Broader Topics

LTH's implications extend into areas such as model robustness, fairness, federated learning, and reinforcement learning, highlighting its relevance to current challenges in AI safety, ethics, and distributed computing. By exploring the connections between LTH and these subjects, the survey underscores the multifaceted impact of LTH on enhancing model efficiency, security, and equitable AI practices.

Future Directions and Open Issues

Despite its promising prospects, LTH faces open questions and challenges that warrant further exploration. These include translating the sparsity of winning tickets into practical acceleration, improving theoretical understanding to guide better network design, extending LTH to emerging models such as diffusion models, and more. Addressing these issues will be crucial for realizing LTH's full potential in developing more efficient, robust, and equitable AI systems.

Conclusion

This survey offers a panoramic view of the Lottery Ticket Hypothesis, encapsulating its theoretical foundations, practical algorithms, and broader implications. As LTH continues to evolve and intersect with various facets of AI research, it holds the promise of guiding the future direction of neural network design and optimization, heralding a new era of efficient and powerful AI systems.

References (133)
  1. A generalized lottery ticket hypothesis. CoRR, abs/2107.06825, 2021. URL https://arxiv.org/abs/2107.06825.
  2. Prospect pruning: Finding trainable weights at initialization using meta-gradients. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=AIgn9uwfcD1.
  3. Lottery aware sparsity hunting: Enabling federated learning on resource-limited edge, 2023. URL https://openreview.net/forum?id=qhplAU1BOZW.
  4. Dual lottery ticket hypothesis. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
  5. Losing heads in the lottery: Pruning transformer attention in neural machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.  2664–2674, 2020.
  6. Successfully applying the stabilized lottery ticket hypothesis to the transformer architecture. arXiv preprint arXiv:2005.03454, 2020.
  7. Language models are few-shot learners. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
  8. Rebekka Burkholz. Convolutional and residual networks provably contain lottery tickets. In International Conference on Machine Learning, pp.  2414–2433. PMLR, 2022a.
  9. Rebekka Burkholz. Most activation functions can win the lottery without excessive depth. Advances in Neural Information Processing Systems, 35:18707–18720, 2022b.
  10. On the existence of universal lottery tickets. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=SYB4WrJql1n.
  11. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity. In Kia Bazargan and Stephen Neuendorffer (eds.), Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2019, Seaside, CA, USA, February 24-26, 2019, pp.  63–72. ACM, 2019. doi: 10.1145/3289602.3293898. URL https://doi.org/10.1145/3289602.3293898.
  12. The lottery ticket hypothesis for pre-trained bert networks. Advances in neural information processing systems, 33:15834–15846, 2020a.
  13. Long live the lottery: The existence of winning tickets in lifelong learning. In International Conference on Learning Representations, 2020b.
  14. Data-efficient gan training beyond (just) augmentations: A lottery ticket perspective. Advances in Neural Information Processing Systems, 34:20941–20955, 2021a.
  15. The lottery tickets hypothesis for supervised and self-supervised pre-training in computer vision models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  16306–16316, 2021b.
  16. A unified lottery ticket hypothesis for graph neural networks. In International conference on machine learning, pp.  1695–1706. PMLR, 2021c.
  17. Long live the lottery: The existence of winning tickets in lifelong learning. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021d. URL https://openreview.net/forum?id=LXMSvPmsm0g.
  18. Coarsening the granularity: Towards structurally sparse lottery tickets. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp.  3025–3039. PMLR, 2022a. URL https://proceedings.mlr.press/v162/chen22a.html.
  19. Data-efficient double-win lottery tickets from robust pre-training. In International Conference on Machine Learning, pp.  3747–3759. PMLR, 2022b.
  20. Earlybert: Efficient bert training via early-bird lottery tickets. arXiv preprint arXiv:2101.00063, 2020c.
  21. The elastic lottery ticket hypothesis. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  26609–26621, 2021e. URL https://proceedings.neurips.cc/paper/2021/hash/dfccdb8b1cc7e4dab6d33db0fef12b88-Abstract.html.
  22. The elastic lottery ticket hypothesis. Advances in Neural Information Processing Systems, 34:26609–26621, 2021f.
  23. You are caught stealing my winning lottery ticket! making a lottery ticket claim its ownership. Advances in neural information processing systems, 34:1780–1791, 2021g.
  24. Gans can play lottery tickets too. arXiv preprint arXiv:2106.00134, 2021h.
  25. Proving the lottery ticket hypothesis for convolutional neural networks. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=Vjki79-619-.
  26. Evaluating lottery tickets under distributional shifts. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pp.  153–162, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-6117. URL https://aclanthology.org/D19-6117.
  27. Multi-prize lottery ticket hypothesis: Finding accurate binary neural networks by pruning A randomly weighted network. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. URL https://openreview.net/forum?id=U_mat0b9iv.
  28. Rigging the lottery: Making all tickets winners. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp.  2943–2952. PMLR, 2020. URL http://proceedings.mlr.press/v119/evci20a.html.
  29. Gradient flow in sparse neural networks and how lottery tickets win. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pp.  6577–6586, 2022.
  30. A general framework for proving the equivariant strong lottery ticket hypothesis. arXiv preprint arXiv:2206.04270, 2022.
  31. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
  32. The lottery ticket hypothesis at scale. CoRR, abs/1903.01611, 2019. URL http://arxiv.org/abs/1903.01611.
  33. Masks, signs, and learning rate rewinding. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=qODvxQ8TXW.
  34. Playing lottery tickets with vision and language. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp.  652–660, 2022.
  35. Finding meta winning ticket to train your maml. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  411–420, 2022.
  36. The lottery ticket hypothesis for object recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  762–771, 2021.
  37. Resprop: Reuse sparsified backpropagation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp.  1545–1555. Computer Vision Foundation / IEEE, 2020. doi: 10.1109/CVPR42600.2020.00162. URL https://openaccess.thecvf.com/content_CVPR_2020/html/Goli_ReSprop_Reuse_Sparsified_Backpropagation_CVPR_2020_paper.html.
  38. Finding the dominant winning ticket in pre-trained language models. In Findings of the Association for Computational Linguistics: ACL 2022, pp.  1459–1472, 2022.
  39. Data-efficient structured pruning via submodular optimization. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/ed5854c456e136afa3faa5e41b1f3509-Abstract-Conference.html.
  40. Inductive representation learning on large graphs. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp.  1024–1034, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html.
  41. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149, 2015.
  42. Victor Petrén Bach Hansen and Anders Søgaard. Is the lottery fair? evaluating winning tickets across demographics. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp.  3214–3224, 2021.
  43. IGRP: iterative gradient rank pruning for finding graph lottery ticket. In Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, and Vijay Raghavan (eds.), IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022, pp.  931–941. IEEE, 2022. doi: 10.1109/BIGDATA55660.2022.10020964. URL https://doi.org/10.1109/BigData55660.2022.10020964.
  44. Channel pruning for accelerating very deep neural networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp.  1398–1406. IEEE Computer Society, 2017. doi: 10.1109/ICCV.2017.155. URL https://doi.org/10.1109/ICCV.2017.155.
  45. Denoising diffusion probabilistic models. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html.
  46. Efficient sparse-matrix multi-vector product on gpus. In Ming Zhao, Abhishek Chandra, and Lavanya Ramakrishnan (eds.), Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2018, Tempe, AZ, USA, June 11-15, 2018, pp.  66–79. ACM, 2018. doi: 10.1145/3208040.3208062. URL https://doi.org/10.1145/3208040.3208062.
  47. CHEX: channel exploration for CNN model compression. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pp.  12277–12288. IEEE, 2022. doi: 10.1109/CVPR52688.2022.01197. URL https://doi.org/10.1109/CVPR52688.2022.01197.
  48. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. CoRR, abs/1607.03250, 2016. URL http://arxiv.org/abs/1607.03250.
  49. The lottery ticket hypothesis for self-attention in convolutional neural network. arXiv preprint arXiv:2207.07858, 2022.
  50. Predicting economic growth by region embedding: A multigraph convolutional network approach. In Mathieu d’Aquin, Stefan Dietze, Claudia Hauff, Edward Curry, and Philippe Cudré-Mauroux (eds.), CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, pp.  555–564. ACM, 2020. doi: 10.1145/3340531.3411882. URL https://doi.org/10.1145/3340531.3411882.
  51. Rethinking graph lottery tickets: Graph sparsity matters. arXiv preprint arXiv:2305.02190, 2023.
  52. How well do sparse imagenet models transfer? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  12266–12276, 2022.
  53. Lottery hypothesis based unsupervised pre-training for model compression in federated learning. CoRR, abs/2004.09817, 2020. URL https://arxiv.org/abs/2004.09817.
  54. Instant soup: Cheap pruning ensembles in A single pass can draw lottery tickets from large models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pp.  14691–14701. PMLR, 2023a. URL https://proceedings.mlr.press/v202/jaiswal23b.html.
  55. Instant soup: Cheap pruning ensembles in a single pass can draw lottery tickets from large models. In International Conference on Machine Learning, pp.  14691–14701. PMLR, 2023b.
  56. Successfully applying lottery ticket hypothesis to diffusion model. CoRR, abs/2310.18823, 2023. doi: 10.48550/ARXIV.2310.18823. URL https://doi.org/10.48550/arXiv.2310.18823.
  57. Winning lottery tickets in deep generative models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.  8038–8046, 2021.
  58. Exploring lottery ticket hypothesis in spiking neural networks. In European Conference on Computer Vision, pp.  102–120. Springer, 2022.
  59. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
  60. Optimal brain damage. In David S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989], pp.  598–605. Morgan Kaufmann, 1989. URL http://papers.nips.cc/paper/250-optimal-brain-damage.
  61. Lotteryfl: Personalized and communication-efficient federated learning with lottery ticket hypothesis on non-iid datasets. CoRR, abs/2008.03371, 2020. URL https://arxiv.org/abs/2008.03371.
  62. Pruning filters for efficient convnets. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. URL https://openreview.net/forum?id=rJqFGTslg.
  63. Accelerable lottery tickets with the mixed-precision quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  4603–4611, 2023.
  64. Super tickets in pre-trained language models: From model compression to improving generalization. arXiv preprint arXiv:2105.12002, 2021a.
  65. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing, 461:370–403, 2021b. doi: 10.1016/J.NEUCOM.2021.07.045. URL https://doi.org/10.1016/j.neucom.2021.07.045.
  66. Can unstructured pruning reduce the depth in deep neural networks? CoRR, abs/2308.06619, 2023. doi: 10.48550/ARXIV.2308.06619. URL https://doi.org/10.48550/arXiv.2308.06619.
  67. Memory-friendly scalable super-resolution via rewinding lottery ticket hypothesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  14398–14407, 2023.
  68. Robust binary models by pruning randomly-initialized networks. Advances in Neural Information Processing Systems, 35:492–506, 2022a.
  69. Lottery ticket preserves weight correlation: Is it desirable or not? In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp.  7011–7020. PMLR, 2021. URL http://proceedings.mlr.press/v139/liu21aa.html.
  70. Learning to win lottery tickets in bert transfer via task-agnostic mask training. arXiv preprint arXiv:2204.11218, 2022b.
  71. Learning efficient convolutional networks through network slimming. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pp.  2755–2763. IEEE Computer Society, 2017. doi: 10.1109/ICCV.2017.298. URL https://doi.org/10.1109/ICCV.2017.298.
  72. Rethinking the value of network pruning. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019. URL https://openreview.net/forum?id=rJlnB3C5Ym.
  73. Sanity checks for lottery tickets: Does your winning ticket really win the jackpot? In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  12749–12760, 2021a. URL https://proceedings.neurips.cc/paper/2021/hash/6a130f1dc6f0c829f874e92e5458dced-Abstract.html.
  74. Sanity checks for lottery tickets: Does your winning ticket really win the jackpot? Advances in Neural Information Processing Systems, 34:12749–12760, 2021b.
  75. Effective model sparsification by scheduled grow-and-prune methods. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=xa6otUDdP2W.
  76. Proving the lottery ticket hypothesis: Pruning is all you need. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp.  6682–6691. PMLR, 2020. URL http://proceedings.mlr.press/v119/malach20a.html.
  77. Rahul Mehta. Sparse transfer learning via winning lottery tickets. CoRR, abs/1905.07785, 2019. URL http://arxiv.org/abs/1905.07785.
  78. Szymon Mikler. Reproducibility study: Comparing rewinding and fine-tuning in neural network pruning. CoRR, abs/2109.09670, 2021. URL https://arxiv.org/abs/2109.09670.
  79. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers. Advances in neural information processing systems, 32, 2019a.
  80. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp.  4933–4943, 2019b. URL https://proceedings.neurips.cc/paper/2019/hash/a4613e8d72a61b3b69b32d040f89ad81-Abstract.html.
  81. Fedltn: Federated learning for sparse and personalized lottery ticket networks. In Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner (eds.), Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XII, volume 13672 of Lecture Notes in Computer Science, pp.  69–85. Springer, 2022. doi: 10.1007/978-3-031-19775-8_5. URL https://doi.org/10.1007/978-3-031-19775-8_5.
  82. SOSP: efficiently capturing global correlations by second-order structured pruning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=t5EmXZ3ZLR.
  83. Gradient-free structured pruning with unlabeled data. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pp.  26326–26341. PMLR, 2023. URL https://proceedings.mlr.press/v202/nova23a.html.
  84. Logarithmic pruning is all you need. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1e9491470749d5b0e361ce4f0b24d037-Abstract.html.
  85. A study on the ramanujan graph property of winning lottery tickets. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pp.  17186–17201. PMLR, 2022. URL https://proceedings.mlr.press/v162/pal22a.html.
  86. Unmasking the lottery ticket hypothesis: What’s encoded in a winning ticket’s mask? arXiv preprint arXiv:2210.03044, 2022.
  87. Optimal lottery tickets via subset sum: Logarithmic over-parameterization is sufficient. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1b742ae215adf18b75449c6e272fd92d-Abstract.html.
  88. When bert plays the lottery, all tickets are winning. arXiv preprint arXiv:2005.00561, 2020.
  89. Winning the lottery ahead of time: Efficient early network pruning. In International Conference on Machine Learning, pp.  18293–18309. PMLR, 2022.
  90. Sparse weight activation training. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/b44182379bf9fae976e6ae5996e13cd8-Abstract.html.
  91. What’s hidden in a randomly weighted neural network? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  11893–11902, 2020.
  92. ADMM-NN: an algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers. In Iris Bahar, Maurice Herlihy, Emmett Witchel, and Alvin R. Lebeck (eds.), Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April 13-17, 2019, pp.  925–938. ACM, 2019. doi: 10.1145/3297858.3304076. URL https://doi.org/10.1145/3297858.3304076.
  93. Comparing rewinding and fine-tuning in neural network pruning. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020a. URL https://openreview.net/forum?id=S1gSj0NKvB.
  94. Comparing rewinding and fine-tuning in neural network pruning. In International Conference on Learning Representations, 2020b. URL https://openreview.net/forum?id=S1gSj0NKvB.
  95. Analyzing lottery ticket hypothesis from pac-bayesian theory perspective. Advances in Neural Information Processing Systems, 35:30937–30949, 2022.
  96. Winning the lottery with continuous sparsification. Advances in neural information processing systems, 33:11380–11390, 2020.
  97. Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations. Advances in Neural Information Processing Systems, 35:20051–20063, 2022.
  98. Communication-efficient and personalized federated lottery ticket learning. In 22nd IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2021, Lucca, Italy, September 27-30, 2021, pp.  581–585. IEEE, 2021. doi: 10.1109/SPAWC51858.2021.9593126. URL https://doi.org/10.1109/SPAWC51858.2021.9593126.
  99. Win the lottery ticket via fourier analysis: Frequencies guided network pruning. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.  4203–4207. IEEE, 2022.
  100. When to prune? a policy towards early structural pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  12247–12256, 2022.
  101. Data level lottery ticket hypothesis for vision transformers. 2023.
  102. Does dataset lottery ticket hypothesis exist? 2022.
  103. Efficient federated learning with enhanced privacy via lottery ticket pruning in edge computing. CoRR, abs/2305.01387, 2023. doi: 10.48550/ARXIV.2305.01387. URL https://doi.org/10.48550/arXiv.2305.01387.
  104. Sanity-checking pruning methods: Random tickets can win the jackpot. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/eae27d77ca20db309e056e3d2dcd7d69-Abstract.html.
  105. Learning sparse sharing architectures for multiple tasks. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp.  8936–8943, 2020.
  106. Pruning neural networks without any data by iteratively conserving synaptic flow. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/46a4378f835dc8040c8057beb6a2da52-Abstract.html.
  107. Fair scratch tickets: Finding fair sparse networks without weight training. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pp.  24406–24416. IEEE, 2023. doi: 10.1109/CVPR52729.2023.02338. URL https://doi.org/10.1109/CVPR52729.2023.02338.
  108. Using winning lottery tickets in transfer learning for convolutional neural networks. In 2019 International Joint Conference on Neural Networks (IJCNN), pp.  1–8. IEEE, 2019.
  109. On lottery tickets and minimal task representations in deep reinforcement learning. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022. URL https://openreview.net/forum?id=Fl3Mg_MZR-.
  110. Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376, 2020a.
  111. Picking winning tickets before training by preserving gradient flow. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020b. URL https://openreview.net/forum?id=SkgsACVKPH.
  112. A unified view of finding and transforming winning lottery tickets. 2022a.
  113. Searching lottery tickets in graph neural networks: A dual perspective. In The Eleventh International Conference on Learning Representations, 2022b.
  114. Learning structured sparsity in deep neural networks. In Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pp.  2074–2082, 2016. URL https://proceedings.neurips.cc/paper/2016/hash/41bfd20a38bb1b0bec75acf0845530a7-Abstract.html.
  115. Pruning adatperfusion with lottery ticket hypothesis. In Findings of the Association for Computational Linguistics: NAACL 2022, pp.  1632–1646, 2022.
  116. Efficient adversarial training with robust early-bird tickets. arXiv preprint arXiv:2211.07263, 2022.
  117. Greedy optimization provably wins the lottery: Logarithmic number of winning tickets is enough. Advances in Neural Information Processing Systems, 33:16409–16420, 2020.
  118. Can we find strong lottery tickets in generative models? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp.  3267–3275, 2023.
  119. Lottery pools: Winning more by interpolating tickets without increasing training or inference cost. In Brian Williams, Yiling Chen, and Jennifer Neville (eds.), Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pp.  10945–10953. AAAI Press, 2023a. doi: 10.1609/AAAI.V37I9.26297. URL https://doi.org/10.1609/aaai.v37i9.26297.
  120. GOHSP: A unified framework of graph and optimization-based heterogeneous structured pruning for vision transformer. In Brian Williams, Yiling Chen, and Jennifer Neville (eds.), Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pp.  10954–10962. AAAI Press, 2023b. doi: 10.1609/AAAI.V37I9.26298. URL https://doi.org/10.1609/aaai.v37i9.26298.
  121. Drawing early-bird tickets: Toward more efficient training of deep networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=BJxsrgStvr.
  122. Supertickets: Drawing task-agnostic lottery tickets from supernets via jointly architecture searching and parameter pruning. In European Conference on Computer Vision, pp.  674–690. Springer, 2022a.
  123. Early-bird gcns: Graph-network co-optimization towards more efficient gcn training and inference via drawing early-bird lottery tickets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp.  8910–8918, 2022b.
  124. Playing the lottery with rewards and multiple languages: lottery tickets in rl and nlp. arXiv preprint arXiv:1906.02768, 2019.
  125. MEST: accurate and fast memory-economic sparse training framework on the edge. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pp.  20838–20850, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/ae3f4c649fb55c2ee3ef4d1abdb79ce5-Abstract.html.
  126. Graph lottery ticket automated. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=nmBjBZoySX.
  127. Why lottery ticket wins? A theoretical perspective of sample complexity on pruned neural networks. CoRR, abs/2110.05667, 2021a. URL https://arxiv.org/abs/2110.05667.
  128. A systematic DNN weight pruning framework using alternating direction method of multipliers. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (eds.), Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VIII, volume 11212 of Lecture Notes in Computer Science, pp.  191–207. Springer, 2018. doi: 10.1007/978-3-030-01237-3_12. URL https://doi.org/10.1007/978-3-030-01237-3_12.
  129. Validating the lottery ticket hypothesis with inertial manifold theory. Advances in neural information processing systems, 34:30196–30210, 2021b.
  130. Efficient lottery ticket finding: Less data is more. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp.  12380–12390. PMLR, 2021c. URL http://proceedings.mlr.press/v139/zhang21c.html.
  131. Efficient lottery ticket finding: Less data is more. In International Conference on Machine Learning, pp.  12380–12390. PMLR, 2021d.
  132. Robust lottery tickets for pre-trained language models. arXiv preprint arXiv:2211.03013, 2022.
  133. Deconstructing lottery tickets: Zeros, signs, and the supermask. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp.  3592–3602, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/1113d7a76ffceca1bb350bfe145467c6-Abstract.html.
Authors (9)
  1. Bohan Liu
  2. Zijie Zhang
  3. Peixiong He
  4. Zhensen Wang
  5. Yang Xiao
  6. Ruimeng Ye
  7. Yang Zhou
  8. Wei-Shinn Ku
  9. Bo Hui