One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers (1906.02773v2)

Published 6 Jun 2019 in stat.ML, cs.LG, and cs.NE

Abstract: The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generality of winning ticket initializations remains unclear. Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration. Perhaps surprisingly, we found that, within the natural images domain, winning ticket initializations generalized across a variety of datasets, including Fashion MNIST, SVHN, CIFAR-10/100, ImageNet, and Places365, often achieving performance close to that of winning tickets generated on the same dataset. Moreover, winning tickets generated using larger datasets consistently transferred better than those generated using smaller datasets. We also found that winning ticket initializations generalize across optimizers with high performance. These results suggest that winning ticket initializations generated by sufficiently large datasets contain inductive biases generic to neural networks more broadly which improve training across many settings and provide hope for the development of better initialization methods.

Generalization of Lottery Ticket Initializations Across Datasets and Optimizers

Recent investigations into neural network initializations have been propelled by the lottery ticket hypothesis, which posits that over-parameterized models contain sparse sub-networks that, when isolated and trained from an appropriate initialization, can match the performance of the full model. The paper titled "One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers" explores the feasibility of transferring these winning ticket initializations across different datasets and optimizers. This exploration seeks to enhance our understanding of the generalizability and utility of lottery tickets and their implications for efficient model training.
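Winning tickets are typically found by iterative magnitude pruning: train the network, remove the smallest-magnitude weights, rewind the survivors to their original initialization, and repeat. A minimal numpy sketch of one such round is below; the function and parameter names are illustrative, not taken from the paper's code.

```python
import numpy as np

def imp_round(weights_trained, weights_init, mask, prune_frac=0.2):
    """One round of iterative magnitude pruning (IMP).

    Removes the `prune_frac` smallest-magnitude surviving weights,
    then rewinds the survivors to their initial values, yielding a
    "winning ticket" initialization. Illustrative sketch only.
    """
    alive = mask.astype(bool)
    magnitudes = np.abs(weights_trained[alive])
    k = int(prune_frac * magnitudes.size)           # weights to prune this round
    if k > 0:
        threshold = np.partition(magnitudes, k)[k]  # (k+1)-th smallest magnitude
        new_mask = mask * (np.abs(weights_trained) >= threshold)
    else:
        new_mask = mask.copy()
    # The ticket is the surviving connections reset to their initial values.
    ticket = weights_init * new_mask
    return ticket, new_mask
```

In practice this loop is repeated over many train/prune cycles until the target sparsity is reached, which is exactly the computational expense the paper seeks to amortize by reusing tickets.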

Summary of Findings

The paper demonstrates through a series of experiments that lottery ticket initializations can indeed generalize across numerous datasets within the natural image domain. Winning ticket initializations derived from larger datasets, such as ImageNet and Places365, consistently showed better performance when transferred to different target datasets compared to those derived from smaller datasets like CIFAR-10. This indicates that larger source datasets may imbue winning tickets with more generic inductive biases pertinent to neural network training.

A notable observation is that the transfer of winning tickets across datasets often resulted in performance closely rivaling, or in some cases exceeding, that of winning tickets specifically generated for the target datasets. This crossover success highlights the potential for a single set of lottery ticket initializations to be reused, thereby reducing computational overhead and the need for repetitive, extensive training cycles.
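The transfer protocol itself is simple: the sparse mask and rewound initialization come from one training configuration, and only the surviving weights are trained on the target task. The following toy masked logistic regression illustrates this idea in numpy; it is a hypothetical stand-in for the paper's VGG/ResNet experiments, with all names invented for illustration.

```python
import numpy as np

def train_masked(w0, mask, X, y, lr=0.1, steps=200):
    """Train a masked logistic regression: pruned weights stay at zero.

    `w0 * mask` plays the role of a transferred ticket initialization;
    the mask freezes pruned weights throughout training on the target
    data. Toy sketch, not the paper's actual training setup.
    """
    w = w0 * mask
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)      # logistic-loss gradient
        w -= lr * grad * mask              # mask keeps pruned weights at zero
    acc = np.mean((X @ w > 0) == (y > 0.5))
    return w, acc
```

Comparing the accuracy of a transferred ticket against one generated directly on the target dataset is the paper's core measurement.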

Additionally, the paper examines optimizer dependence, finding that VGG19 winning tickets generated with one optimizer (SGD with momentum or Adam) transferred well to training with the other, further supporting the hypothesis that certain intrinsic biases in lottery tickets persist beyond the specific training configuration that produced them.

Implications and Future Directions

The implications of these findings are significant for neural network initialization strategies. The ability of winning ticket initializations to transcend specific dataset and optimizer conditions suggests that we might move toward a more universal set of initializations, which could simplify model training processes and maximize computational resource efficiency.

Despite these promising results, the paper acknowledges several limitations and avenues for further research. Notably, the generalizability of winning tickets outside the domain of natural images and complex multi-modal tasks remains an open question. Additionally, current methods for identifying winning tickets involve computationally demanding iterative pruning, underscoring the necessity for more efficient discovery methods or approximations.

The future of lottery ticket research will likely involve addressing these limitations, exploring domain and task-agnostic winning tickets, and understanding the intrinsic properties making these tickets efficacious. This understanding could pave the way toward more informed model initialization schemas and potentially uncover deeper, universal principles governing neural network training efficiencies.

Authors (4)
  1. Ari S. Morcos (31 papers)
  2. Haonan Yu (29 papers)
  3. Michela Paganini (27 papers)
  4. Yuandong Tian (128 papers)
Citations (217)