
A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective? (2108.11018v3)

Published 25 Aug 2021 in cs.LG and cs.CV

Abstract: Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks. The most significant advantage of using synthetic images is that the ground-truth labels are automatically available, enabling unlimited expansion of the data size without human cost. However, synthetic data may have a large domain gap from real data, in which case increasing the data size does not improve performance. How can we tell when this is the case? In this study, we derive a simple scaling law that predicts performance from the amount of pre-training data. By estimating the parameters of the law, we can judge whether we should increase the data or change the image synthesis settings. Further, we analyze the theory of transfer learning by considering learning dynamics and confirm that the derived generalization bound is consistent with our empirical findings. We empirically validate our scaling law across various experimental settings: benchmark tasks, model sizes, and complexities of synthetic images.
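
To illustrate what "estimating the parameters of the law" could look like in practice, here is a minimal sketch. The abstract does not state the functional form, so the code assumes a generic power law with a constant floor, error(n) = D * n^(-alpha) + C, where n is the pre-training data size. The function name `fit_power_law_with_offset`, the grid-search fitting procedure, and the synthetic numbers are all hypothetical choices for illustration, not the authors' method or data.

```python
import math

def fit_power_law_with_offset(sizes, errors, num_grid=2000):
    """Fit errors ~ D * sizes**(-alpha) + C (assumed form) by grid search over C.

    For a fixed offset C, log(error - C) is linear in log(size), so D and
    alpha follow from ordinary least squares in log-log space. The offset
    whose fit has the smallest squared error in the original space wins.
    """
    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    min_err = min(errors)
    best = None
    for i in range(num_grid):
        c = min_err * i / num_grid  # candidate offsets in [0, min_err)
        ys = [math.log(e - c) for e in errors]
        y_mean = sum(ys) / len(ys)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        d = math.exp(y_mean - slope * x_mean)
        alpha = -slope
        sse = sum((d * n ** (-alpha) + c - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, d, alpha, c)
    _, d, alpha, c = best
    return d, alpha, c

# Synthetic, noise-free data generated from the assumed form (illustration only).
sizes = [1000, 2000, 4000, 8000, 16000, 32000]
errors = [5.0 * n ** (-0.5) + 0.1 for n in sizes]

d, alpha, c = fit_power_law_with_offset(sizes, errors)

# Share of the current error that more pre-training data could still remove.
reducible = d * sizes[-1] ** (-alpha)
reducible_share = reducible / (reducible + c)
print(f"D={d:.3f}, alpha={alpha:.3f}, C={c:.4f}, reducible share={reducible_share:.2f}")
```

Under this assumed form, a small reducible share means the error at the largest data size is dominated by the floor C, which suggests changing the synthesis setting rather than generating more data; a large share suggests more data is still worthwhile.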

Authors (8)
  1. Hiroaki Mikami (4 papers)
  2. Kenji Fukumizu (89 papers)
  3. Shogo Murai (2 papers)
  4. Shuji Suzuki (10 papers)
  5. Yuta Kikuchi (38 papers)
  6. Taiji Suzuki (119 papers)
  7. Shin-ichi Maeda (29 papers)
  8. Kohei Hayashi (87 papers)
Citations (10)