Synthetic Data for Model Selection (2105.00717v2)
Abstract: Recent breakthroughs in synthetic data generation have made it possible to produce highly photorealistic images that are hardly distinguishable from real ones. Furthermore, synthetic generation pipelines can produce an effectively unlimited number of images. This combination of photorealism and scale makes synthetic data a promising candidate for improving various ML pipelines. Thus far, a large body of research in this field has focused on using synthetic images for training, by augmenting and enlarging training data. In contrast to using synthetic data for training, in this work we explore whether synthetic data can be beneficial for model selection. Considering the task of image classification, we demonstrate that when real data is scarce, synthetic data can replace the held-out validation set, thus allowing training on a larger dataset. We also introduce a novel method to calibrate the synthetic error estimate to match that of the real domain, and we show that such calibration significantly improves the usefulness of synthetic data for model selection.
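The selection procedure described above can be sketched as follows. This is a minimal illustration, not the paper's exact method: it assumes each candidate model's error is measured on a synthetic validation set, and that the synthetic-to-real gap can be approximated by a simple linear calibration fitted on a few models whose real-domain error happens to be known. All variable names and the simulated error values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-model error estimates for 20 candidate models:
# err_synth is measured on a synthetic validation set; err_real is the
# real-domain error, simulated here with a systematic domain gap.
err_synth = rng.uniform(0.05, 0.30, size=20)
err_real = 0.8 * err_synth + 0.04 + rng.normal(0.0, 0.005, size=20)

# Calibration sketch (an assumption, not the paper's method): fit a linear
# map from synthetic to real error on a small subset of models with known
# real error, then apply it to every candidate.
known = slice(0, 5)
slope, intercept = np.polyfit(err_synth[known], err_real[known], deg=1)
err_calibrated = slope * err_synth + intercept

# Model selection: pick the candidate with the lowest calibrated error.
best = int(np.argmin(err_calibrated))
print(f"selected model {best}, calibrated error {err_calibrated[best]:.3f}")
```

Because the calibration is monotone increasing here, it does not change the ranking of the candidates; its value is in making the *magnitude* of the estimated error comparable to the real domain, which matters when selection decisions depend on absolute error (e.g. early stopping against a threshold).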
Authors: Alon Shoshan, Nadav Bhonker, Igor Kviatkovsky, Matan Fintz, Gerard Medioni