
This dataset does not exist: training models from generated images (1911.02888v1)

Published 7 Nov 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Current generative networks are increasingly proficient in generating high-resolution realistic images. These generative networks, especially the conditional ones, can potentially become a great tool for providing new image datasets. This naturally raises the question: can we train a classifier only on generated data? The potential availability of nearly unlimited amounts of training data challenges standard practices for training machine learning models, which have been crafted over the years for limited, fixed-size datasets. In this work we investigate this question and its related challenges. We identify ways to significantly improve performance over naive training on randomly generated images with regular heuristics. We propose three standalone techniques that can be applied at different stages of the pipeline, i.e., data generation, training on generated data, and deploying on real data. We evaluate our proposed approaches on a subset of the ImageNet dataset and show encouraging results compared to classifiers trained on real images.

Authors (5)
  1. Victor Besnier (9 papers)
  2. Himalaya Jain (9 papers)
  3. Andrei Bursuc (55 papers)
  4. Matthieu Cord (129 papers)
  5. Patrick Pérez (90 papers)
Citations (80)