On the Equivalency, Substitutability, and Flexibility of Synthetic Data (2403.16244v1)
Abstract: We study, from an empirical standpoint, the efficacy of synthetic data in real-world scenarios. Leveraging synthetic data to train perception models has become a key strategy in the community thanks to its efficiency, scalability, perfect annotations, and low cost. Despite these proven advantages, few studies have examined how to generate synthetic datasets efficiently for real-world problems, or to what extent synthetic data can reduce the effort of real-world data collection. To answer these questions, we systematically investigate three properties of synthetic data: its equivalency to real-world data, its substitutability for real data, and the flexibility of synthetic data generators to close domain gaps. Using the M3Act synthetic data generator, we conduct experiments on DanceTrack and MOT17. Our results suggest that synthetic data not only improves model performance but can also substitute for real data, allowing 60% to 80% of the real data to be replaced without performance loss. In addition, our study of how synthetic data distributions affect downstream performance reveals the importance of flexible data generators in narrowing domain gaps and improving model adaptability.
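The substitutability experiment described above can be pictured as a sweep over replacement ratios: for each ratio, a fixed-size training set keeps a fraction of the real data and fills the remainder with synthetic samples. The sketch below is purely illustrative; the function name and the list-based "datasets" are assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of a substitutability sweep. For each replacement
# ratio r, the training set keeps (1 - r) real samples and fills the
# rest with synthetic ones, keeping the total size constant.

def mix_datasets(real_data, synthetic_data, replacement_ratio):
    """Replace a fraction of the real training set with synthetic samples.

    replacement_ratio=0.6 means 60% of the resulting set is synthetic.
    """
    if not 0.0 <= replacement_ratio <= 1.0:
        raise ValueError("replacement_ratio must be in [0, 1]")
    n_total = len(real_data)
    n_synth = int(round(n_total * replacement_ratio))
    if n_synth > len(synthetic_data):
        raise ValueError("not enough synthetic samples available")
    # Keep the first (n_total - n_synth) real samples, top up with synthetic.
    return real_data[: n_total - n_synth] + synthetic_data[:n_synth]

# Sweep the 60-80% range the abstract reports as lossless replacement.
real = [f"real_{i}" for i in range(100)]
synth = [f"synth_{i}" for i in range(100)]
for ratio in (0.0, 0.6, 0.8):
    mixed = mix_datasets(real, synth, ratio)
    n_synth = sum(s.startswith("synth") for s in mixed)
    print(f"ratio={ratio}: {len(mixed)} samples, {n_synth} synthetic")
```

In an actual study, each mixed set would be used to train a tracker (e.g. on DanceTrack or MOT17) and the downstream metric compared against the all-real baseline.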