
SynGen-Vision: Synthetic Data Generation for training industrial vision models (2509.04894v1)

Published 5 Sep 2025 in cs.CV and cs.LG

Abstract: We propose an approach to generate synthetic data to train computer vision (CV) models for industrial wear and tear detection. Wear and tear detection is an important CV problem for predictive maintenance tasks in any industry. However, data curation for training such models is expensive and time-consuming due to the unavailability of datasets for different wear and tear scenarios. Our approach employs a vision LLM along with a 3D simulation and rendering engine to generate synthetic data for varying rust conditions. We evaluate our approach by training a CV model for rust detection using the generated dataset and testing the trained model on real images of rusted industrial objects. The model trained with the synthetic data generated by our approach outperforms the other approaches with an mAP50 score of 0.87. The approach is customizable and can be easily extended to other industrial wear and tear detection scenarios.

Summary

  • The paper introduces SynGen-Vision, a pipeline that integrates generative AI, style transfer, and noise removal to produce synthetic, photorealistic data for industrial rust detection.
  • It leverages text-prompted texture generation with Stable Diffusion, 3D rendering with Blender, and automated annotation to efficiently generate datasets for training YOLOv5 models.
  • Empirical results demonstrate substantial improvement (mAP50 up to 0.87) over baseline methods, highlighting robust generalization to real-world scenarios.

SynGen-Vision: Synthetic Data Generation for Training Industrial Vision Models

Introduction

The paper presents SynGen-Vision, an end-to-end pipeline for generating synthetic data to train computer vision (CV) models for industrial wear and tear detection, with a focus on rust detection. The motivation stems from the scarcity and high cost of acquiring and annotating real-world datasets for such tasks, especially for rare or progressive conditions like varying degrees of rust. The proposed approach leverages generative AI (GenAI) models, 3D simulation, and rendering engines to create photorealistic, annotated datasets that can be used to train object detection models. The pipeline is designed to be extensible to other industrial wear and tear scenarios beyond rust.

Methodology

Texture Generation from User Prompts

The pipeline begins with user prompts describing the desired wear and tear conditions. Stable Diffusion, a latent diffusion model, is used for text-to-image generation, producing texture images corresponding to different rust conditions. The authors note that prompt engineering is critical: appending terms like "texture" or "surface" yields cleaner, more usable outputs, while modifiers such as "complete," "streaks," or "spots" control the degree and pattern of rust.
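
As a rough illustration, this prompt-driven texture generation step could be scripted with the Hugging Face diffusers library; the checkpoint, prompt wording, and sampling settings below are assumptions, since the paper does not specify them:

```python
# Hypothetical sketch: generate rust texture candidates with Stable Diffusion.
# Model ID, prompts, and sampling settings are illustrative assumptions.
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompt variants controlling the degree and pattern of rust; note the
# "texture" suffix that the authors report yields cleaner outputs.
prompts = {
    "complete_rust": "completely rusted metal surface, texture",
    "rust_streaks": "metal surface with rust streaks, texture",
    "no_rust": "clean painted metal surface, texture",
}

Path("textures").mkdir(exist_ok=True)
for label, prompt in prompts.items():
    for i in range(4):  # a few candidates per condition, filtered later
        image = pipe(prompt, num_inference_steps=30).images[0]
        image.save(f"textures/{label}_{i}.png")
```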

Texture Synthesis with Style Transfer

Direct application of GenAI-generated textures to 3D models often results in loss of important object details (e.g., patterns, symbols). To address this, the pipeline incorporates a style transfer step, combining the original object texture (content) with the generated rust texture (style). This preserves the semantic and structural details of the base object while imparting the desired wear characteristics, resulting in more realistic and domain-appropriate textures.
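
The paper does not name the specific style transfer technique, so the following is only a sketch of the general idea using classic Gatys-style optimization with VGG-19 features in PyTorch; the layer choices, loss weights, step count, and file names are illustrative assumptions:

```python
# Hypothetical sketch: blend the original object texture (content) with a
# generated rust texture (style) via Gatys-style neural style transfer.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from torchvision.utils import save_image
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def load(path, size=512):
    tfm = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor()])
    return tfm(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

def features(x, layers=(1, 6, 11, 20, 29)):   # relu1_1 ... relu5_1
    feats, out = [], x
    for i, layer in enumerate(vgg):
        out = layer(out)
        if i in layers:
            feats.append(out)
    return feats

def gram(f):
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

content = load("textures/original_object.png")   # base object texture (content)
style = load("textures/complete_rust_0.png")     # generated rust texture (style)
target = content.clone().requires_grad_(True)
opt = torch.optim.Adam([target], lr=0.02)

style_grams = [gram(f) for f in features(style)]
content_feats = features(content)

for step in range(300):
    opt.zero_grad()
    t_feats = features(target)
    content_loss = F.mse_loss(t_feats[3], content_feats[3])   # preserve object details
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(t_feats, style_grams))
    (content_loss + 1e4 * style_loss).backward()
    opt.step()
    target.data.clamp_(0, 1)

save_image(target, "textures/rusted_object.png")  # texture later applied via UV mapping
```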

Intelligent Texture Application and 3D Scene Generation

Noisy or unusable textures (e.g., those with watermarks, random text, or incorrect rust patterns) are filtered using image processing techniques. The selected textures are mapped onto 3D models via UV mapping in Blender. The models are then placed in 3D scenes, and multiple images are rendered from varying camera angles, lighting conditions, and distances to simulate real-world variability.
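
A minimal sketch of the multi-view rendering step, written against Blender's Python API (bpy); the object name, lamp name, camera sampling ranges, and render settings are assumptions rather than the authors' actual scene setup:

```python
# Hypothetical sketch: render a textured object from varied camera angles,
# distances, and lighting inside Blender. Run with, e.g.:
#   blender --background scene.blend --python render_views.py
import math
import random

import bpy

scene = bpy.context.scene
cam = scene.camera
target = bpy.data.objects["IndustrialTank"]   # assumed object name

scene.render.resolution_x = 1280
scene.render.resolution_y = 720

# Keep the camera pointed at the object with a Track To constraint.
constraint = cam.constraints.get("Track To") or cam.constraints.new("TRACK_TO")
constraint.target = target
constraint.track_axis = "TRACK_NEGATIVE_Z"
constraint.up_axis = "UP_Y"

for i in range(50):
    # Random viewpoint on a hemisphere around the object.
    radius = random.uniform(4.0, 10.0)
    azimuth = random.uniform(0.0, 2 * math.pi)
    elevation = random.uniform(0.1, 1.2)
    cam.location = (
        target.location.x + radius * math.cos(azimuth) * math.cos(elevation),
        target.location.y + radius * math.sin(azimuth) * math.cos(elevation),
        target.location.z + radius * math.sin(elevation),
    )

    # Vary lighting strength between renders (assumes a sun lamp named "Sun").
    bpy.data.lights["Sun"].energy = random.uniform(2.0, 8.0)

    scene.render.filepath = f"//renders/complete_rust_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```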

Synthetic Data Annotation

Rendered images are automatically annotated with bounding boxes and class labels corresponding to the rust condition, leveraging Blender's scripting capabilities. The annotation files include image paths, class labels, and bounding box coordinates, formatted for compatibility with standard object detection frameworks.
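
One common way to script such annotation, sketched below, is to project the object's 3D bounding box into the camera view with bpy_extras and emit a normalized label; the paper only states that the files contain image paths, class labels, and box coordinates, so the YOLO-style label layout and names here are assumptions:

```python
# Hypothetical sketch: project an object's 3D bounding box into the camera view
# and write a YOLO-format label (class x_center y_center width height, normalized).
import bpy
import mathutils
from bpy_extras.object_utils import world_to_camera_view

def yolo_bbox(scene, cam, obj, class_id):
    xs, ys = [], []
    for corner in obj.bound_box:                          # 8 corners, object space
        world = obj.matrix_world @ mathutils.Vector(corner)
        co = world_to_camera_view(scene, cam, world)      # x, y in [0, 1] from bottom-left
        xs.append(co.x)
        ys.append(1.0 - co.y)                             # flip y: image origin is top-left
    x_min, x_max = max(min(xs), 0.0), min(max(xs), 1.0)   # clip to the image frame
    y_min, y_max = max(min(ys), 0.0), min(max(ys), 1.0)
    return (f"{class_id} {(x_min + x_max) / 2:.6f} {(y_min + y_max) / 2:.6f} "
            f"{x_max - x_min:.6f} {y_max - y_min:.6f}")

scene = bpy.context.scene
label = yolo_bbox(scene, scene.camera, bpy.data.objects["IndustrialTank"], class_id=0)
with open("renders/complete_rust_0000.txt", "w") as f:
    f.write(label + "\n")
```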

Model Training and Evaluation

A YOLOv5 object detection model is trained exclusively on the synthetically generated dataset (2000 samples, three classes: complete rust, rust streaks, no rust). Evaluation is performed on a manually annotated set of 100 real-world images. Three synthetic data generation strategies are compared: (a) GenAI only, (b) GenAI + Style Transfer, and (c) the full pipeline of GenAI + Style Transfer + Noise Removal.
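
A sketch of how this training step could be reproduced with the standard YOLOv5 workflow; the dataset layout, class ordering, and hyperparameters are assumptions, as the paper does not report its training configuration:

```python
# Hypothetical sketch: train YOLOv5 on the synthetic rust dataset.
# Dataset paths, class order, and hyperparameters are illustrative assumptions.
import subprocess
from pathlib import Path

# Dataset description in YOLOv5's YAML format.
Path("rust.yaml").write_text(
    "train: datasets/syngen_rust/images/train\n"
    "val: datasets/syngen_rust/images/val\n"
    "nc: 3\n"
    'names: ["complete_rust", "rust_streaks", "no_rust"]\n'
)

# Standard YOLOv5 training entry point (run from a clone of the yolov5 repo).
subprocess.run(
    [
        "python", "train.py",
        "--img", "640",
        "--batch", "16",
        "--epochs", "100",
        "--data", "rust.yaml",
        "--weights", "yolov5s.pt",
    ],
    check=True,
)
```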

Results

The quantitative evaluation demonstrates a clear progression in performance across the three approaches. The final pipeline (GenAI + Style Transfer + Noise Removal) achieves a mean average precision at IoU 0.5 (mAP50) of 0.87 on real images, with high precision and recall across all classes. This is a substantial improvement over the GenAI-only baseline (mAP50 = 0.28) and the intermediate style transfer approach (mAP50 = 0.45). The results indicate that style transfer and noise filtering are essential for generating high-quality, domain-relevant synthetic data that generalizes to real-world conditions.

Key numerical results:

Approach                                    mAP50 (all classes)
GenAI only                                  0.28
GenAI + Style Transfer                      0.45
GenAI + Style Transfer + Noise Removal      0.87

The model trained on synthetic data generalizes well to real images, including objects of different shapes, supporting the claim that the pipeline produces robust and transferable features.

Implications and Discussion

The SynGen-Vision pipeline addresses a critical bottleneck in industrial CV applications: the lack of annotated data for rare or progressive wear conditions. By combining GenAI, style transfer, and 3D rendering, the approach enables rapid, low-cost generation of large, diverse, and accurately labeled datasets. The pipeline's modularity and reliance on prompt-based control make it adaptable to a wide range of industrial inspection tasks beyond rust detection, such as crack detection, corrosion, or other forms of material degradation.

The strong empirical results—particularly the high mAP50 on real data—underscore the viability of synthetic data as a substitute for real-world data in industrial settings, provided that domain-specific realism is maintained through style transfer and noise filtering. The pipeline's use of open-source tools (Stable Diffusion, Blender, YOLOv5) further enhances its accessibility and reproducibility.

Potential limitations include the dependence on the quality of 3D models and the representativeness of the synthetic scenes. The approach may require further adaptation for highly complex or non-rigid objects, or for wear types with subtle visual cues. Additionally, while the pipeline automates annotation, the accuracy of bounding boxes and class labels is contingent on the fidelity of the 3D rendering and scene setup.

Future Directions

The authors suggest several avenues for future work:

  • Extending the pipeline to simulate and annotate other forms of wear and tear, including cracks, dents, or multi-modal degradation.
  • Incorporating more advanced generative models as they become available, to further improve texture realism and diversity.
  • Enhancing the annotation process to support instance segmentation or pixel-wise labeling for more granular tasks.
  • Integrating domain adaptation techniques to further bridge the gap between synthetic and real data distributions.
  • Scaling the pipeline for large-scale industrial deployment, including automated 3D model acquisition and scene generation.

Conclusion

SynGen-Vision demonstrates that a carefully designed synthetic data generation pipeline, leveraging GenAI, style transfer, and 3D rendering, can produce high-quality datasets for training industrial vision models. The approach achieves strong performance on real-world rust detection tasks, validating the utility of synthetic data in domains where real data is scarce or costly to obtain. The pipeline's extensibility and reliance on open-source components position it as a practical solution for a broad range of industrial CV applications, with significant implications for predictive maintenance and automated inspection.


Explain it Like I'm 14

What is this paper about?

This paper introduces SynGen-Vision, a way to create “fake but realistic” images to train computer vision systems that spot rust on industrial equipment. Instead of waiting months or years to collect real photos of machines rusting in different ways, the authors use generative AI and 3D tools to make high‑quality training pictures with automatic labels. These pictures help a computer learn to detect rust in real-world photos.

What questions were they trying to answer?

The authors focused on simple, practical questions:

  • Can we generate realistic training images that show different kinds and amounts of rust?
  • Can these synthetic images train a model that works well on real photos?
  • Which steps make the synthetic data most useful: using GenAI alone, adding style transfer, or also cleaning up noisy textures?

How did they do it?

They built an end-to-end pipeline that turns text prompts into useful training data. Here’s the idea in everyday language:

  • Start with 3D models and scenes. Think of a video game environment: you have 3D objects (like tanks or pipes) in a 3D world. The team uses these as the base.
  • Generate rust “textures” with GenAI. A texture is like a sticker or skin that wraps around a 3D object to give it color and detail. Using a text-to-image model (Stable Diffusion), they type prompts like “complete rust” or “rust streaks” to create rust textures. They learned that adding words like “texture” or “surface” produces cleaner results.
  • Blend the new rust with the original details (style transfer). If you slap a new rust skin on the object, you might lose important markings (logos, labels). Style transfer acts like combining two photos: it keeps the original object’s details (content) but adds the look of rust (style). This makes the rusted object look more realistic and keeps fine details.
  • Remove bad textures (noise removal). Sometimes AI-generated images include watermarks, random text, or the wrong amount of rust. The team filters out those bad textures with image processing so only good ones are used.
  • Wrap textures correctly (UV mapping) and build scenes. UV mapping is like unwrapping a 3D toy into a flat map so you can place the sticker precisely, then wrapping it back on. They use Blender (a 3D tool) to apply the rust textures and set up scenes with different camera angles, distances, and lighting.
  • Render images and auto-label them. They render many images and automatically add “bounding boxes” (rectangles around the object) and a rust label like “complete rust” or “rust streaks.” These labeled images become the training set.
  • Train and test a detection model. They train a popular object detection model (YOLOv5) on 2,000 synthetic images and then test it on about 100 real photos they labeled by hand.

Key terms in simple words:

  • Synthetic data: fake but realistic images made by computer.
  • Texture: the “skin” or surface pattern you wrap around a 3D model.
  • UV map: a 2D layout of a 3D object’s surface so textures can be placed precisely.
  • Bounding box: a rectangle drawn around what you’re trying to detect.
  • mAP50: a score (0 to 1) showing how good detection is when the predicted box overlaps the real target by at least 50%. Higher is better (a small worked example of this overlap follows this list).
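
To make the “overlaps by at least 50%” part concrete, here is a tiny example (not from the paper) of how that overlap score, called Intersection over Union (IoU), is computed between a predicted box and the true box:

```python
# Hypothetical sketch: Intersection over Union (IoU) between two boxes
# given as (x_min, y_min, x_max, y_max) in pixel coordinates.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlapping rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A prediction covering most of the true box clears the 0.5 threshold.
print(iou((10, 10, 110, 110), (30, 20, 130, 120)))  # 0.5625 -> counts toward mAP50
```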

What did they find?

  • Best performance came from combining three steps: GenAI + style transfer + noise removal. This combo produced the most realistic and useful training images.
  • The trained model reached an mAP50 score of about 0.87 on real images. In simple terms, that’s strong performance for a model trained entirely on synthetic data.
  • The model worked across different object shapes (not just a single kind of tank), showing it learned general rust patterns.
  • Compared to using GenAI alone, adding style transfer kept important details and improved results; cleaning noisy textures improved them even more.

Why this matters:

  • Getting lots of real photos of equipment with different rust levels is slow, costly, and sometimes impossible. Synthetic data speeds things up and lowers cost.
  • Despite common GenAI issues (like watermarks or random text), careful filtering and style transfer can make the results clean and realistic.

Why does it matter?

This approach can make industrial maintenance smarter and cheaper:

  • Faster training: Companies don’t need to wait for real rust to appear in many conditions to train a model.
  • Safer inspections: Automated rust detection can help catch problems early, preventing breakdowns.
  • Flexible use: The same pipeline can be adapted to other wear-and-tear signs, like cracks, dents, or aging paint—just change the prompts and textures.
  • Better data with less effort: High-quality, labeled images can be generated on demand, which is a big deal when real data is scarce.

In short, SynGen-Vision shows that carefully crafted synthetic data—using GenAI, style transfer, cleanup steps, and 3D rendering—can train reliable vision models for real industrial problems.
