
DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection (2406.06134v1)

Published 10 Jun 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Dataset bias is a significant challenge in machine learning, where specific attributes, such as the texture or color of images, are unintentionally learned, resulting in degraded performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on bias-specific samples from the dataset, which are typically too scarce. In this work, we propose DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial results in effectively reducing dataset bias.

Authors (5)
  1. Donggeun Ko (5 papers)
  2. Sangwoo Jo (2 papers)
  3. Dongjun Lee (29 papers)
  4. Namjun Park (2 papers)
  5. Jaekwang Kim (16 papers)
