CIFAKE Dataset Benchmark
- CIFAKE is a benchmark dataset comprising 120K balanced real and synthetic images, enabling reliable evaluation of binary classifiers and forensic pipelines.
- It features multiple variants and perturbations—using GANs, diffusion models, and adversarial corruptions—to test robustness in synthetic image detection.
- Experiments on CIFAKE report strong performance with CNNs, transfer learning, and vision transformers, highlighting its value in digital media forensics.
The CIFAKE dataset is a rigorously labeled corpus of real and AI-generated (synthetic) images, designed as a benchmark for developing and evaluating binary classifiers, data-quality assessment tools, and explainability algorithms for synthetic image detection. Introduced by Bird & Lotfi and subsequently adopted across multiple studies, CIFAKE has become a standard testbed for research on generative-model output detection, artifact localization, data-centric AI pipelines, and robust representation learning for digital media forensics.
1. Dataset Composition and Construction
CIFAKE is characterized by strict class balance and controlled acquisition, supporting a variety of modeling paradigms:
- Total size: 120,000 images (60,000 authentic, 60,000 synthetic) (Wang et al., 2024, Bird et al., 2023, Nirob et al., 27 Jan 2026).
- Image sources:
  - Real: Curated from open-source photographic repositories and public vision datasets (e.g., CIFAR-10, subsets of ImageNet).
  - Fake: Generated using state-of-the-art image synthesis pipelines. Initial versions used only GANs (ProGAN, StyleGAN/2); later versions adopted diffusion models such as Stable Diffusion v1.4, v2.1, and v3.0 alongside GAN variants (Wang et al., 2024, Bird et al., 2023, Jiang, 2024, Nirob et al., 27 Jan 2026). Some studies note a larger variant totaling 1.2M images, dominated by diffusion-generated fakes (Mathur et al., 27 Oct 2025).
- Resolution: The canonical release downscales all images to 32×32 pixels RGB (CIFAR-scale), facilitating rapid prototyping and computationally efficient modeling (Bird et al., 2023, Chen et al., 29 Sep 2025). Some versions store a higher-resolution master set (e.g., 256×256 in JPEG), but most downstream models operate at 24×24, 32×32, or, after upsampling, 224×224 (Wang et al., 2024, Bird et al., 2023, Das et al., 25 Aug 2025).
- Preprocessing: Uniform pipelines are reported: conversion to RGB, resizing (center or random crop), optional grayscale for SVM baselines, per-channel normalization to ImageNet statistics for transfer-learning architectures (Wang et al., 2024).
- Labeling: CIFAKE is self-labeled, with “Fake” assigned at the time of synthetic image generation and “Real” at download. Random audit of 5% of images achieved >99% label accuracy; no crowdsourced or automated error correction is used (Wang et al., 2024).
| Subset | Real | Fake | Resolution |
|---|---|---|---|
| Canonical CIFAKE | 60,000 | 60,000 | 32×32 |
| Large variant (CiFAKE) | ≈520k | ≈680k | 32×32 |
| High-res variant | 60,000 | 60,000 | 256×256† |
† High-res source images; most experiments use 32×32 (Wang et al., 2024, Mathur et al., 27 Oct 2025).
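The preprocessing pipeline described above (resize, RGB scaling, normalization to ImageNet statistics) can be sketched in plain NumPy. The ImageNet per-channel statistics are the standard values; the nearest-neighbour resize and function names are illustrative stand-ins for the library transforms (e.g., torchvision) used in practice:

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB, values scaled to [0, 1])
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resize_nearest(img, size):
    """Nearest-neighbour resize of an HxWx3 uint8 image to (size, size)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def preprocess(img, size=32):
    """Resize, scale to [0, 1], and normalise to ImageNet statistics."""
    img = resize_nearest(img, size).astype(np.float32) / 255.0
    return (img - IMAGENET_MEAN) / IMAGENET_STD

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in image
out = preprocess(x, size=32)
```

The same sketch covers both the canonical 32×32 setting and the 224×224 upsampling used for transfer-learning backbones, by changing `size`.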
2. Dataset Variants, Extensions, and Perturbations
CIFAKE serves as a platform for a wide range of robustness and generalization studies via dataset augmentation and generator diversity:
- Diffusion-based extensions: CIFAKE-SD2.1 and CIFAKE-SD3.0 use more advanced generative models to synthesize fake images (Stable Diffusion v2.1/v3.0), maintaining a 50:50 real/fake ratio (Jiang, 2024).
- Prompt and artifact perturbations: Studies introduce variations via prompt engineering (including GPT-4o-based and negative prompts), Gaussian blurring (kernel 11×11, σ=1.1), and LoRA fine-tuning for photorealism (Jiang, 2024). Complete variant sets are built for cross-domain and stress-testing: blurred, negative prompt, LoRA, and highly descriptive GPT-4o variants; all retain a 60k/60k class balance.
- Adversarial and synthetic corruption: In robustness-oriented work, adversarial perturbations (Gaussian noise, salt-and-pepper, motion blur, pixelation, quantization, adversarial noise from MobileNetV2 white-box attacks) significantly expand the dataset for out-of-distribution and edge-device experiments (Mathur et al., 27 Oct 2025).
- Quality-degraded and near-duplicate samples: Quality analysis pipelines systematically induce blurring (kernel sizes up to 11), severe downscaling (down to 4×4 pixels, then upsampled), low-information content, odd aspect ratio, grayscale conversion, and exact/near-duplicate creation for deduplication benchmarking (Chen et al., 29 Sep 2025).
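The blur and downscale corruptions used in these robustness studies can be sketched with NumPy alone. The 11×11 kernel and σ=1.1 match the values reported above; the single-channel input, edge padding, and function names are simplifying assumptions:

```python
import numpy as np

def gaussian_kernel(size=11, sigma=1.1):
    """1-D Gaussian kernel, normalised to sum to 1."""
    x = np.arange(size) - size // 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, size=11, sigma=1.1):
    """Separable Gaussian blur of an HxW float image (edge-padded)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    out = np.pad(img, pad, mode="edge").astype(np.float64)
    # Horizontal pass, then vertical pass, then crop the padding back off
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out[pad:-pad, pad:-pad]

def downscale_upscale(img, low=4):
    """Destroy detail: nearest-neighbour downscale to low x low, then back up."""
    h, w = img.shape
    idx_r = np.arange(low) * h // low
    idx_c = np.arange(low) * w // low
    small = img[idx_r[:, None], idx_c[None, :]]
    back_r = np.arange(h) * low // h
    back_c = np.arange(w) * low // w
    return small[back_r[:, None], back_c[None, :]]

x = np.ones((32, 32))           # stand-in for a grayscale CIFAKE image
blurred = gaussian_blur(x)
degraded = downscale_upscale(x, low=4)
```

Applying such corruptions to a held-out test set, then re-measuring accuracy/F1, is the pattern behind the robustness numbers cited in Section 5.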
3. Experimental Protocols and Evaluation Metrics
CIFAKE is foundational in the benchmarking of binary classifiers and data-centric quality assessment workflows. Typical experimental pipelines and metrics include:
- Train/Test splits: Canonical experiments use a 100,000/20,000 split (50,000 real + 50,000 fake for training; 10,000 real + 10,000 fake for testing) (Wang et al., 2024, Bird et al., 2023).
- Validation strategy: Few studies define an explicit validation split; most tune hyperparameters via cross-validation or a 90/10 train/validation split carved from the training set (Nirob et al., 27 Jan 2026, Chen et al., 29 Sep 2025).
- Performance metrics: All major works report accuracy, precision, recall, and F1-score, along with PR-AUC, ROC-AUC, and Brier calibration scores.
- Robustness to corruptions is typically benchmarked by decrease in accuracy/F1 under perturbations.
- Data-centric quality detection: Assessment via automatic thresholding (histogram-based: Otsu, Li, etc.) and deduplication (pHash, cosine similarity) achieves F1 up to 0.9468 for single perturbations, 0.8557 for dual perturbations, and 0.7928 for near-duplicate detection (Chen et al., 29 Sep 2025).
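A minimal sketch of the threshold-based metrics above, assuming label 1 = fake and probability outputs in [0, 1] (PR-AUC and ROC-AUC are omitted for brevity; in practice these come from scikit-learn):

```python
import numpy as np

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, and Brier score for binary labels."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": float(np.mean(y_pred == y_true)),
        "precision": float(prec),
        "recall": float(rec),
        "f1": float(2 * prec * rec / (prec + rec)) if prec + rec else 0.0,
        "brier": float(np.mean((y_prob - y_true) ** 2)),  # calibration error
    }

# Toy example: 4 "fake" (1) and 4 "real" (0) images
m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                   [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.3, 0.6])
```

On the toy inputs this yields accuracy, precision, recall, and F1 of 0.75 and a Brier score of 0.125, illustrating how the class-balanced design makes accuracy and F1 nearly interchangeable on CIFAKE.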
4. Modeling Paradigms and Notable Baselines
CIFAKE enables direct comparison of a diverse family of detection methods, from handcrafted descriptors to state-of-the-art deep networks:
- Handcrafted features: Seven families—raw-pixel, color histogram, DCT, HOG, LBP, GLCM, wavelets—fused in “baseline,” “advanced,” and “mixed” configurations. Ensemble learners (LightGBM, XGBoost, CatBoost) with mixed features reach PR-AUC=0.9879, ROC-AUC=0.9878, F1=0.9447, Brier=0.0414 (Nirob et al., 27 Jan 2026).
- CNN and transfer learning: Custom two-layer CNN: 93.2% accuracy, F1=0.936, ROC AUC≈0.98 (Bird et al., 2023). Transfer-learned DenseNet: 97.74% accuracy and ROC-AUC 0.9975 (Wang et al., 2024). Vision Transformers augmented with edge-based variance modules: 97.75% accuracy, 97.77% F1-score (Das et al., 25 Aug 2025). Swin Transformer models on CIFAKE (RGB): accuracy=0.98, AUC=0.98 (Mehta et al., 22 May 2025).
- Data-centric and explainable AI: Integration with CleanVision and Fastdup for outlier and duplicate detection. Grad-CAM and artifact localization (Faster-Than-Lies+VLM) provide interpretable artifact heatmaps and semantic explanations across 70 artifact types grouped into eight semantic families (Mathur et al., 27 Oct 2025, Bird et al., 2023).
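As an illustration of one of the seven handcrafted families, a color-histogram extractor can be sketched as follows (the 8-bin choice and function name are illustrative assumptions, not the exact configuration of Nirob et al.):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel colour histogram of an HxWx3 uint8 image, L1-normalised."""
    feats = []
    for c in range(3):
        h, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    return np.concatenate(feats)  # shape: (3 * bins,)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # stand-in image
f = color_histogram(img)
```

Feature vectors like this one would then be concatenated with the other families (DCT, HOG, LBP, GLCM, wavelets) and fed to the gradient-boosted ensembles cited above.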
5. Statistical and Forensic Properties
CIFAKE is constructed to isolate and expose specific statistical discrepancies between real and synthetic images, supporting both discriminative and diagnostic research:
- Class balance: Strict 1:1 real/fake ratio eliminates class-imbalance bias (Wang et al., 2024).
- Visual properties: Synthetic images are engineered to match the class semantics, global color distribution, and resolution profile of real images, with subtle artifacts introduced by the generative model (e.g., background irregularities, blurring, edge smoothness, aberrant reflections) (Bird et al., 2023, Das et al., 25 Aug 2025).
- Latent analysis: Saliency mapping and t-SNE of Swin/Tiny features show tight clustering and maximal separability in RGB space, minor overlap in YCbCr and HSV (Mehta et al., 22 May 2025).
- Impact of quality factors: CNNs are significantly degraded by blurring and downscaling (accuracy drops to 25.81% for heavy blur, 32.10% for aggressive downsampling) but more robust to brightness or grayscale conversion (Chen et al., 29 Sep 2025). Ensemble models and DenseNet architectures exhibit greater resilience.
6. Access, Licensing, and Use Cases
- Availability: The canonical CIFAKE dataset (120,000 images, full splits, and labels) is hosted at https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images and mirrored via official publications (Wang et al., 2024, Bird et al., 2023).
- License: CC BY-NC 4.0, which permits noncommercial research and derivative works with required attribution (Wang et al., 2024).
- Applications: Automated content authenticity detection in media forensics; benchmarking of GAN/diffusion detectors; robustness evaluation under distribution shift; artifact explainability by paired visual and linguistic models; data-centric workflow development; forensic pipeline construction for both edge and enterprise systems (Mathur et al., 27 Oct 2025, Wang et al., 2024, Nirob et al., 27 Jan 2026).
7. Limitations and Notes on Variant Usage
- Resolution constraints: Most CIFAKE experiments operate at 32×32 pixels, which may obscure higher-resolution artifacts but keeps computation cheap for rapid prototyping and edge deployment. Some studies upsample to 224×224 for transformer-based models or work from the 256×256 source images (Das et al., 25 Aug 2025, Wang et al., 2024).
- Generator diversity: While later versions incorporate multiple generator families (GAN, diffusion, LoRA-fine-tuned diffusion), the standard set may lag the latest generative techniques (e.g., transformer-based or large autoregressive text-to-image models).
- Semantic granularity: Labels are restricted to the binary “Real” vs. “Fake” distinction; standard distributions carry no annotation of object category or artifact type.
- Potential for extension: Proposals include high-resolution benchmarks, multi-class source attribution, domain-specific variants (faces, medical, satellite), and real-world augmented artifacts for domain adaptation studies (Bird et al., 2023, Mathur et al., 27 Oct 2025).
References
- "Harnessing Machine Learning for Discerning AI-Generated Synthetic Images" (Wang et al., 2024)
- "CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images" (Bird et al., 2023)
- "Handcrafted Feature Fusion for Reliable Detection of AI-Generated Images" (Nirob et al., 27 Jan 2026)
- "Explainable Detection of AI-Generated Images with Artifact Localization..." (Mathur et al., 27 Oct 2025)
- "Edge-Enhanced Vision Transformer Framework for Accurate AI-Generated Image Detection" (Das et al., 25 Aug 2025)
- "A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models" (Chen et al., 29 Sep 2025)
- "Addressing Vulnerabilities in AI-Image Detection: Challenges and Proposed Solutions" (Jiang, 2024)
- "Swin Transformer for Robust CGI Images Detection: Intra- and Inter-Dataset Analysis..." (Mehta et al., 22 May 2025)
- "Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis" (Mehta et al., 2024)