
MegaStyle-1.4M: Paired Style Dataset

Updated 13 April 2026
  • MegaStyle-1.4M is a large-scale style dataset offering 1.4M paired images across 170K fine-grained styles, enabling precise style transfer and similarity measurement.
  • It employs a three-stage curation process combining image pooling, prompt balancing, and hierarchical clustering to maintain both intra-style consistency and inter-style diversity.
  • The dataset underpins advanced models like MegaStyle-Encoder and MegaStyle-FLUX, achieving state-of-the-art performance in style transfer and robust style representation.

MegaStyle-1.4M is a large-scale, paired style dataset designed to achieve high intra-style consistency and inter-style diversity for style representation learning and style transfer. Constructed via a systematic and scalable data curation pipeline that exploits the consistent text-to-image style mapping capabilities of modern generative models, MegaStyle-1.4M provides 1.4 million image pairs spanning 170,000 fine-grained style categories and 8,355 coarse style labels. Its design emphasizes both the quality of style-conditioned image generation and the balanced coverage of rare and common styles, enabling robust style similarity measurement and highly generalizable style transfer (Gao et al., 9 Apr 2026).

1. Construction Methodology

MegaStyle-1.4M is generated through a three-stage procedure:

  1. Image-Pool Collection: A style pool and a content pool are established, each containing 2 million images. The style pool includes 1 million deduplicated JourneyDB (Midjourney) images, 80,000 WikiArt paintings, and 1 million LAION-Aesthetics images filtered by WikiArt style keywords. The content pool comprises the remaining LAION-Aesthetics images not assigned to the style pool.
  2. Prompt Curation and Balancing: Captioning is performed with Qwen3-VL-30B-Instruct, producing style captions (~32 words) specifying overall artistic style, color, light distribution, medium, texture, and brushwork, and content captions (~64 words) which strictly describe object arrangements and spatial/semantic relations without reference to color or style. Exact, fuzzy, and semantic deduplication (NeMo-Curator) reduces the raw 4 million captions to approximately 1 million. Hierarchical k-means clustering (MPNet embeddings, 4 levels with k = {50K, 10K, 5K, 1K}) yields 170,000 balanced style prompts and 400,000 content prompts.
  3. Paired Style-Image Generation: For each style prompt, N ≈ 8 unique content prompts are sampled (without repeats within a style), forming approximately 1.36 million content–style pairs. These pairs are rendered using Qwen-Image with a classifier-free guidance (CFG) scale of 4.0 and 40 diffusion steps, producing 1.4 million stylized images with highly consistent renderings within each style class and strong diversity across styles.

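The pairing step in stage 3 can be sketched as follows. Only the constraint itself comes from the text (roughly 8 content prompts per style, no repeats within a style, with content prompts reusable across styles since 170K × 8 pairs are drawn from 400K content prompts); `build_pairs` and its arguments are illustrative names, not the paper's released tooling.

```python
import random

def build_pairs(style_prompts, content_prompts, per_style=8, seed=0):
    """Pair each style prompt with `per_style` distinct content prompts.

    Content prompts may recur across different styles, but rng.sample
    guarantees no repeats within a single style.
    """
    rng = random.Random(seed)
    pairs = []
    for style in style_prompts:
        for content in rng.sample(content_prompts, per_style):
            pairs.append((content, style))
    return pairs
```

Each resulting (content, style) pair is then rendered by the text-to-image model, yielding one stylized image per pair.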
2. Prompt Galleries and Balancing Techniques

MegaStyle-1.4M’s prompt gallery includes 170,000 style prompts covering 8,355 distinct “overall artistic style” labels (e.g., “impressionism,” “cyberpunk digital art,” “ukiyo-e”) and 400,000 content prompts emphasizing objects, scene arrangements, and relationships devoid of stylistic descriptors. A hierarchical level-by-level sampling scheme ensures equitable cluster representation across style and content types, preventing the dominance of common styles in the gallery while preserving the natural long-tail distribution intrinsic to real-world style frequency.
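The level-by-level balanced sampling can be illustrated with a simple round-robin over clusters: small (rare-style) clusters contribute items before large (common-style) clusters can dominate. This is a minimal sketch of the idea under assumed data structures, not the paper's exact sampler.

```python
from collections import deque

def balanced_sample(clusters, quota):
    """Draw up to `quota` items round-robin across clusters.

    `clusters` maps a cluster id to its list of prompts (illustrative
    layout); one pass takes at most one item from each cluster, so rare
    clusters are represented before common clusters are exhausted.
    """
    queues = [deque(items) for items in clusters.values()]
    out = []
    while len(out) < quota and any(queues):
        for q in queues:
            if q and len(out) < quota:
                out.append(q.popleft())
    return out
```

In the hierarchical setting, the same round-robin would be applied recursively: first across top-level clusters, then within each selected cluster at the next level down.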

3. Dataset Structure and Comparative Metrics

MegaStyle-1.4M offers paired, intra-style-consistent groupings:

  • Images: 1.4 million
  • Fine-grained style classes: 170,000 (one per style prompt)
  • Images per style class: ~8 distinct content instances on average
  • Coverage: 8,355 coarse style labels
  • No prescribed train/val/test split: Users may split by style prompt or randomly on content–style pairs.
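Since no split is prescribed, one way to split by style prompt, so that validation styles are entirely unseen during training, is sketched below; the function name and fraction are illustrative choices, not a recommendation from the paper.

```python
import random

def split_by_style(pairs, val_frac=0.05, seed=0):
    """Hold out whole style classes for validation.

    `pairs` is a list of (content, style) tuples; every pair belonging
    to a held-out style lands in the validation set, so no validation
    style appears in training.
    """
    styles = sorted({s for _, s in pairs})
    rng = random.Random(seed)
    rng.shuffle(styles)
    n_val = max(1, int(len(styles) * val_frac))
    val_styles = set(styles[:n_val])
    train = [p for p in pairs if p[1] not in val_styles]
    val = [p for p in pairs if p[1] in val_styles]
    return train, val
```

Splitting randomly over content–style pairs instead would test reconstruction of seen styles rather than generalization to unseen ones.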

Key comparison metrics with prior datasets are summarized:

| Dataset | Intra-style paired | #Coarse Styles | #Fine-grained Styles | #Images |
| --- | --- | --- | --- | --- |
| WikiArt | No | 27 | – | 80 K |
| JourneyDB | No | – | 300 K | 4.4 M |
| Style30K | No | 1,120 | – | 30 K |
| IMAGStyle | Yes | 14 | 15 K | 210 K |
| OmniStyle-150K | Yes | 1,000 | – | 150 K |
| MegaStyle-1.4M | Yes | 8,355 | 170 K | 1.4 M |

MegaStyle-1.4M distinguishes itself as the first dataset of this scale to provide intra-style paired groupings, extensive fine-grained style coverage, and systematic balancing across style frequencies.

4. Quantitative Quality Criteria

The dataset's efficacy is evaluated along two axes:

  • Intra-style consistency (C_\text{intra}(s)):

C_\text{intra}(s) = \frac{1}{|P_s|} \sum_{(i,j)\in P_s} d(z_i, z_j)

where P_s is the set of image pairs within style s and d(\cdot,\cdot) is a style distance in feature space (e.g., Gram, CLIP-derived, MegaStyle-Encoder output). Lower C_\text{intra} indicates higher within-style similarity.

  • Inter-style diversity (D_\text{inter}):

D_\text{inter} = \frac{1}{|Q|} \sum_{(s, s') \in Q} \bar{d}(s, s')

where Q is the set of distinct style pairs and \bar{d}(s, s') averages pairwise style-feature distances between images of styles s and s'. Higher D_\text{inter} denotes better style separation.

Empirical results show that MegaStyle-1.4M achieves significantly lower C_\text{intra} and higher D_\text{inter} than prior style transfer datasets, as validated through image retrieval and human preference studies.
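Both metrics are straightforward to compute given per-image style features. The sketch below uses cosine distance on L2-normalized features as a stand-in for d(·,·); the paper's actual distance may be Gram- or encoder-based, and the centroid-based D_inter here is a cheap approximation of averaging all cross-style pairs.

```python
import numpy as np

def style_metrics(feats, labels):
    """Return (C_intra, D_inter) over a batch of style features.

    C_intra: mean pairwise cosine distance within each style
             (lower = more consistent).
    D_inter: mean cosine distance between style centroids
             (higher = more diverse).
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    labels = np.array(labels)
    intra, centroids = [], []
    for s in sorted(set(labels.tolist())):
        z = feats[labels == s]
        if len(z) > 1:
            dist = 1.0 - z @ z.T
            intra.append(dist[np.triu_indices(len(z), k=1)].mean())
        centroids.append(z.mean(axis=0))
    c = np.stack(centroids)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    inter = (1.0 - c @ c.T)[np.triu_indices(len(c), k=1)].mean()
    return float(np.mean(intra)), float(inter)
```

A well-curated dataset should show small within-style distances and large between-style distances under any reasonable style feature.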

5. Applications in Model Training

MegaStyle-1.4M directly supports two key model training pipelines:

  • MegaStyle-Encoder (style-supervised contrastive learning):
    • Backbone: SigLIP image encoder (frozen, except projection head)
    • Loss: Sum of supervised contrastive loss among images of the same style and image–text contrastive loss with the style prompt.
    • SCL (image–image, within style class): a supervised contrastive loss over the batch B,

\mathcal{L}_\text{SCL} = -\frac{1}{|B|} \sum_{i \in B} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in B \setminus \{i\}} \exp(z_i \cdot z_a / \tau)}

    where P(i) contains the other in-batch images whose style label matches that of image i.
    • ITC (image–text, image and its style prompt): a CLIP-style contrastive loss in which an image–prompt pair is treated as positive if their style labels match, and as negative otherwise.
    • Total loss: \mathcal{L} = \mathcal{L}_\text{SCL} + \mathcal{L}_\text{ITC}
    • Training: batch size 8192, 30 epochs
    • Retrieval performance: mAP@1 ≈ 87%, Recall@1 ≈ 88% on StyleRetrieval
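The style-supervised contrastive objective can be illustrated with a minimal NumPy version of the standard SupCon form: for each anchor image, positives are the other in-batch images sharing its style label. The temperature value here is illustrative, not taken from the paper.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.07):
    """Supervised contrastive loss over a batch of embeddings z.

    Positives for anchor i are other rows with the same style label;
    all other non-self rows serve as negatives in the denominator.
    tau is an assumed temperature, for illustration only.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(z)
    eye = np.eye(n, dtype=bool)
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits) * ~eye                     # exclude self-pairs
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    labels = np.array(labels)
    same = (labels[:, None] == labels[None, :]) & ~eye
    loss = -(log_prob * same).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    return float(loss.mean())
```

Embeddings that cluster by style label yield a lower loss than the same embeddings with scrambled labels, which is what drives the encoder toward style-discriminative features.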

  • MegaStyle-FLUX (paired style transfer):

    • Base: FLUX (DiT transformer variant)
    • Supervised with MegaStyle-1.4M pairs; for each style, sample two images (“reference” and “target”)
    • Inputs: Noisy target content tokens, reference style VAE tokens, content caption for text conditioning, shifted RoPE on style tokens to avoid positional leakage
    • Only DiT backbone fine-tuned (LoRA rank=128, 30K steps, lr=1e-4)
    • Delivers state-of-the-art generalizable style transfer with robust style-to-text and cross-image alignment
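The shifted-RoPE trick can be illustrated with a minimal 1-D rotary embedding: style tokens receive positions offset by a large constant so they never share positions with content tokens, preventing positional leakage between the two streams. The offset value below is an assumption; the actual shift used by MegaStyle-FLUX is not specified here.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Minimal 1-D rotary position embedding.

    Splits each feature vector into two halves and rotates them by
    position-dependent angles, so attention scores become relative
    to token positions.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    ang = pos[:, None] * freqs[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Content tokens keep positions 0..N-1; style tokens are shifted by a
# large assumed offset so their positions never collide with content.
content_pos = np.arange(4)
style_pos = np.arange(4) + 10_000
```

Because the rotation is deterministic in the position, identical token features at content positions and shifted style positions produce different rotated vectors, so the model can always tell the two streams apart.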

6. Availability and Usage Guidelines

  • Download and code:

https://jeoyal.github.io/MegaStyle/

  • Provisional license status:

The paper does not specify a definitive license but recommends adhering to CC BY 4.0 for redistributed images/prompts. Original licenses of JourneyDB, WikiArt, and LAION must be respected.

  • Intended use:

Scientific research and non-commercial applications are permitted, provided citations to “MegaStyle: …” (Gao et al., CVPR 2025) are included. Re-commercialization or claims of ownership over the source images from LAION, JourneyDB, or WikiArt are disallowed.

MegaStyle-1.4M’s design establishes a new standard for scalable, style-consistent, and diverse visual style datasets, supporting both accurate style similarity measurements and the training of generalizable, robust style transfer models (Gao et al., 9 Apr 2026).
