
MegaStyle-1.4M: Paired Style Dataset

Updated 13 April 2026
  • MegaStyle-1.4M is a large-scale style dataset offering 1.4M paired images across 170K fine-grained styles, enabling precise style transfer and similarity measurement.
  • It employs a three-stage curation process combining image pooling, prompt balancing, and hierarchical clustering to maintain both intra-style consistency and inter-style diversity.
  • The dataset underpins advanced models like MegaStyle-Encoder and MegaStyle-FLUX, achieving state-of-the-art performance in style transfer and robust style representation.

MegaStyle-1.4M is a large-scale, paired style dataset designed to achieve high intra-style consistency and inter-style diversity for style representation learning and style transfer. Constructed via a systematic and scalable data curation pipeline that exploits the consistent text-to-image style mapping capabilities of modern generative models, MegaStyle-1.4M provides 1.4 million image pairs spanning 170,000 fine-grained style categories and 8,355 coarse style labels. Its design emphasizes both the quality of style-conditioned image generation and the balanced coverage of rare and common styles, enabling robust style similarity measurement and highly generalizable style transfer (Gao et al., 9 Apr 2026).

1. Construction Methodology

MegaStyle-1.4M is generated through a three-stage procedure:

  1. Image-Pool Collection: A style pool and a content pool are established, each containing 2 million images. The style pool includes 1 million deduplicated JourneyDB (Midjourney) images, 80,000 WikiArt paintings, and 1 million LAION-Aesthetics images filtered by WikiArt style keywords. The content pool comprises the remaining LAION-Aesthetics images not assigned to the style pool.
  2. Prompt Curation and Balancing: Captioning is performed with Qwen3-VL-30B-Instruct, producing style captions (~32 words) specifying overall artistic style, color, light distribution, medium, texture, and brushwork, and content captions (~64 words) which strictly describe object arrangements and spatial/semantic relations without reference to color or style. Exact, fuzzy, and semantic deduplication (NeMo-Curator) reduces the raw 4 million captions to approximately 1 million. Hierarchical k-means clustering (MPNet embeddings, 4 levels with k = {50K, 10K, 5K, 1K}) yields 170,000 balanced style prompts and 400,000 content prompts.
  3. Paired Style-Image Generation: For each style prompt, N ≈ 8 unique content prompts are sampled (without repeats within a style), forming approximately 1.36 million content–style pairs. These pairs are rendered using Qwen-Image with a classifier-free guidance (CFG) scale of 4.0 and 40 diffusion steps, producing 1.4 million stylized images with highly consistent renderings within each style class and strong diversity across styles.

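The pairing step in stage 3 can be sketched as follows. Only the constraint itself comes from the text (roughly 8 content prompts per style, no repeats within a style, with content prompts reusable across styles since 170K × 8 pairs are drawn from 400K content prompts); `build_pairs` and its arguments are illustrative names, not the paper's released tooling.

```python
import random

def build_pairs(style_prompts, content_prompts, per_style=8, seed=0):
    """Pair each style prompt with `per_style` distinct content prompts.

    Content prompts may recur across different styles, but rng.sample
    guarantees no repeats within a single style.
    """
    rng = random.Random(seed)
    pairs = []
    for style in style_prompts:
        for content in rng.sample(content_prompts, per_style):
            pairs.append((content, style))
    return pairs
```

Each resulting (content, style) pair is then rendered by the text-to-image model, yielding one stylized image per pair.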
2. Prompt Galleries and Balancing Techniques

MegaStyle-1.4M’s prompt gallery includes 170,000 style prompts covering 8,355 distinct “overall artistic style” labels (e.g., “impressionism,” “cyberpunk digital art,” “ukiyo-e”) and 400,000 content prompts emphasizing objects, scene arrangements, and relationships devoid of stylistic descriptors. A hierarchical level-by-level sampling scheme ensures equitable cluster representation across style and content types, preventing the dominance of common styles in the gallery while preserving the natural long-tail distribution intrinsic to real-world style frequency.
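The level-by-level balanced sampling can be illustrated with a simple round-robin over clusters: small (rare-style) clusters contribute items before large (common-style) clusters can dominate. This is a minimal sketch of the idea under assumed data structures, not the paper's exact sampler.

```python
from collections import deque

def balanced_sample(clusters, quota):
    """Draw up to `quota` items round-robin across clusters.

    `clusters` maps a cluster id to its list of prompts (illustrative
    layout); one pass takes at most one item from each cluster, so rare
    clusters are represented before common clusters are exhausted.
    """
    queues = [deque(items) for items in clusters.values()]
    out = []
    while len(out) < quota and any(queues):
        for q in queues:
            if q and len(out) < quota:
                out.append(q.popleft())
    return out
```

In the hierarchical setting, the same round-robin would be applied recursively: first across top-level clusters, then within each selected cluster at the next level down.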

3. Dataset Structure and Comparative Metrics

MegaStyle-1.4M offers paired, intra-style-consistent groupings:

  • Images: 1.4 million
  • Fine-grained style classes: 170,000 (one per style prompt)
  • Images per style class: ~8 distinct content instances on average
  • Coverage: 8,355 coarse style labels
  • No prescribed train/val/test split: Users may split by style prompt or randomly on content–style pairs.
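Since no split is prescribed, one way to split by style prompt, so that validation styles are entirely unseen during training, is sketched below; the function name and fraction are illustrative choices, not a recommendation from the paper.

```python
import random

def split_by_style(pairs, val_frac=0.05, seed=0):
    """Hold out whole style classes for validation.

    `pairs` is a list of (content, style) tuples; every pair belonging
    to a held-out style lands in the validation set, so no validation
    style appears in training.
    """
    styles = sorted({s for _, s in pairs})
    rng = random.Random(seed)
    rng.shuffle(styles)
    n_val = max(1, int(len(styles) * val_frac))
    val_styles = set(styles[:n_val])
    train = [p for p in pairs if p[1] not in val_styles]
    val = [p for p in pairs if p[1] in val_styles]
    return train, val
```

Splitting randomly over content–style pairs instead would test reconstruction of seen styles rather than generalization to unseen ones.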

Key comparison metrics with prior datasets are summarized:

| Dataset | Intra-style paired | #Coarse Styles | #Fine-grained Styles | #Images |
| --- | --- | --- | --- | --- |
| WikiArt | No | 27 | – | 80 K |
| JourneyDB | No | – | 300 K | 4.4 M |
| Style30K | No | 1,120 | – | 30 K |
| IMAGStyle | Yes | 14 | 15 K | 210 K |
| OmniStyle-150K | Yes | 1,000 | – | 150 K |
| MegaStyle-1.4M | Yes | 8,355 | 170 K | 1.4 M |

MegaStyle-1.4M distinguishes itself as the first dataset of this scale to provide intra-style paired groupings, extensive fine-grained style coverage, and systematic balancing across style frequencies.

4. Quantitative Quality Criteria

The dataset's efficacy is evaluated along two axes:

  • Intra-style consistency (C_\text{intra}(s)):

C_\text{intra}(s) = \frac{1}{|P_s|} \sum_{(i,j)\in P_s} d(z_i, z_j)

where P_s is the set of image pairs within style s and d(\cdot,\cdot) is a style distance in feature space (e.g., Gram, CLIP-derived, MegaStyle-Encoder output). Lower C_\text{intra} indicates higher within-style similarity.

  • Inter-style diversity (D_\text{inter}):

D_\text{inter} = \frac{1}{|Q|} \sum_{(s, s') \in Q} \bar{d}(s, s')

where Q is the set of distinct style pairs and \bar{d}(s, s') averages pairwise style-feature distances between images of styles s and s'. Higher D_\text{inter} denotes better style separation.

Empirical results show that MegaStyle-1.4M achieves significantly lower C_\text{intra} and higher D_\text{inter} than prior style transfer datasets, as validated through image retrieval and human preference studies.
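Both metrics are straightforward to compute given per-image style features. The sketch below uses cosine distance on L2-normalized features as a stand-in for d(·,·); the paper's actual distance may be Gram- or encoder-based, and the centroid-based D_inter here is a cheap approximation of averaging all cross-style pairs.

```python
import numpy as np

def style_metrics(feats, labels):
    """Return (C_intra, D_inter) over a batch of style features.

    C_intra: mean pairwise cosine distance within each style
             (lower = more consistent).
    D_inter: mean cosine distance between style centroids
             (higher = more diverse).
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    labels = np.array(labels)
    intra, centroids = [], []
    for s in sorted(set(labels.tolist())):
        z = feats[labels == s]
        if len(z) > 1:
            dist = 1.0 - z @ z.T
            intra.append(dist[np.triu_indices(len(z), k=1)].mean())
        centroids.append(z.mean(axis=0))
    c = np.stack(centroids)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    inter = (1.0 - c @ c.T)[np.triu_indices(len(c), k=1)].mean()
    return float(np.mean(intra)), float(inter)
```

A well-curated dataset should show small within-style distances and large between-style distances under any reasonable style feature.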

5. Applications in Model Training

MegaStyle-1.4M directly supports two key model training pipelines:

  • MegaStyle-Encoder (style-supervised contrastive learning):
    • Backbone: SigLIP image encoder (frozen, except projection head)
    • Loss: Sum of supervised contrastive loss among images of the same style and image–text contrastive loss with the style prompt.
    • SCL (image–image, within style class): a supervised contrastive loss over the batch B,

\mathcal{L}_\text{SCL} = -\frac{1}{|B|} \sum_{i \in B} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in B \setminus \{i\}} \exp(z_i \cdot z_a / \tau)}

    where P(i) contains the other in-batch images whose style label matches that of image i.
    • ITC (image–text, image and its style prompt): a CLIP-style contrastive loss in which an image–prompt pair is treated as positive if their style labels match, and as negative otherwise.
    • Total loss: \mathcal{L} = \mathcal{L}_\text{SCL} + \mathcal{L}_\text{ITC}
    • Training: batch size 8192, 30 epochs
    • Retrieval performance: mAP@1 ≈ 87%, Recall@1 ≈ 88% on StyleRetrieval
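The style-supervised contrastive objective can be illustrated with a minimal NumPy version of the standard SupCon form: for each anchor image, positives are the other in-batch images sharing its style label. The temperature value here is illustrative, not taken from the paper.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.07):
    """Supervised contrastive loss over a batch of embeddings z.

    Positives for anchor i are other rows with the same style label;
    all other non-self rows serve as negatives in the denominator.
    tau is an assumed temperature, for illustration only.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(z)
    eye = np.eye(n, dtype=bool)
    logits = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits) * ~eye                     # exclude self-pairs
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    labels = np.array(labels)
    same = (labels[:, None] == labels[None, :]) & ~eye
    loss = -(log_prob * same).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    return float(loss.mean())
```

Embeddings that cluster by style label yield a lower loss than the same embeddings with scrambled labels, which is what drives the encoder toward style-discriminative features.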

  • MegaStyle-FLUX (paired style transfer):

    • Base: FLUX (DiT transformer variant)
    • Supervised with MegaStyle-1.4M pairs; for each style, sample two images (“reference” and “target”)
    • Inputs: Noisy target content tokens, reference style VAE tokens, content caption for text conditioning, shifted RoPE on style tokens to avoid positional leakage
    • Only DiT backbone fine-tuned (LoRA rank=128, 30K steps, lr=1e-4)
    • Delivers state-of-the-art generalizable style transfer with robust style-to-text and cross-image alignment
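The shifted-RoPE trick can be illustrated with a minimal 1-D rotary embedding: style tokens receive positions offset by a large constant so they never share positions with content tokens, preventing positional leakage between the two streams. The offset value below is an assumption; the actual shift used by MegaStyle-FLUX is not specified here.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Minimal 1-D rotary position embedding.

    Splits each feature vector into two halves and rotates them by
    position-dependent angles, so attention scores become relative
    to token positions.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    ang = pos[:, None] * freqs[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Content tokens keep positions 0..N-1; style tokens are shifted by a
# large assumed offset so their positions never collide with content.
content_pos = np.arange(4)
style_pos = np.arange(4) + 10_000
```

Because the rotation is deterministic in the position, identical token features at content positions and shifted style positions produce different rotated vectors, so the model can always tell the two streams apart.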

6. Availability and Usage Guidelines

  • Download and code:

https://jeoyal.github.io/MegaStyle/

  • Provisional license status:

The paper does not specify a definitive license but recommends adhering to CC BY 4.0 for redistributed images/prompts. Original licenses of JourneyDB, WikiArt, and LAION must be respected.

  • Intended use:

Scientific research and non-commercial applications are permitted, provided citations to “MegaStyle: …” (Gao et al., CVPR 2025) are included. Re-commercialization or claims of ownership over the source images from LAION, JourneyDB, or WikiArt are disallowed.

MegaStyle-1.4M’s design establishes a new standard for scalable, style-consistent, and diverse visual style datasets, supporting both accurate style similarity measurements and the training of generalizable, robust style transfer models (Gao et al., 9 Apr 2026).
