MatHybrid-410K: Hybrid Material Dataset
- MatHybrid-410K is a large-scale hybrid dataset that integrates photorealistic RGB images with physically-based rendering maps to support multi-task material synthesis research.
- It comprises approximately 360K paired RGB-PBR sequences and 50K unpaired RGB images with auto-generated text captions, enabling diverse applications such as text-to-material and intrinsic decomposition tasks.
- The dataset employs systematic augmentation, varied lighting conditions, and geometric distortions to enhance training robustness and improve high-resolution material rendering performance.
MatHybrid-410K is a large-scale hybrid dataset designed for high-fidelity material modeling in generative systems, with a particular focus on unifying photorealistic appearance and physically-based rendering (PBR) properties. It serves as the principal training corpus for the MatPedia foundation model and addresses core limitations in earlier material synthesis datasets by integrating physically-supervised PBR sequences with broad-spectrum RGB appearance data. The dataset contains approximately 410,000 samples, encompassing both paired RGB-PBR map sequences and unpaired RGB images, with detailed annotations and diverse augmentations to support multi-task learning for text-to-material, image-to-material, and intrinsic material decomposition applications (Luo et al., 21 Nov 2025).
1. Dataset Composition and Structure
MatHybrid-410K consists of two complementary subsets:
- RGB-only Appearance Subset: Approximately 50,000 planar material images without associated PBR maps, drawn from two sources:
- Synthetic renderings of flat surfaces generated using the Gemini 2.5 Flash Image model (Google DeepMind).
- Real-world planar material images collected from public repositories.
Each image in this subset is automatically annotated with a caption using the Qwen 2.5-VL-72B-Instruct model for text-conditioned generative training.
- Complete PBR Material Subset: Contains approximately 360,000 paired RGB-to-PBR sequences, constructed from around 6,000 unique measured or procedural materials sourced from MatSynth (Vecchio et al. 2024) and additional open SVBRDF repositories such as OpenSVBRDF.
- Planar renderings: Each material is rendered on a flat surface under 32 distinct HDR environment maps, giving roughly 6,000 × 32 = 192,000 RGB-to-PBR pairs for intrinsic decomposition tasks.
- Distorted renderings: The same materials are rendered on five geometric primitives (cube, sphere, cylinder, cone, torus) with varying lighting and camera viewpoints, yielding approximately 168,000 additional pairs that support geometry-aware image-to-material training.
In summary, the corpus includes approximately 360,000 paired RGB + {basecolor, normal, roughness, metallic} map sequences and 50,000 unpaired RGB samples. The hybrid composition ratio is about 88% paired data and 12% unpaired data.
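The counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the approximate figures quoted in the text; the variable names are illustrative and not part of any released tooling.

```python
# Approximate figures quoted in the text; names are illustrative only.
NUM_MATERIALS = 6_000      # unique PBR materials
NUM_ENV_MAPS = 32          # HDR environment maps per planar material
PAIRED_TOTAL = 360_000     # paired RGB-PBR sequences
UNPAIRED_TOTAL = 50_000    # RGB-only images

# Planar renderings: one pair per (material, environment map).
planar_pairs = NUM_MATERIALS * NUM_ENV_MAPS

# Distorted renderings make up the rest of the paired subset.
distorted_pairs = PAIRED_TOTAL - planar_pairs

total = PAIRED_TOTAL + UNPAIRED_TOTAL
paired_share = PAIRED_TOTAL / total
unpaired_share = UNPAIRED_TOTAL / total

print(f"planar pairs:    {planar_pairs:,}")
print(f"distorted pairs: {distorted_pairs:,}")
print(f"total samples:   {total:,}")
print(f"paired share:    {paired_share:.1%}")
print(f"unpaired share:  {unpaired_share:.1%}")
```

The shares come out slightly below 88% and slightly above 12%, which matches the rounded ratio stated above.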
2. Data Sources and Category Coverage
- PBR Materials: Sourced from MatSynth (Vecchio et al. 2024) and open SVBRDF collections, with broad coverage across material classes such as wood, metal, stone, fabric, plastics, ceramics, and composites. No per-class distribution or sampling weights are reported.
- RGB Images: Include both model-generated and in-the-wild photographic content. This RGB-only subset exhibits a broad spectrum of textures (tiles, laminates, textiles, natural stone, and other material categories), as illustrated in the primary publication.
- Text Descriptions: All RGB-only samples are assigned auto-generated natural language captions to facilitate multimodal (text-to-material) generative modeling.
3. Preprocessing and Dataset Statistics
- Resolution and Normalization: All samples, including both RGB and PBR maps, are preprocessed to a canonical 1024×1024 pixel resolution for use in VAE and video diffusion transformer (DiT) architectures. Data is converted to floating-point tensors in [0, 1], with normal maps remapped for (x, y, z) channels, and PNG file formats are used throughout.
- Cropping and Coverage: For distorted renderings, frames are cropped or composed to ensure material regions cover at least 70% of the image area.
- Diversity Statistics: The dataset comprises around 6,000 unique PBR materials, 192,000 planar lighting variations (32 HDR maps), and 168,000 geometry/viewpoint variations (across five primitives). No histograms or explicit per-category analysis are published.
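The normalization and coverage rules above can be illustrated with a small pure-Python sketch. It assumes 8-bit PNG channel values, the common `n = 2c - 1` normal-map encoding, and a per-pixel boolean material mask; none of these details are specified in the text, and the function names are hypothetical.

```python
import math

def to_unit_range(value_8bit):
    """Map an 8-bit channel value (0..255) to a float in [0, 1]."""
    return value_8bit / 255.0

def decode_normal(rgb_8bit):
    """Decode an 8-bit normal-map texel into a unit (x, y, z) vector.

    Assumes the common n = 2c - 1 encoding (an assumption; the dataset's
    exact convention is not stated in the text).
    """
    x, y, z = (2.0 * to_unit_range(c) - 1.0 for c in rgb_8bit)
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        # Degenerate texel: fall back to the flat-surface normal.
        return (0.0, 0.0, 1.0)
    return (x / length, y / length, z / length)

def material_coverage(mask):
    """Fraction of pixels flagged as material in a 2D mask (list of rows)."""
    total = sum(len(row) for row in mask)
    covered = sum(1 for row in mask for px in row if px)
    return covered / total if total else 0.0

def passes_coverage(mask, threshold=0.70):
    """True when material pixels cover at least `threshold` of the frame."""
    return material_coverage(mask) >= threshold

# A texel pointing almost straight out of the surface.
flat_normal = decode_normal((128, 128, 255))

# A toy 10x10 mask whose top 7 rows are material.
toy_mask = [[True] * 10 for _ in range(7)] + [[False] * 10 for _ in range(3)]
print(flat_normal, material_coverage(toy_mask), passes_coverage(toy_mask))
```

In practice the same operations would be vectorized over whole 1024×1024 arrays; the scalar form is used here only to keep the sketch dependency-free.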
| Subset | Approx. Samples | Content/Usage |
|---|---|---|
| Paired RGB+PBR sequences | 360,000 | VAE/DiT training (intrinsic decomposition, image-to-material) |
| Unpaired RGB images | 50,000 | Text-to-material training |
4. Training Protocols and Integration
MatHybrid-410K is directly integrated into the MatPedia training pipeline as follows:
- VAE Fine-tuning: The 3D VAE decoder is fine-tuned on the 360,000 paired sequences for 10,000 steps using AdamW (learning rate , , ).
- Video Diffusion Transformer (DiT) Training: DiT is trained for 200,000 steps with LoRA (rank 128), employing a batch size of 16 and a learning rate of . Mixed batches are sampled from the entire dataset:
- Paired samples receive full supervision on both RGB and PBR latents, optimized with the rectified-flow loss .
- Unpaired RGB samples supervise only the RGB latent (i.e., for text-to-material synthesis).
- Batch Composition: Each minibatch comprises both paired and unpaired samples, promoting multi-task capabilities.
- Augmentation Strategies: Systematic variation is achieved via 32 HDR environment maps for lighting and geometric distortions across five primitives. Random cropping and resizing ensure spatial robustness.
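The mixed-batch supervision described in this section can be sketched as follows. This is an illustration under stated assumptions, not the MatPedia implementation: it assumes batches are drawn uniformly from the union of both subsets, and it uses the commonly cited rectified-flow form in which the model regresses the velocity `x_data - x_noise` along a straight path between noise and data. The names `sample_mixed_batch`, `sample_losses`, and `zero_predictor` are hypothetical, and latents are plain lists of floats for brevity.

```python
import random

def sample_mixed_batch(paired, unpaired, batch_size, rng):
    """Draw a batch uniformly from the union of both subsets (assumed)."""
    return rng.choices(paired + unpaired, k=batch_size)

def interpolate(x_data, x_noise, t):
    """Point on the straight path at time t (t=0 is noise, t=1 is data)."""
    return [(1.0 - t) * n + t * d for d, n in zip(x_data, x_noise)]

def velocity_target(x_data, x_noise):
    """Rectified-flow regression target: constant velocity toward data."""
    return [d - n for d, n in zip(x_data, x_noise)]

def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

def sample_losses(sample, predict_velocity, rng):
    """Per-modality losses; the PBR term is skipped for unpaired samples."""
    t = rng.random()
    losses = {}
    for modality in ("rgb", "pbr"):
        latent = sample.get(modality)
        if latent is None:  # unpaired sample: supervise the RGB latent only
            continue
        noise = [rng.gauss(0.0, 1.0) for _ in latent]
        x_t = interpolate(latent, noise, t)
        target = velocity_target(latent, noise)
        losses[modality] = mse(predict_velocity(x_t, t, modality), target)
    return losses

def zero_predictor(x_t, t, modality):
    """Stand-in for the DiT: always predicts zero velocity."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
paired = [{"rgb": [0.1, 0.2], "pbr": [0.3, 0.4]} for _ in range(88)]
unpaired = [{"rgb": [0.5, 0.6], "pbr": None} for _ in range(12)]

batch = sample_mixed_batch(paired, unpaired, 16, rng)
losses = [sample_losses(s, zero_predictor, rng) for s in batch]
batch_loss = sum(sum(l.values()) for l in losses) / len(losses)
print(f"batch loss with a zero predictor: {batch_loss:.4f}")
```

Skipping the PBR term, rather than feeding a placeholder target, keeps unpaired samples from pushing gradients into the PBR branch, which is one plausible reading of "supervise only the RGB latent".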
5. Dataset Splits, Evaluation, and License
- Splitting Protocol: No explicit train/val/test split is defined. In practice, the full paired set is used for training, while evaluations employ standard benchmark datasets (e.g., the MaterialPicker test set).
- Evaluation: There is no mention of cross-validation or other formal validation protocols. All training is conducted on the entire available corpus, with downstream evaluations focusing on standard material decomposition and generative synthesis benchmarks.
- Licensing: MatHybrid-410K is slated for public release alongside MatPedia. All underlying data sources (MatSynth, OpenSVBRDF, Gemini Flash, and public photographs) carry permissive academic or Creative Commons licenses, with no additional usage restrictions specified.
6. Significance and Applications
MatHybrid-410K enables multi-modal, high-resolution, and physically-grounded material synthesis:
- Supports unified training across text-to-material, image-to-material, and intrinsic decomposition tasks by providing both RGB-only and RGB+PBR paired samples within each batch.
- Empowers synthesis and decomposition models to natively leverage both photorealistic appearance and ground-truth physical parameters without fragmenting pipelines or limiting generalization to task-specific domains.
- Native synthesis is achieved, and all model outputs are upsampled to via RealESRGAN at inference.
- Its design directly addresses a gap in earlier material datasets: the lack of joint coverage of natural image appearance and physically accurate PBR properties (Luo et al., 21 Nov 2025).
7. Limitations and Prospects for Extension
- Class Distribution Reporting: No per-material class counts or histograms are provided, limiting precise characterization of representation across material types.
- Validation Protocols: The absence of explicit train/val/test splits and cross-validation protocols makes it harder to reproduce results or to compare models trained on the dataset on equal terms.
- Sampling Weights: Paired materials are sampled uniformly, with no per-category weights reported. Categories with few materials may therefore be underrepresented during training, though this is not quantified.
- Diversity Documentation: While the range of textures and geometries is broad, the lack of explicit diversity metrics (e.g., per-category statistics) makes it harder to formally analyze dataset coverage.
- Public Availability: MatHybrid-410K is slated for public release, which is expected to facilitate further benchmarking, model development, and comparative studies in data-driven material synthesis. Future work may include formal reporting of per-category diversity and systematic split protocols to further support standardization in the field.
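As one illustration of what a systematic split protocol could look like, the sketch below assigns splits per material rather than per rendering, so that all lighting and geometry variants of a material land in the same split. The dataset defines no such protocol; the function name, the ID format, and the 90/5/5 fractions are assumptions chosen for the example.

```python
import hashlib

def assign_split(material_id, val_fraction=0.05, test_fraction=0.05):
    """Deterministically map a material ID to 'train', 'val', or 'test'.

    Hashing the material ID (not the rendering ID) keeps every rendering
    of one material in a single split, which avoids train/test leakage.
    """
    digest = hashlib.sha256(material_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"

# Hypothetical material IDs; the real naming scheme is not specified.
ids = [f"material_{i:05d}" for i in range(6_000)]
counts = {"train": 0, "val": 0, "test": 0}
for material_id in ids:
    counts[assign_split(material_id)] += 1
print(counts)
```

Because the assignment depends only on the ID string, it is stable across machines and reruns and does not require storing a split file.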