
MatSynth: A Modern PBR Materials Dataset (2401.06056v3)

Published 11 Jan 2024 in cs.CV and cs.GR

Abstract: We introduce MatSynth, a dataset of 4,000+ CC0 ultra-high-resolution PBR materials. Materials are crucial components of virtual relightable assets, defining the interaction of light at the surface of geometries. Given their importance, significant research effort has been dedicated to their representation, creation, and acquisition. However, in the past six years, most research in material acquisition or generation relied either on the same unique dataset or on huge company-owned libraries of procedural materials. With this dataset we propose a significantly larger, more diverse, and higher-resolution set of materials than previously publicly available. We carefully discuss the data collection process and demonstrate the benefits of this dataset on material acquisition and generation applications. The complete data further contains metadata with each material's origin, license, category, tags, creation method and, when available, descriptions and physical size, as well as 3M+ renderings of the augmented materials, at 1K resolution, under various environment lightings. The MatSynth dataset is released through the project page at: https://www.gvecchio.com/matsynth.


Summary

  • The paper presents a curated dataset of over 4,000 unique 4K, tileable PBR materials with comprehensive metadata and licensing details.
  • The methodology includes rigorous quality control, diverse data augmentation techniques, and seamless integration with existing datasets.
  • Experimental results demonstrate significant improvements in material acquisition and generation, enhancing performance in models like SurfaceNet and MatFuse.

MatSynth: A Modern PBR Materials Dataset

Introduction

The MatSynth dataset is a comprehensive collection of Physically Based Rendering (PBR) materials designed to support modern learning-based techniques for material-related tasks such as acquisition, generation, and synthetic data augmentation. It aims to bridge the gap between public and private material datasets by providing a rich and diverse assortment of high-resolution materials under a permissive license, addressing the limitations of previous datasets through greater variety and volume.

Figure 1: Renderings under various environment maps. We show four materials (Metal, Leather, Plastic, and Pebbles) from the dataset rendered under the five chosen environment maps.

Materials Collection and Data Processing

MatSynth was meticulously curated from publicly available online sources, yielding an initial pool of over 6,000 materials that was filtered down to 4,069 unique, tileable 4K materials. Each material is represented by a comprehensive set of maps, including Base Color, Diffuse, Normal, Height, Roughness, Metallic, and Specular. All materials pass stringent quality control checks, ensuring their suitability for high-fidelity rendering tasks.
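
To make this map structure concrete, the sketch below shows one way a single material and its aligned maps could be held in memory. The class and field names are illustrative assumptions for this article, not an official MatSynth API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PBRMaterial:
    """Illustrative container for one MatSynth material; not an official API."""
    basecolor: np.ndarray   # (H, W, 3) albedo for the metallic workflow
    diffuse: np.ndarray     # (H, W, 3) diffuse color for the specular workflow
    normal: np.ndarray      # (H, W, 3) tangent-space normals
    height: np.ndarray      # (H, W)    displacement values
    roughness: np.ndarray   # (H, W)    microfacet roughness in [0, 1]
    metallic: np.ndarray    # (H, W)    metalness in [0, 1]
    specular: np.ndarray    # (H, W)    specular reflectance
    name: str = ""
    category: str = ""      # e.g. "Metal", "Leather", "Plastic"

    def resolution(self) -> tuple:
        """Spatial resolution shared by all maps (4K in the released dataset)."""
        h, w = self.basecolor.shape[:2]
        return h, w
```

All of a material's maps share the same resolution and alignment, which is what lets the same crop or rotation be applied to every map at once.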

Dataset Annotations and Augmentation

Each material in the dataset is accompanied by extensive metadata, including its source, tags, creation method, and licensing information. This comprehensive annotation supports a wide range of research applications, from machine learning-based material synthesis to detailed material property studies.
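
As a concrete illustration, a single material's metadata record might look like the following; the field names are assumptions inferred from the attributes the paper lists (origin, license, category, tags, creation method, and the optional description and physical size), not the dataset's literal schema.

```python
# Hypothetical metadata record. Field names are assumptions based on the
# attributes described in the paper, not the dataset's literal schema.
material_metadata = {
    "name": "weathered_red_brick",           # illustrative material name
    "source": "https://www.ambientcg.com/",  # originating library
    "license": "CC0",
    "category": "Ground",
    "tags": ["brick", "weathered", "outdoor"],
    "creation_method": "photogrammetry",     # e.g. procedural or captured
    "description": "Weathered red bricks with mossy mortar lines.",  # optional
    "physical_size_cm": [100.0, 100.0],      # optional real-world extent
}
```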

Data augmentation is a prominent feature of MatSynth, with numerous rotations, crops, and environment-map renderings of each material, resulting in millions of sample renderings. This augmentation strategy is essential for training robust machine learning models and enables more diverse real-world applications.

Figure 2: Render samples using the two-pass strategy. This ensures that the maps and the rendering are well aligned, avoiding parallax effects while preserving specular highlights.
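
Tileability is what makes the rotation-and-crop augmentation described above cheap: rotations by multiples of 90 degrees and wrap-around translations both preserve seamlessness. The following is a minimal sketch of such an augmentation step, assuming square maps stored as NumPy arrays; it is an illustration, not the paper's actual pipeline.

```python
import numpy as np


def augment_tileable(maps: dict, crop: int = 1024, rng=None) -> dict:
    """Apply one random rotate/shift/crop to a set of aligned tileable maps.

    `maps` holds per-material arrays (e.g. basecolor, normal, roughness)
    sharing the same square resolution; the identical transform is applied
    to every map so they stay aligned.
    """
    rng = rng or np.random.default_rng()
    h, w = next(iter(maps.values())).shape[:2]

    k = int(rng.integers(0, 4))    # k * 90 degree rotation keeps tileability
    dy = int(rng.integers(0, h))   # wrap-around shift keeps tileability
    dx = int(rng.integers(0, w))
    y0 = int(rng.integers(0, h - crop + 1))
    x0 = int(rng.integers(0, w - crop + 1))

    out = {}
    for name, img in maps.items():
        img = np.rot90(img, k, axes=(0, 1))
        img = np.roll(img, (dy, dx), axis=(0, 1))
        out[name] = img[y0:y0 + crop, x0:x0 + crop]
    return out
```

One caveat this sketch glosses over: rotating a tangent-space normal map also requires rotating the encoded XY vector components, not just the pixel grid, for the normals to remain correct.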

Compatibility and Integration with Existing Datasets

To ensure seamless integration with existing resources, the MatSynth dataset has been processed with compatibility in mind. It can be combined with prior datasets to form an extensive library for research and development. The dataset adheres to uniform standards in its material representation, facilitating its use alongside established workflows.
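
One concrete aspect of such standardization is bridging the two common PBR conventions. The sketch below applies the standard approximate conversion from metallic/roughness maps to diffuse/specular ones, using the usual ~4% reflectance assumption for dielectrics; this is a generic illustration, not necessarily the exact procedure MatSynth uses.

```python
import numpy as np

DIELECTRIC_F0 = 0.04  # common approximation of dielectric reflectance at normal incidence


def metallic_to_specular(basecolor: np.ndarray, metallic: np.ndarray):
    """Convert metallic-workflow maps to (diffuse, specular) maps.

    Standard approximation: metals contribute no diffuse reflection and
    take their specular color from the base color, while dielectrics
    keep the base color as diffuse and reflect roughly 4% specularly.
    """
    m = metallic[..., None] if metallic.ndim == 2 else metallic
    diffuse = basecolor * (1.0 - m)
    specular = DIELECTRIC_F0 * (1.0 - m) + basecolor * m
    return diffuse, specular
```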

Experimental Results and Evaluation

The quantitative and qualitative evaluation of MatSynth highlights its impact on material acquisition and generation tasks. When used to train state-of-the-art models such as SurfaceNet, the dataset improves on prior results in normal and roughness map recovery, demonstrating significant gains in material capture quality.
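
As a rough illustration of how such map recovery is typically scored, per-map error between predicted and ground-truth maps (e.g. RMSE) is a common choice. The snippet below is a generic evaluation sketch, not the paper's exact protocol or metric set.

```python
import numpy as np


def per_map_rmse(pred: dict, gt: dict) -> dict:
    """RMSE per map (e.g. normal, roughness) between prediction and ground truth."""
    return {
        name: float(np.sqrt(np.mean(
            (pred[name].astype(np.float64) - gt[name].astype(np.float64)) ** 2)))
        for name in gt
    }
```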

The dataset's effectiveness in generating diverse and high-quality synthetic materials is validated by training generative models (e.g., MatFuse), which show improved FID scores, indicative of higher realism and variety in generated textures. This diversity and realism are essential for advancing synthetic data generation and enriching virtual environments.

Figure 3: Qualitative material acquisition comparison on synthetic data. We compare the dataset against previous benchmarks, showing improved fidelity and diversity.
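
For reference, FID compares Gaussian fits of deep features from real and generated images: FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)). A minimal computation from precomputed feature arrays might look like the sketch below; extracting the Inception features themselves is omitted, and this is not the paper's exact evaluation code.

```python
import numpy as np
from scipy import linalg


def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet Inception Distance between two (N, D) deep-feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```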

Conclusion

MatSynth represents a substantial advancement in the availability of high-quality material datasets. By offering a broad assortment of materials with comprehensive metadata and rendering options, the dataset facilitates a variety of research endeavors in material acquisition and generation. It effectively brings to the wider research community benefits previously reserved for internal libraries, fostering broader innovation.

In summary, MatSynth promises to propel research and development in material appearance and computer graphics, offering tools to develop more accurate computational models and richer digital environments. As these fields continue to advance, MatSynth serves as a foundational resource for ongoing innovation.
