DuMaS: Two-Level Material Selection Dataset
- DuMaS is a synthetic dataset offering dual-level per-pixel annotations that support hierarchical material selection tasks in image analysis.
- It employs detailed texture and subtexture annotations to facilitate fine-grained segmentation and material-aware editing in research and design.
- The dataset’s large-scale synthesis from 132 synthetic scenes enables robust benchmarking for vision transformer-based material selection methods.
The Two-Level Material Selection (DuMaS) Dataset is a large-scale synthetic dataset designed for dense, hierarchical material selection tasks in images, with rich per-pixel annotations at both texture and subtexture levels. Its primary utility lies in supporting the development and evaluation of computational methods for fine-grained, spatially varying selection of materials—an essential task for material-aware image editing, vision-based material recognition, and design optimization workflows.
1. Definition and Dataset Structure
The DuMaS (Dual-level Material Selection) dataset was created to support material selection in images at two levels of granularity:
- Texture Level: Annotation groups all pixels sharing the same texture mapping or material assignment on a scene object (e.g., all squares in a checkerboard, irrespective of color).
- Subtexture Level: Annotation distinguishes components within a texture based on their underlying reflectance sources, for instance separating the black squares from the white squares of a checkerboard.
DuMaS contains approximately 816,000 high-resolution (1024×1024) images rendered from 132 synthetic 3D scenes (119 indoor, 13 outdoor). Each scene is rendered five times with varying material assignments, drawn from more than 10,000 unique textures generated procedurally by blending reflectance maps with binary masks.
Per-pixel ground truth labels for each image are stored in two forms, illustrated in the sketch after this list:
- Texture annotation map: Assigns a unique ID to each distinct texture map present in the scene.
- Subtexture annotation map: Assigns a unique ID to each constituent reflectance component within a composite texture.
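As a concrete illustration, the sketch below loads the two ID maps and derives a binary selection mask from a single clicked pixel at either level. The file names, the assumption that the maps are stored as integer-valued PNGs, and the helper names are hypothetical; the source does not specify the on-disk format.

```python
import numpy as np
from PIL import Image

# Hypothetical file layout: the source does not specify the on-disk format,
# so assume each frame ships with two integer-valued ID maps saved as PNGs.
def load_annotations(texture_path, subtexture_path):
    """Load per-pixel texture and subtexture ID maps as H x W integer arrays."""
    texture_ids = np.array(Image.open(texture_path), dtype=np.int32)
    subtexture_ids = np.array(Image.open(subtexture_path), dtype=np.int32)
    return texture_ids, subtexture_ids

def selection_mask(id_map, query_xy):
    """Binary mask of all pixels sharing the ID found at the query pixel."""
    x, y = query_xy
    return id_map == id_map[y, x]

# One click, two selection granularities (file names are placeholders).
tex_ids, sub_ids = load_annotations("frame_0001_texture.png",
                                    "frame_0001_subtexture.png")
coarse = selection_mask(tex_ids, (512, 384))  # e.g. the whole checkerboard
fine = selection_mask(sub_ids, (512, 384))    # e.g. only the matching squares
```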
2. Annotation Process and Material Generation
The generation of materials in DuMaS follows a systematic procedure:
- Base Material Library: 3,026 stationary reflectance maps are sourced from the Adobe 3D Assets library.
- Binary Mask Generation: 1,571 binary masks are created using random blends of 12 base noise patterns, enabling spatially distinct regions within a texture.
- Texture Synthesis: Each binary mask is used as an alpha channel to blend a randomly selected pair of reflectance maps, yielding 7,855 composite textures; together with the base reflectance maps, this results in 10,881 material maps (see the sketch after this list).
- Scene Assignment: For each of the 132 Evermotion scenes, five sets of material assignments are created, systematically varying which material map is assigned to which object.
- Rendering and Annotation: Each unique instantiation is rendered into video sequences, with per-frame annotation maps for both the texture and subtexture material IDs.
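The compositing step can be pictured with the following minimal sketch. It assumes the binary masks come from thresholding a random weighted blend of base noise patterns and that each mask acts as a per-pixel alpha between two reflectance maps; the exact noise generators, weights, and thresholds used for DuMaS are not specified in the source, and the random arrays below merely stand in for the Adobe 3D Assets reflectance maps.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_binary_mask(base_patterns, threshold=0.5):
    """Threshold a random weighted blend of base noise patterns into a binary mask.
    Simplified stand-in for the 12 base noise patterns described above."""
    weights = rng.random(len(base_patterns))
    weights /= weights.sum()
    blended = sum(w * p for w, p in zip(weights, base_patterns))
    return (blended > threshold).astype(np.float32)

def composite_texture(reflectance_a, reflectance_b, mask):
    """Use the binary mask as a per-pixel alpha to combine two reflectance maps."""
    alpha = mask[..., None]                                   # H x W x 1
    return alpha * reflectance_a + (1.0 - alpha) * reflectance_b

# Toy data stands in for the Adobe 3D Assets reflectance maps.
H = W = 256
patterns = [rng.random((H, W)) for _ in range(12)]
mask = random_binary_mask(patterns)
refl_a, refl_b = rng.random((H, W, 3)), rng.random((H, W, 3))
texture = composite_texture(refl_a, refl_b, mask)             # H x W x 3 composite
```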
A key consequence is that material labels are not only surface-level (as in most prior datasets) but are also tied to the local reflectance sources that compose each texture, enabling finer-grained annotation than prior datasets provide.
3. Methodological Relevance: Enabling Two-Level Selection
DuMaS provides unique support for computational two-level material selection and segmentation:
- Texture-Level Tasks: Algorithms may select or mask all areas sharing the same macro-texture, useful for grouping similar surfaces or components in edited images.
- Subtexture-Level Tasks: Finer-grained selection is possible for subcomponents within repetitive or composite designs, such as the dark stripes in a zebra pattern or particular colored motifs in a wallpaper. This resolves limitations in previous datasets, such as Materialistic, which could not distinguish within repeated or binary textures.
Supervision at both levels enables models to be trained and tested on their ability to group pixels by overall material as well as by constituent reflectance, providing a testbed for evaluation of multi-scale, multi-level selection techniques.
4. Material Selection Methodology and Benchmarking
DuMaS is paired with a vision transformer-based methodology for material selection in images, built on multi-resolution feature extraction and cross-similarity guided masking (a simplified sketch follows this list):
- Vision Transformer Backbone: DINOv2 ViT-B/14 is used for feature extraction at multiple transformer blocks, chosen for robust contextualization and boundary sharpness.
- Multi-Resolution Aggregation: Images are processed at multiple scales, a low-resolution global view plus high-resolution tiles, so that features capture overall context while preserving fine boundaries.
- Query-Conditioned Masking: Material selection is cast as a single-click segmentation problem: the features at the query pixel are matched against features across the rest of the image via similarity.
- Joint Loss and Training: A binary cross-entropy loss is applied jointly over the texture and subtexture annotations, so the model learns both grouping levels simultaneously.
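The sketch below illustrates the query-conditioned masking idea and the joint two-level loss under simplifying assumptions: it uses the publicly available DINOv2 ViT-B/14 backbone from torch.hub, replaces the multi-resolution aggregation and learned decoder with raw cosine similarity over patch features, and treats the loss as a plain sum of per-level binary cross-entropies. Function names and the sigmoid calibration are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# DINOv2 ViT-B/14 backbone from torch.hub (weights download on first use).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()

@torch.no_grad()
def patch_features(image):
    """Unit-normalized dense patch features for a (1, 3, H, W) tensor,
    with H and W divisible by 14."""
    (feats,) = backbone.get_intermediate_layers(image, n=1, reshape=True)
    return F.normalize(feats, dim=1)                          # (1, C, H/14, W/14)

def similarity_mask(feats, query_xy, image_size):
    """Cosine similarity between the query patch and all patches, upsampled to
    image resolution. Stands in for the learned, multi-resolution mask head."""
    H, W = image_size
    qx, qy = query_xy[0] // 14, query_xy[1] // 14             # pixel -> patch coords
    query = feats[:, :, qy, qx].unsqueeze(-1).unsqueeze(-1)   # (1, C, 1, 1)
    sim = (feats * query).sum(dim=1, keepdim=True)            # (1, 1, h, w)
    sim = F.interpolate(sim, size=(H, W), mode="bilinear", align_corners=False)
    return torch.sigmoid(sim)                                 # soft selection mask

def joint_loss(pred_texture, pred_subtexture, gt_texture, gt_subtexture):
    """Binary cross-entropy summed over both annotation levels."""
    return (F.binary_cross_entropy(pred_texture, gt_texture)
            + F.binary_cross_entropy(pred_subtexture, gt_subtexture))
```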
The dense, two-level annotation provides ground truth for precise quantitative metrics at both selection granularities.
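As one way to exploit that ground truth, a predicted selection mask can be scored against both annotation maps; intersection-over-union is used below as an illustrative metric, since the source does not name a specific one, and the function names are hypothetical.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union between two binary masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def two_level_scores(pred, texture_ids, subtexture_ids, query_xy):
    """Score one predicted selection against both annotation granularities."""
    x, y = query_xy
    return {
        "texture_iou": iou(pred, texture_ids == texture_ids[y, x]),
        "subtexture_iou": iou(pred, subtexture_ids == subtexture_ids[y, x]),
    }
```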
5. Comparative Position and Unique Features
Compared to other datasets in the vision and material recognition field, DuMaS distinguishes itself by:
| Dataset | # Annotated Images | Annotation Level | Subtexture Support | Image Source |
| --- | --- | --- | --- | --- |
| Materialistic | ~50,000 | Texture (object-level) | No | Synthetic |
| MINC | Large | Patchwise (coarse category) | No | Real photographs |
| DuMaS | ~816,000 | Texture + subtexture (dense) | Yes | Synthetic |
- Scale: DuMaS contains more than 800,000 densely annotated images, providing extensive diversity and statistical power for benchmarking.
- Dual-Level (Hierarchical) Annotation: Supports both “material as assigned” and “material as reflected/subcomponent,” an axis of differentiation absent in prior datasets.
- Synthetic Generation: Enables precise control over annotation, material assignments, and scene diversity, which is challenging in real-world datasets.
- Per-Pixel Dense Labels: Essential for fine-grained segmentation and selection algorithm development and benchmarking.
6. Applications and Practical Uses
DuMaS is intended for research and development in:
- Fine-Grained Material Selection: Training and evaluation of algorithms that select spatially varying materials in natural and synthetic images at multiple grouping levels.
- Image Editing Tools: Enabling pixel-accurate selection and manipulation (color change, texture replacement) both at the object and subcomponent level in photo-realistic images.
- Vision-Based Recognition: Underpinning models for robust hierarchical material recognition in settings with complex mappings from surface appearance to underlying material structure.
- Benchmarking New Models: Providing standardized quantitative and qualitative benchmarks for model performance on two-level selection.
The inclusion of both real and synthetic test sets, as well as support for user-query-based segmentation (such as single- or multi-click selection), enables realistic, reproducible comparison of algorithms.
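For multi-click queries, one plausible aggregation strategy (not necessarily the one used with DuMaS) is to average the feature vectors at the clicked pixels before computing similarity, as sketched below with a generic per-pixel feature map.

```python
import numpy as np

def multi_click_mask(features, clicks, threshold=0.5):
    """Select pixels similar to the average feature of all clicked pixels.

    features: (H, W, C) per-pixel feature map, assumed L2-normalized.
    clicks:   list of (x, y) pixel coordinates supplied by the user.
    """
    queries = np.stack([features[y, x] for x, y in clicks])   # (K, C)
    query = queries.mean(axis=0)
    query /= np.linalg.norm(query) + 1e-8                     # renormalize
    similarity = features @ query                             # (H, W) cosine scores
    return similarity > threshold
```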
7. Limitations and Potential Extensions
While DuMaS sets a new standard for scale and dual-level annotation, its scope has several limitations:
- Synthetic Bias: All images are rendered from synthetic scenes. Generalization to real-world photographs without domain adaptation may be limited.
- Label Space: Although >10,000 unique materials are present, these are generated via reflectance map blending and may not reflect the full diversity or physicality of real-world materials.
- Extension Possibilities: Methods demonstrated with DuMaS may be further enhanced by supplementing with real-world validation, adding depth or normal annotations, or exploring inter-level label consistency in downstream tasks.
A plausible implication is that DuMaS, through its synthetic scale and dual-level structure, will facilitate both fundamental progress in material-based image understanding and practical innovation in vision-guided design, editing, and recognition applications.