MARBLE: Material Recomposition and Blending in CLIP-Space (2506.05313v1)

Published 5 Jun 2025 in cs.CV

Abstract: Editing materials of objects in images based on exemplar images is an active area of research in computer vision and graphics. We propose MARBLE, a method for performing material blending and recomposing fine-grained material properties by finding material embeddings in CLIP-space and using that to control pre-trained text-to-image models. We improve exemplar-based material editing by finding a block in the denoising UNet responsible for material attribution. Given two material exemplar-images, we find directions in the CLIP-space for blending the materials. Further, we can achieve parametric control over fine-grained material attributes such as roughness, metallic, transparency, and glow using a shallow network to predict the direction for the desired material attribute change. We perform qualitative and quantitative analysis to demonstrate the efficacy of our proposed method. We also present the ability of our method to perform multiple edits in a single forward pass and applicability to painting. Project Page: https://marblecontrol.github.io/

Summary

The paper introduces MARBLE as a versatile framework for exemplar-based material editing using CLIP embeddings and diffusion models.
It achieves flexible material transfer and blending with parametric control over fine-grained properties such as metallic, roughness, and transparency.
Quantitative evaluation with PSNR and LPIPS, alongside comparisons against baselines such as InstructPix2Pix and Concept Sliders, indicates high-fidelity edits with well-disentangled material attributes.

Evaluating MARBLE: Material Recomposition and Blending in CLIP-Space

The paper "MARBLE: Material Recomposition and Blending in CLIP-Space" presents an approach to exemplar-based material editing that works with material embeddings in CLIP-space and uses them to control a pre-trained text-to-image model. Its goal is flexible control over both material blending and parametric tuning of individual material attributes.

Methodology Overview

MARBLE combines CLIP embeddings with a pre-trained diffusion model to support three operations: material transfer from an exemplar, material blending between two exemplars, and parametric control over fine-grained properties such as metallic, roughness, transparency, and glow. It builds upon prior work like ZeST, which demonstrated zero-shot exemplar-based material transfer using diffusion models. The key change relative to ZeST is where the material embedding is injected: the authors identify a block in the denoising UNet that is responsible for material attribution and inject the embedding there, which improves the fidelity of material transfer.
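As an illustration of blending between two exemplars, the sketch below linearly interpolates two embedding vectors. It assumes plain linear interpolation, which is only one simple way to move along a direction between two points in CLIP-space; the summary does not state the exact blending rule MARBLE uses. The function name `lerp_embeddings` and the toy 4-dimensional vectors are hypothetical stand-ins for real CLIP image embeddings.

```python
def lerp_embeddings(e_a, e_b, alpha):
    """Blend two embeddings as (1 - alpha) * e_a + alpha * e_b.

    Assumption: linear interpolation; not confirmed as the paper's rule.
    """
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


# Toy stand-ins for the CLIP embeddings of two material exemplars.
e_metal = [1.0, 0.0, 0.0, 0.0]
e_wood = [0.0, 1.0, 0.0, 0.0]

# alpha = 0 keeps the first material, alpha = 1 keeps the second.
blend = lerp_embeddings(e_metal, e_wood, 0.25)
print(blend)
```

In a full pipeline the blended vector would take the place of a single exemplar embedding when conditioning the diffusion model; that step depends on the specific model and is not shown here.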

MARBLE also provides parametric control over material attributes by learning directions in CLIP-space: a shallow network, trained on a synthetic dataset, predicts the direction that corresponds to a desired attribute change. Because the edit is applied to the embedding rather than to the weights of the pre-trained generative model, the model itself needs no deep modification, and the geometric, textural, and illumination properties of the image are preserved.
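The idea of a shallow network that predicts an edit direction can be sketched as follows. Everything specific here is an assumption made for illustration: the two-layer layout, feeding the strength as an extra input, the update rule `e + strength * direction`, and the untrained random weights. The names `make_mlp`, `predict_direction`, and `edit_embedding` are hypothetical and do not come from the paper.

```python
import random


def make_mlp(dim, hidden, seed=0):
    """Create untrained weights for a tiny two-layer network (illustrative only)."""
    rng = random.Random(seed)
    # The first layer sees the embedding plus one extra input for the strength.
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim + 1)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2


def predict_direction(params, embedding, strength):
    """Map (embedding, strength) to a direction of the same size as the embedding."""
    w1, w2 = params
    x = list(embedding) + [strength]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def edit_embedding(params, embedding, strength):
    """Shift the embedding along the predicted direction, scaled by strength."""
    direction = predict_direction(params, embedding, strength)
    return [e + strength * d for e, d in zip(embedding, direction)]


params = make_mlp(dim=4, hidden=8)
material = [0.5, -0.2, 0.1, 0.3]  # toy stand-in for a CLIP material embedding
rougher = edit_embedding(params, material, strength=0.7)
```

With this update rule a strength of zero returns the input embedding unchanged, which is a convenient sanity check. A trained version would fit the weights on pairs of embeddings whose attribute values are known, for example from the synthetic dataset mentioned above.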

Numerical Results and Performance Analysis

Qualitative and quantitative evaluations support MARBLE's effectiveness in material blending and parametric control. MARBLE outperforms baselines such as InstructPix2Pix and Concept Sliders. Quantitative metrics, including PSNR and LPIPS, indicate high-fidelity edits with well-disentangled material attributes. Furthermore, a user study indicated a substantial preference for MARBLE results over other methods on real-world images.
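For reference, PSNR follows the standard definition 10 * log10(MAX^2 / MSE), where higher values mean the edited image is closer to the reference. The sketch below computes it over flat lists of pixel values in [0, 1]; the function name `psnr` and the toy pixel values are illustrative, and the paper's exact evaluation protocol is not described in this summary. LPIPS depends on a learned network and is not covered here.

```python
import math


def psnr(reference, edited, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(edited):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, edited)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, i.e. about 20 dB.
score = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
print(round(score, 2))
```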

Implications and Future Directions

MARBLE advances the prospects of material editing by providing a unified framework for controlling diverse image attributes without extensive tuning of base generative models. This research contributes to the practical toolkit available for vision applications in domains like graphic design, advertising, and game content creation. The ability of MARBLE to perform edits within various artistic styles underscores its relevance for creative industries.

Looking forward, the paper opens avenues for further exploration of CLIP-space for other editing tasks. The methodology encourages future research into low-level feature manipulation within pre-trained models, which could extend parametric control to broader image categories. As generative models continue to evolve, pairing MARBLE's embedding-level approach with newer base models could further improve material editing.