Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
The paper "Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation" introduces an innovative framework for evaluating the quality of 3D assets generated by contemporary algorithms. Despite advancements in 3D generation technologies, current systems still fall short in producing assets that are fully geometrically coherent and semantically consistent across multiple viewpoints. Addressing this gap, the authors propose Eval3D, a toolkit designed to provide a nuanced and interpretable assessment of 3D generation quality, leveraging a multi-dimensional approach.
Core Contributions
Eval3D distinguishes itself by assessing a 3D object from varied perspectives rather than relying solely on surface-level attributes. Traditional metrics often overlook intricate geometric details or rely on black-box LLM judgments, which can yield only coarse assessments. To achieve a refined evaluation, Eval3D integrates a suite of diverse models and tools, each acting as a probe to detect inconsistencies.
Key Metrics:
- Geometric Consistency: This metric evaluates the alignment between analytically derived normals from 3D meshes and those predicted by computer vision models from 2D images. A discrepancy indicates potential flaws in the texture-geometry relationship.
- Semantic Consistency: Drawing on vision foundation models like DINO, this metric measures cross-view consistency via the variability of features extracted from renders at different camera angles; high variance suggests semantic inconsistency.
- Structural Consistency: Framed as a novel view synthesis problem, structural consistency employs models like Stable-Zero123 to predict alternative viewpoints. Discrepancies between predicted and actual views signal structural implausibility.
- Text-3D Alignment: This metric assesses the fidelity of the generated asset to its text prompt through multimodal LLMs (Vision-QA models like LLaVA), ensuring that the semantic content adheres to initial conditions.
- Aesthetics: Visual appeal is gauged through models like ImageReward and GPT-4o, scored across renders from different angles.
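The geometric-consistency idea above can be sketched as a per-pixel comparison between two normal maps: one rendered analytically from the mesh, one predicted by a monocular normal estimator from the textured 2D render. The sketch below is a minimal illustration under that assumption, with random arrays standing in for real normal maps; it is not the paper's exact scoring function.

```python
import numpy as np

def normal_agreement(analytic_normals: np.ndarray, predicted_normals: np.ndarray) -> float:
    """Mean cosine similarity between two HxWx3 normal maps.

    `analytic_normals` would be rendered from the mesh's surface geometry;
    `predicted_normals` would come from a normal-prediction model run on the
    textured render. Both are assumed to share the same camera frame.
    """
    a = analytic_normals / np.linalg.norm(analytic_normals, axis=-1, keepdims=True)
    b = predicted_normals / np.linalg.norm(predicted_normals, axis=-1, keepdims=True)
    cosine = np.sum(a * b, axis=-1)   # per-pixel agreement in [-1, 1]
    return float(cosine.mean())       # 1.0 = perfect texture-geometry agreement

# Toy check: a normal map agrees perfectly with itself, and maximally
# disagrees with its negation.
n = np.random.randn(8, 8, 3)
print(normal_agreement(n, n))   # close to 1.0
print(normal_agreement(n, -n))  # close to -1.0
```

A low mean cosine similarity would flag the texture-geometry flaws the metric is designed to detect, e.g. detail painted into the texture that the underlying mesh does not support.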
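The semantic-consistency metric reduces to a dispersion statistic over per-view feature vectors. As a minimal sketch, assume DINO embeddings have already been extracted from V renders; random vectors stand in for those embeddings here, and the exact aggregation used by the paper may differ.

```python
import numpy as np

def cross_view_variance(view_features: np.ndarray) -> float:
    """Mean per-dimension variance across a (V, D) stack of view features.

    In Eval3D's framing these would be DINO features from renders at
    different camera angles; lower values indicate views that are more
    semantically consistent with one another.
    """
    return float(view_features.var(axis=0).mean())

# Identical "views" have zero variance; unrelated ones do not.
consistent = np.tile(np.random.randn(1, 16), (4, 1))  # 4 copies of one view
inconsistent = np.random.randn(4, 16)                 # 4 unrelated views
print(cross_view_variance(consistent))                            # 0.0
print(cross_view_variance(consistent) < cross_view_variance(inconsistent))  # True
```

High variance would indicate that the asset's apparent content drifts as the camera moves, which is exactly the cross-view semantic drift the metric probes for.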
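Structural consistency, framed as novel view synthesis, boils down to comparing a predicted view against the actual render from the same viewpoint. The sketch below assumes both images are already available as arrays and uses a simple mean absolute error as the discrepancy; the paper's actual comparison (driven by a model such as Stable-Zero123) may use a different image distance.

```python
import numpy as np

def view_discrepancy(predicted_view: np.ndarray, actual_view: np.ndarray) -> float:
    """Mean absolute per-pixel error between a novel-view prediction and the
    real render of the asset from that viewpoint.

    `predicted_view` would come from a novel-view synthesis model (e.g.
    Stable-Zero123) conditioned on a reference render; a large discrepancy
    flags structurally implausible geometry.
    """
    diff = predicted_view.astype(np.float64) - actual_view.astype(np.float64)
    return float(np.abs(diff).mean())

# Toy check: identical views have zero discrepancy; a uniform brightness
# shift of 0.1 yields a mean error of ~0.1.
view = np.full((4, 4, 3), 0.5)
print(view_discrepancy(view, view))        # 0.0
print(view_discrepancy(view, view + 0.1))  # ~0.1
```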
Results and Implications
The authors demonstrate Eval3D’s capability by comprehensively evaluating state-of-the-art 3D generation algorithms such as DreamFusion, Magic3D, and MVDream, offering insights into their limitations and strengths. Notably, Eval3D outperforms existing evaluation tools in aligning with human judgment across all defined metrics, delivering both quantitative measures and qualitative insights into artifacts such as the multi-face Janus problem.
Theoretical and Practical Impact
Eval3D has significant implications for both the theoretical development of 3D generative models and practical applications in fields such as gaming, film, and augmented reality. By providing fine-grained, interpretable metrics that closely mirror human evaluation, Eval3D improves our understanding of 3D generation quality, facilitating the refinement and optimization of current methods. It underscores the importance of aligning generated assets not only with aesthetic standards but also with semantic and structural integrity.
Looking forward, as the foundational models improve, we can expect Eval3D to offer even greater accuracy and alignment with human assessments. Integrating Eval3D into standard evaluation practices could drive advancements in 3D generative technologies, influencing industries dependent on high-quality 3D assets.
In summary, the paper presents a robust toolset for addressing the nuanced challenges of 3D generation evaluation, paving the way for enhanced model evaluations that combine geometrical precision, semantic coherence, and structural plausibility.