HY3D-Bench: 3D Asset Generation Ecosystem
- HY3D-Bench is an open-source ecosystem for 3D asset generation, offering a unified, rigorously filtered dataset with training-ready meshes and part-level annotations.
- It employs a robust data curation pipeline that standardizes mesh formats, ensures watertightness, and integrates multi-view rendering for enhanced model fidelity.
- The benchmark supports scalable synthetic augmentation and provides standardized splits, metrics, and baseline checkpoints to drive research in 3D vision, robotics, and digital content creation.
HY3D-Bench is an open-source ecosystem and dataset for 3D asset generation, designed to address the persistent bottlenecks in large-scale, high-quality 3D data curation, annotation, and synthesis. It provides a unified, rigorously filtered, and richly structured corpus of training-ready 3D models, supporting both holistic and part-level applications in machine perception, generative modeling, and simulation contexts. HY3D-Bench standardizes 3D data preparation, introduces scalable synthetic augmentation for long-tail categories, and supplies detailed benchmarks and baseline resources to catalyze progress across research in 3D vision, robotics, and digital content creation (Hunyuan3D et al., 3 Feb 2026).
1. Motivation and Scope
The development of ultra-large 3D repositories, such as Objaverse and Objaverse-XL, has fueled advances in 3D generative modeling. However, these raw repositories are hampered by frequent non-manifold geometries, inconsistent coordinate conventions, missing or invalid textures, and absence of semantically meaningful part annotations. Such deficiencies elevate the preprocessing burden, require compute-intensive cleanup, and fragment evaluation protocols. HY3D-Bench was created to address these limitations by delivering:
- 252,676 high-fidelity, training-ready meshes with multi-view renderings
- 240,524 models with structured part-level decompositions
- 125,312 AIGC-synthesized assets for long-tail semantic coverage
- Standardized splits (train/val/test), metrics, and model checkpoints to promote reproducibility
The scope of HY3D-Bench encompasses both real-world and synthetic 3D assets, enabling research in controllable asset generation, perception, and interactive editing, with direct applications to robotics simulators, AR/VR content pipelines, and 3D vision model pre-training.
2. Data Curation and Processing Pipeline
2.1 Data Collection and Initial Filtering
HY3D-Bench sources its initial pool of approximately 10 million 3D models from Objaverse and Objaverse-XL. Asset selection relies on a multi-stage filter:
- Geometric complexity: Minimum polygon count and requirement for non-degenerate topology
- Texture integrity: No UV overlaps, no missing maps, and resolution above a defined threshold
- Thin-structure exclusion: Removal of models with large thin shells to prevent SDF instability and view occlusion
Filtering yields a final set of 252,676 models, partitioned into 252,000 for training, 276 for validation, and 400 for testing.
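The multi-stage filter above can be sketched as a simple predicate over precomputed per-asset statistics. The dataclass fields and all numeric thresholds below are illustrative assumptions; the paper does not publish the exact cutoff values.

```python
from dataclasses import dataclass

@dataclass
class AssetStats:
    """Per-asset statistics assumed to be precomputed by the pipeline."""
    polygon_count: int
    has_degenerate_faces: bool
    uv_overlap_ratio: float   # fraction of overlapping UV area
    texture_resolution: int   # min side length of texture maps, px
    thin_shell_ratio: float   # fraction of surface in thin-shell regions

# Illustrative thresholds, not the paper's actual values.
MIN_POLYGONS = 500
MAX_UV_OVERLAP = 0.0
MIN_TEX_RES = 512
MAX_THIN_SHELL = 0.3

def passes_filter(a: AssetStats) -> bool:
    """Multi-stage filter: geometry, texture integrity, thin-structure exclusion."""
    if a.polygon_count < MIN_POLYGONS or a.has_degenerate_faces:
        return False          # geometric-complexity stage
    if a.uv_overlap_ratio > MAX_UV_OVERLAP or a.texture_resolution < MIN_TEX_RES:
        return False          # texture-integrity stage
    if a.thin_shell_ratio > MAX_THIN_SHELL:
        return False          # thin-structure exclusion stage
    return True

good = AssetStats(20_000, False, 0.0, 1024, 0.05)
bad = AssetStats(40, True, 0.2, 128, 0.9)
print(passes_filter(good), passes_filter(bad))  # True False
```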
2.2 Mesh Standardization and Watertightness Enforcement
A rigorous pipeline ensures all meshes are single-mesh PLY files, oriented Z-up and right-handed. The watertight mesh protocol includes:
- Format and orientation standardization
- Multi-view rendering (pre-cleanup): Using Blender, with 24–32 orthographic and perspective views per object, typically at 512×512 px
- Watertightness post-processing:
  - Compute an unsigned distance field on a 512³ grid
  - Extract a thin-shell level set via Marching Cubes
  - Reconstruct a tetrahedral volume with Delaunay triangulation
  - Label inner/outer regions via graph-cut optimization
  - Extract the final watertight surface mesh
- Point-cloud sampling: Hybrid uniform and edge-weighted strategy per Dora [Chen et al. 2025]
- Automatic quality checks: Manifoldness, self-intersections, and isolated vertices
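The hybrid point-cloud sampling step can be illustrated with a minimal stdlib-only sketch: area-weighted uniform samples over the triangles, plus extra samples placed on a given list of sharp edges. The exact uniform/edge split and the sharp-edge detection criterion used by the pipeline are not specified, so both are left as inputs here.

```python
import math, random

random.seed(0)

def tri_area(a, b, c):
    """Area of a 3D triangle via the cross-product magnitude."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cx = u[1]*v[2] - u[2]*v[1]
    cy = u[2]*v[0] - u[0]*v[2]
    cz = u[0]*v[1] - u[1]*v[0]
    return 0.5 * math.sqrt(cx*cx + cy*cy + cz*cz)

def sample_surface(tris, n_uniform, sharp_edges, n_edge):
    """Hybrid sampling: area-weighted uniform points plus points on sharp edges."""
    areas = [tri_area(*t) for t in tris]
    pts = []
    for _ in range(n_uniform):
        a, b, c = random.choices(tris, weights=areas)[0]
        # Uniform barycentric sample inside the chosen triangle.
        r1, r2 = random.random(), random.random()
        s = math.sqrt(r1)
        w = (1 - s, s * (1 - r2), s * r2)
        pts.append(tuple(w[0]*a[i] + w[1]*b[i] + w[2]*c[i] for i in range(3)))
    for _ in range(n_edge):
        p, q = random.choice(sharp_edges)
        t = random.random()
        pts.append(tuple((1 - t)*p[i] + t*q[i] for i in range(3)))
    return pts

# Unit right triangle in the z=0 plane, with its hypotenuse marked sharp.
tris = [((0, 0, 0), (1, 0, 0), (0, 1, 0))]
edges = [((1, 0, 0), (0, 1, 0))]
pts = sample_surface(tris, 100, edges, 20)
print(len(pts))  # 120
```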
2.3 Multi-View Rendering Protocol
All assets are rendered with neutral three-point studio lighting. Orthographic projection preserves scale, and perspective captures depth cues. Image outputs per object include:
- RGB textures with PBR materials
- Silhouette masks
- Typical resolution: 512² or 1024² px (category-dependent)
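The rationale for rendering with both projections can be seen in a two-line numeric example: under orthographic projection the image coordinate is independent of depth (scale preserved), while a pinhole perspective projection divides by depth (depth cues). This is a generic illustration, not the benchmark's camera code.

```python
def ortho_project(p):
    """Orthographic projection onto the image plane: depth does not affect scale."""
    x, y, z = p
    return (x, y)

def persp_project(p, f=1.0):
    """Pinhole perspective projection with focal length f: farther points shrink."""
    x, y, z = p
    return (f * x / z, f * y / z)

near = (1.0, 0.0, 2.0)
far = (1.0, 0.0, 4.0)
print(ortho_project(near), ortho_project(far))  # identical image x: scale preserved
print(persp_project(near), persp_project(far))  # image x halves when depth doubles
```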
3. Structured Part-Level Decomposition
3.1 Semantic Part Representation
Each object is decomposed into a set of parts, structured either as a graph (parts as nodes, adjacency as edges) or as a hierarchical tree for nested parts. Part-level data is stored as separate watertight mesh files together with global part-ID masks.
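The two representations can be sketched with plain dataclasses: a `Part` carrying its own mesh file and optional children (tree form), and a `PartGraph` holding parts plus adjacency (graph form). Field names and file paths here are illustrative, not the released schema.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """One semantic part: its own watertight mesh file plus a global part-ID."""
    part_id: int
    mesh_path: str                                  # per-part watertight mesh file
    children: list = field(default_factory=list)    # nested parts (tree form)

@dataclass
class PartGraph:
    """Flat graph form: parts as nodes, touching parts as adjacency edges."""
    parts: dict       # part_id -> Part
    adjacency: set    # unordered (id_a, id_b) pairs

body = Part(0, "chair/body.ply")
leg = Part(1, "chair/leg_0.ply")
g = PartGraph(parts={0: body, 1: leg}, adjacency={(0, 1)})
print(len(g.parts), (0, 1) in g.adjacency)  # 2 True
```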
3.2 Automated Part Segmentation
The core segmentation algorithm applies:
- Connected component analysis: Initial partition by topological independence
- Extremity filtering: Objects with more than 888 or fewer than 2 connected components are removed
- Area-based merging: Small fragments are merged to avoid spurious parts
- Count control: Typical part count per object is 10–40
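The first two stages of this algorithm can be sketched as union-find connected components over face adjacency, followed by area-based merging. Folding each small fragment into the largest component is a simplification for illustration; the paper does not specify the merge target.

```python
def connected_components(n_faces, shared_edges):
    """Union-find over faces; faces sharing an edge join one component."""
    parent = list(range(n_faces))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in shared_edges:
        parent[find(a)] = find(b)
    comps = {}
    for f in range(n_faces):
        comps.setdefault(find(f), []).append(f)
    return list(comps.values())

def merge_small(components, face_area, min_area):
    """Area-based merging: fold fragments below min_area into the largest part."""
    components = sorted(components,
                        key=lambda c: sum(face_area[f] for f in c), reverse=True)
    kept = []
    for comp in components:
        if not kept or sum(face_area[f] for f in comp) >= min_area:
            kept.append(list(comp))
        else:
            kept[0].extend(comp)   # simplistic: merge into the largest component
    return kept

# Six faces: {0,1,2} and {3,4} connected; face 5 is an isolated sliver.
comps = connected_components(6, [(0, 1), (1, 2), (3, 4)])
parts = merge_small(comps, face_area=[1.0] * 5 + [0.01], min_area=0.5)
print(len(comps), len(parts))  # 3 2
```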
3.3 Semantic Filtering
To ensure meaningful decompositions:
- Assets in which a single part covers >85% of the surface area are excluded
- Models with excessive small isolated parts are filtered
- The final result is 240,524 part-aware models (mean 14.13 parts, median 11)
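These two filtering rules reduce to a small predicate over per-part surface-area fractions. The 85% dominance threshold comes from the text; `small_frac` and `small_limit` are illustrative stand-ins for the unstated "excessive small isolated parts" criterion.

```python
def keep_decomposition(part_areas, dominant_max=0.85, small_frac=0.01, small_limit=5):
    """Semantic filter: reject near-monolithic or overly fragmented decompositions."""
    total = sum(part_areas)
    fracs = [a / total for a in part_areas]
    if max(fracs) > dominant_max:
        return False   # one part dominates the surface: not a meaningful decomposition
    if sum(f < small_frac for f in fracs) > small_limit:
        return False   # too many tiny isolated fragments
    return True

print(keep_decomposition([9.0, 0.5, 0.5]))       # False: dominant part covers 90%
print(keep_decomposition([4.0, 3.0, 2.0, 1.0]))  # True
```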
3.4 Use Cases
RGB and part-ID mask images support conditional editing, such as recoloring or structural manipulation for simulation (e.g., swapping gripper finger orientation in a robotic gripper).
4. AIGC-Based Synthetic Data Generation
4.1 Synthesis Architecture
A scalable AIGC pipeline augments long-tail category coverage via:
- Text expansion: LLM-driven prompt generation over 1,252 fine-grained categories, optimizing for plausibility and visual diversity
- Image generation: LoRA-tuned Qwen-Image yields clean, centered object images from descriptions
- 3D reconstruction: HY3D-3.0 (hybrid explicit–implicit diffusion backbone) converts images to 3D assets with high PBR fidelity
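The three stages compose into a simple linear pipeline. The callables below are placeholder stubs for the LLM prompt expander, the LoRA-tuned Qwen-Image generator, and the HY3D-3.0 reconstructor; none of these names or signatures are the released APIs.

```python
from typing import Callable

def synthesize(category: str,
               expand_prompt: Callable[[str], str],
               gen_image: Callable[[str], bytes],
               image_to_3d: Callable[[bytes], str]) -> str:
    """Three-stage AIGC pipeline: text expansion -> image generation -> 3D lift."""
    prompt = expand_prompt(category)   # LLM-driven prompt expansion
    image = gen_image(prompt)          # clean, centered object image
    return image_to_3d(image)          # image-conditioned 3D reconstruction

# Stub implementations for demonstration only.
mesh = synthesize(
    "antique oil lamp",
    expand_prompt=lambda c: f"a clean, centered studio photo of {c}",
    gen_image=lambda p: p.encode(),              # placeholder "image"
    image_to_3d=lambda img: f"mesh({len(img)} bytes)",
)
print(mesh)
```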
4.2 Training and Post-Processing
- The Qwen-Image LoRA (rank 8) is trained with standard CLIP and adversarial losses
- HY3D-3.0 is pre-trained on real HY3D data, with score-distillation fine-tuning
- Post-processing removes non-watertight and low-quality items
4.3 Coverage and Diversity
The synthetic asset pool consists of 125,312 models, ensuring at least 100 samples per real category. Categories are hierarchically organized as 20 super-categories, 130 mid-level, and 1,252 fine-grained types.
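The per-category floor implies a simple augmentation plan: for each fine-grained category, synthesize enough assets to reach 100 samples. The category names below are invented examples.

```python
from collections import Counter

def augmentation_plan(real_counts: Counter, min_per_category: int = 100) -> dict:
    """Synthetic assets needed per fine-grained category to reach the sample floor."""
    return {cat: max(0, min_per_category - n) for cat, n in real_counts.items()}

counts = Counter({"chair": 4200, "sextant": 7, "astrolabe": 0})
plan = augmentation_plan(counts)
print(plan)  # {'chair': 0, 'sextant': 93, 'astrolabe': 100}
```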
5. Benchmarking, Evaluation, and Usage
5.1 Benchmark Resources
HY3D-Bench defines training/validation/test splits, standardized evaluation metrics (Uni3D, ULIP), and releases baseline model checkpoints.
5.2 Model Training and Metrics
Empirical validation is demonstrated through Hunyuan3D-2.1-Small, with model modifications:
- Channel dimension reduction from 2,048 to 1,536
- Removal of MoE in favor of a fully dense design
- Parameter count: 832M
- Progressive resolution training schedule
The main training loss adopts the flow-matching objective $\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\,x_0,\,x_1}\left[\left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2\right]$, where $x_t = (1-t)\,x_0 + t\,x_1$ linearly interpolates between a noise sample $x_0$ and a data sample $x_1$.
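A single Monte-Carlo term of the standard flow-matching loss can be computed in a few lines; this is a generic numeric sketch, not the benchmark's training code.

```python
import random

random.seed(0)

def flow_matching_loss(v_pred, x0, x1):
    """One Monte-Carlo sample of the flow-matching loss: MSE between the
    predicted velocity v_pred(x_t, t) and the constant target velocity
    x1 - x0 along the straight interpolation x_t = (1 - t) * x0 + t * x1."""
    t = random.random()
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    pred = v_pred(x_t, t)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)

x0 = [0.2, -1.0, 0.5]   # noise sample
x1 = [1.0, 0.0, -0.5]   # data sample
# An oracle that predicts the exact target velocity incurs zero loss.
oracle = lambda x_t, t: [b - a for a, b in zip(x0, x1)]
print(flow_matching_loss(oracle, x0, x1))  # 0.0
```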
Fidelity metrics (Uni3D-I, ULIP-I) are reported as follows:
| Methods | Uni3D-I | ULIP-I |
|---|---|---|
| Michelangelo | 0.3169 | 0.2186 |
| CraftsMan | 0.3351 | 0.2264 |
| Trellis | 0.3641 | 0.2454 |
| Hunyuan3D-2.1 | 0.3636 | 0.2446 |
| Ours (Small) | 0.3606 | 0.2424 |
5.3 Downstream Applications
HY3D-Bench data underpins a range of downstream uses:
- Robotics (e.g., grasping simulators employing accurate CAD geometry and per-part collision modeling)
- AR/VR workflows (enabling high-resolution PBR texture application and staged LOD generation)
- Pre-training of 3D perception models (point-cloud encoding and segmentation)
- Part-aware generation (interactive editing and configurable manipulation of 3D assets)
6. Impact and Future Directions
By providing a unified, end-to-end ecosystem with rigorous data curation, semantically structured decompositions, and comprehensive synthetic augmentation, HY3D-Bench lowers entry barriers for 3D research. The introduction of robust splits, transparent metrics, and strong synthetic coverage addresses limitations in reproducibility, data fragmentation, and the ability to explore long-tail categories or fine-grained object properties.
A plausible implication is that, as 3D generative models and vision systems become increasingly reliant on clean, well-annotated, and diverse training distributions, frameworks such as HY3D-Bench will become reference corpora for both benchmarking and large-scale pre-training in academic and industrial contexts (Hunyuan3D et al., 3 Feb 2026).