
HY3D-Bench: 3D Asset Generation Ecosystem

Updated 10 February 2026
  • HY3D-Bench is an open-source ecosystem for 3D asset generation, offering a unified, rigorously filtered dataset with training-ready meshes and part-level annotations.
  • It employs a robust data curation pipeline that standardizes mesh formats, ensures watertightness, and integrates multi-view rendering for enhanced model fidelity.
  • The benchmark supports scalable synthetic augmentation and provides standardized splits, metrics, and baseline checkpoints to drive research in 3D vision, robotics, and digital content creation.

HY3D-Bench is an open-source ecosystem and dataset for 3D asset generation, designed to address the persistent bottlenecks in large-scale, high-quality 3D data curation, annotation, and synthesis. It provides a unified, rigorously filtered, and richly structured corpus of training-ready 3D models, supporting both holistic and part-level applications in machine perception, generative modeling, and simulation contexts. HY3D-Bench standardizes 3D data preparation, introduces scalable synthetic augmentation for long-tail categories, and supplies detailed benchmarks and baseline resources to catalyze progress across research in 3D vision, robotics, and digital content creation (Hunyuan3D et al., 3 Feb 2026).

1. Motivation and Scope

The development of ultra-large 3D repositories, such as Objaverse and Objaverse-XL, has fueled advances in 3D generative modeling. However, these raw repositories are hampered by frequent non-manifold geometries, inconsistent coordinate conventions, missing or invalid textures, and the absence of semantically meaningful part annotations. Such deficiencies elevate the preprocessing burden, require compute-intensive cleanup, and fragment evaluation protocols. HY3D-Bench was created to address these limitations by delivering:

  • 252,676 high-fidelity, training-ready meshes with multi-view renderings
  • 240,524 models with structured part-level decompositions
  • 125,312 AIGC-synthesized assets for long-tail semantic coverage
  • Standardized splits (train/val/test), metrics, and model checkpoints to promote reproducibility

The scope of HY3D-Bench encompasses both real-world and synthetic 3D assets, enabling research in controllable asset generation, perception, and interactive editing, with direct applications to robotics simulators, AR/VR content pipelines, and 3D vision model pre-training.

2. Data Curation and Processing Pipeline

2.1 Data Collection and Initial Filtering

HY3D-Bench sources its initial pool of approximately 10 million 3D models from Objaverse and Objaverse-XL. Asset selection relies on a multi-stage filter (a minimal code sketch follows the list):

  • Geometric complexity: Minimum polygon count and requirement for non-degenerate topology
  • Texture integrity: No UV overlaps, no missing maps, and resolution above a defined threshold
  • Thin-structure exclusion: Removal of models with large thin shells to prevent SDF instability and view occlusion
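
The following minimal sketch expresses the geometric-complexity stage as a `trimesh` predicate. The threshold is an illustrative assumption (the paper does not publish exact cutoffs), and the texture-integrity and thin-structure checks are omitted:

```python
import trimesh

# Illustrative threshold; an assumption for this sketch, not a published value.
MIN_FACES = 1_000

def passes_geometric_filter(mesh: trimesh.Trimesh) -> bool:
    """Sketch of the geometric-complexity stage of the multi-stage filter."""
    # Minimum polygon count.
    if len(mesh.faces) < MIN_FACES:
        return False
    # Non-degenerate topology: reject meshes with no surface area
    # or containing (near-)zero-area faces.
    if mesh.area == 0 or (mesh.area_faces < 1e-12).any():
        return False
    return True
```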

Filtered assets are finalized into 252,676 models, partitioned into 252,000 for training, 276 for validation, and 400 for testing.

2.2 Mesh Standardization and Watertightness Enforcement

A rigorous pipeline ensures all meshes are single-mesh PLY files, oriented Z-up and right-handed. The watertight mesh protocol comprises the following steps (a code sketch of the level-set extraction follows the list):

  1. Format and orientation standardization
  2. Multi-view rendering (pre-cleanup): Using Blender, with 24–32 orthographic and perspective views per object, typically at 512×512 px
  3. Watertightness post-processing:
    • Compute an unsigned distance field on a 512³ grid
    • Extract the thin-shell level set via Marching Cubes ($\epsilon = 1/512$)
    • Reconstruct a tetrahedral volume with Delaunay triangulation
    • Label inner/outer regions via graph-cut optimization
    • Extract the final watertight surface mesh
  4. Point-cloud sampling: Hybrid uniform and edge-weighted strategy per Dora [Chen et al. 2025]
  5. Automatic quality checks: Manifoldness, self-intersections, and isolated vertices
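
A minimal sketch of the first two watertightness operations (unsigned distance field and thin-shell level-set extraction), using `numpy`, `trimesh`, and `scikit-image`. The paper's 512³ grid is replaced with a small grid so the sketch runs cheaply; the Delaunay reconstruction and graph-cut labeling stages are omitted:

```python
import numpy as np
import trimesh
from skimage import measure

def thin_shell_level_set(mesh: trimesh.Trimesh, res: int = 64,
                         eps: float = 1.0 / 512) -> trimesh.Trimesh:
    """Steps 3a-3b (sketch): sample an unsigned distance field on a
    regular grid, then extract the eps-level set with Marching Cubes."""
    # Normalize the mesh into the unit cube so eps = 1/512 is meaningful.
    mesh = mesh.copy()
    mesh.apply_translation(-mesh.bounds[0])
    mesh.apply_scale(1.0 / mesh.extents.max())

    # Regular grid of query points over [0, 1]^3 (the paper uses 512^3).
    lin = np.linspace(0.0, 1.0, res)
    pts = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), -1).reshape(-1, 3)

    # Unsigned distance = distance to the closest point on the surface.
    _, dist, _ = trimesh.proximity.ProximityQuery(mesh).on_surface(pts)
    udf = dist.reshape(res, res, res)

    # On a coarse grid the shell must be at least about one cell wide,
    # so clamp the iso-level to the grid spacing.
    level = max(eps, 1.0 / (res - 1))
    verts, faces, _, _ = measure.marching_cubes(
        udf, level=level, spacing=(1.0 / (res - 1),) * 3)
    return trimesh.Trimesh(vertices=verts, faces=faces)
```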

2.3 Multi-View Rendering Protocol

All assets are rendered with neutral three-point studio lighting. Orthographic projection preserves scale, and perspective captures depth cues. Image outputs per object include the following (a minimal Blender sketch follows the list):

  • RGB textures with PBR materials
  • Silhouette masks
  • Typical resolution: 512² or 1024² px (category-dependent)
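
A minimal Blender (`bpy`) sketch of such a rendering loop: cameras on a ring around the object, orthographic projection, fixed resolution. The camera distance, ring height, and output path are assumptions, and the three-point lighting rig and PBR material setup are omitted:

```python
import math
import bpy        # Blender's Python API; run inside Blender
import mathutils

def render_turntable(n_views: int = 24, res: int = 512,
                     out_dir: str = "/tmp/renders") -> None:
    """Render n_views orthographic views of the loaded object,
    which is assumed to be centered at the origin."""
    scene = bpy.context.scene
    scene.render.resolution_x = scene.render.resolution_y = res

    cam_data = bpy.data.cameras.new("mv_cam")
    cam_data.type = "ORTHO"                 # orthographic preserves scale
    cam = bpy.data.objects.new("mv_cam", cam_data)
    scene.collection.objects.link(cam)
    scene.camera = cam

    for i in range(n_views):
        theta = 2.0 * math.pi * i / n_views
        cam.location = (2.0 * math.cos(theta), 2.0 * math.sin(theta), 1.0)
        # Aim the camera at the origin.
        look = -mathutils.Vector(cam.location)
        cam.rotation_euler = look.to_track_quat("-Z", "Y").to_euler()
        scene.render.filepath = f"{out_dir}/view_{i:03d}.png"
        bpy.ops.render.render(write_still=True)
```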

3. Structured Part-Level Decomposition

3.1 Semantic Part Representation

Each object $O$ is decomposed into parts $\{P_1, \ldots, P_m\}$, structured either as a graph $G = (V, E)$ (with $V$ the set of parts and $E$ their adjacency relation) or as a hierarchical tree for nested parts. Part-level data is stored as separate watertight mesh files and global part-ID masks.
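
One plausible in-memory encoding of this representation is sketched below; the field names are assumptions, not the dataset's published schema:

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    part_id: int
    mesh_path: str                 # separate watertight mesh file per part
    label: str | None = None       # optional semantic label

@dataclass
class PartGraph:
    """V = parts, E = adjacency; `parent` optionally encodes the
    hierarchical tree used for nested parts."""
    parts: dict[int, Part] = field(default_factory=dict)
    adjacency: set[frozenset[int]] = field(default_factory=set)
    parent: dict[int, int] = field(default_factory=dict)

    def add_edge(self, a: int, b: int) -> None:
        # Undirected adjacency edge between two parts.
        self.adjacency.add(frozenset((a, b)))
```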

3.2 Automated Part Segmentation

The core segmentation algorithm applies the following stages (sketched in code after the list):

  • Connected component analysis: Initial partition by topological independence
  • Extremity filtering: Assets with >888 or <2 connected components are discarded
  • Area-based merging: Small fragments are merged to avoid spurious parts
  • Count control: Typical part count per object is 10–40
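
A minimal `trimesh` sketch of these stages; the 1% area cutoff and the nearest-centroid merge rule are illustrative assumptions:

```python
import numpy as np
import trimesh

def segment_parts(mesh: trimesh.Trimesh, min_comps: int = 2,
                  max_comps: int = 888, area_frac: float = 0.01):
    """Connected-component partition, extremity filtering, and
    area-based merging of small fragments (sketch)."""
    comps = mesh.split(only_watertight=False)   # topological components
    if not (min_comps <= len(comps) <= max_comps):
        return None                             # extremity filtering
    areas = np.array([c.area for c in comps])
    cutoff = area_frac * areas.sum()
    big = [c for c, a in zip(comps, areas) if a >= cutoff]
    small = [c for c, a in zip(comps, areas) if a < cutoff]
    if not big:
        return None
    # Merge each small fragment into the nearest large part (by centroid).
    centers = np.array([c.centroid for c in big])
    for frag in small:
        j = int(np.argmin(np.linalg.norm(centers - frag.centroid, axis=1)))
        big[j] = trimesh.util.concatenate([big[j], frag])
    return big
```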

3.3 Semantic Filtering

To ensure meaningful decompositions, the pipeline applies semantic filters (sketched in code after the list):

  • Assets with a single part covering >85% surface are excluded
  • Models with excessive small isolated parts are filtered
  • The final result is 240,524 part-aware models (mean 14.13 parts, median 11)
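
A sketch of these filters over per-part surface areas; the 85% dominance threshold is from the text, while `small_frac` and `max_small` are illustrative stand-ins for the unspecified "excessive small isolated parts" rule:

```python
import numpy as np

def is_meaningful_decomposition(part_areas, dominant_frac: float = 0.85,
                                small_frac: float = 0.01,
                                max_small: int = 20) -> bool:
    """Reject decompositions dominated by one part or littered
    with tiny fragments (sketch)."""
    areas = np.asarray(part_areas, dtype=float)
    total = areas.sum()
    if areas.max() > dominant_frac * total:       # one part dominates
        return False
    if (areas < small_frac * total).sum() > max_small:
        return False                              # too many tiny parts
    return True
```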

3.4 Use Cases

RGB and part-ID mask images support conditional editing, such as recoloring or structural manipulation for simulation (e.g., swapping gripper finger orientation in a robotic gripper).

4. AIGC-Based Synthetic Data Generation

4.1 Synthesis Architecture

A scalable AIGC pipeline augments long-tail category coverage in three stages (sketched in code after the list):

  1. Text expansion: LLM-driven prompt generation over 1,252 fine-grained categories, optimizing for plausibility and visual diversity
  2. Image generation: LoRA-tuned Qwen-Image yields clean, centered object images from descriptions
  3. 3D reconstruction: HY3D-3.0 (hybrid explicit–implicit diffusion backbone) converts images to 3D assets with high PBR fidelity
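
A sketch of the pipeline's control flow, with each stage injected as a callable. The callables are hypothetical wrappers standing in for the LLM prompt expander, the LoRA-tuned Qwen-Image model, and HY3D-3.0:

```python
from typing import Any, Callable

def synthesize_asset(category: str,
                     expand_prompt: Callable[[str], str],
                     generate_image: Callable[[str], Any],
                     reconstruct_3d: Callable[[Any], Any]) -> Any:
    """Three-stage text -> image -> 3D synthesis (sketch)."""
    prompt = expand_prompt(category)    # 1. LLM-driven prompt expansion
    image = generate_image(prompt)      # 2. clean, centered object image
    return reconstruct_3d(image)        # 3. image-to-3D reconstruction
```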

4.2 Training and Post-Processing

  • The Qwen-Image LoRA is trained with standard CLIP and adversarial losses at LoRA rank 8
  • HY3D-3.0 is pre-trained on real HY3D data, with score-distillation fine-tuning
  • Post-processing removes non-watertight and low-quality items

4.3 Coverage and Diversity

The synthetic asset pool consists of 125,312 models, ensuring at least 100 samples per real category. Categories are hierarchically organized as 20 super-categories, 130 mid-level, and 1,252 fine-grained types.
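
The coverage floor can be read as a simple quota computation over real-category counts; a minimal sketch (the function name is an assumption):

```python
from collections import Counter

def synthesis_quota(real_category_labels, floor: int = 100) -> dict[str, int]:
    """How many synthetic assets each fine-grained category needs to
    reach the >=100-samples-per-category floor (sketch)."""
    counts = Counter(real_category_labels)
    return {cat: max(0, floor - n) for cat, n in counts.items()}
```

For example, `synthesis_quota(["chair", "chair", "lamp"])` returns `{"chair": 98, "lamp": 99}`.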

5. Benchmarking, Evaluation, and Usage

5.1 Benchmark Resources

HY3D-Bench defines training/validation/test splits, standardized evaluation metrics (Uni3D, ULIP), and releases baseline model checkpoints.

5.2 Model Training and Metrics

Empirical validation is demonstrated through Hunyuan3D-2.1-Small, which modifies the base model as follows (an illustrative config sketch follows the list):

  • Channel dimension reduction from 2,048 to 1,536
  • Removal of MoE in favor of a fully dense design
  • Parameter count: 832M
  • Progressive resolution training schedule
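
An illustrative configuration mirroring these modifications; the field names and the exact resolution schedule are assumptions, not the released config schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SmallBackboneConfig:
    """Hypothetical config for the Hunyuan3D-2.1-Small variant."""
    hidden_dim: int = 1536        # reduced from 2048
    use_moe: bool = False         # fully dense FFN instead of MoE
    approx_params: str = "832M"   # reported parameter count
    resolution_schedule: tuple[int, ...] = (256, 384, 512)  # progressive; illustrative values
```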

The main training loss adopts the flow-matching diffusion loss:

$$\mathcal{L} = \mathbb{E}_{t,\,x_0,\,x_1,\,c}\,\bigl\|\,v_\theta(x_t, t, c) - (x_1 - x_0)\,\bigr\|_2^2$$

where $x_0$ is a noise sample, $x_1$ a data sample, $c$ the conditioning signal, and $x_t$ the point at time $t$ on the path from $x_0$ to $x_1$.
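
A minimal PyTorch sketch of this objective, assuming the standard straight-line (rectified-flow) interpolation path; `v_theta` stands in for the backbone's velocity prediction:

```python
import torch

def flow_matching_loss(v_theta, x0: torch.Tensor, x1: torch.Tensor, c) -> torch.Tensor:
    """Flow-matching loss (sketch). x0: noise sample, x1: data latent,
    c: conditioning. Uses x_t = (1 - t) * x0 + t * x1, whose constant
    velocity along the path is x1 - x0."""
    t = torch.rand(x0.shape[0], device=x0.device)     # t ~ U[0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))          # broadcastable shape
    xt = (1.0 - t_) * x0 + t_ * x1                    # interpolant
    target = x1 - x0                                  # regression target
    return torch.mean((v_theta(xt, t, c) - target) ** 2)
```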

Fidelity metrics (Uni3D-I, ULIP-I) are reported as follows:

Method          Uni3D-I   ULIP-I
Michelangelo    0.3169    0.2186
CraftsMan       0.3351    0.2264
Trellis         0.3641    0.2454
Hunyuan3D-2.1   0.3636    0.2446
Ours (Small)    0.3606    0.2424

5.3 Downstream Applications

HY3D-Bench data underpins a range of downstream uses:

  • Robotics (e.g., grasping simulators employing accurate CAD geometry and per-part collision modeling)
  • AR/VR workflows (enabling high-resolution PBR texture application and staged LOD generation)
  • Pre-training of 3D perception models (point-cloud encoding and segmentation)
  • Part-aware generation (interactive editing and configurable 3D asset manipulation)

6. Impact and Future Directions

By providing a unified, end-to-end ecosystem with rigorous data curation, semantically structured decompositions, and comprehensive synthetic augmentation, HY3D-Bench lowers entry barriers for 3D research. The introduction of robust splits, transparent metrics, and strong synthetic coverage addresses limitations in reproducibility, data fragmentation, and the ability to explore long-tail categories or fine-grained object properties.

A plausible implication is that, as 3D generative models and vision systems become increasingly reliant on clean, well-annotated, and diverse training distributions, frameworks such as HY3D-Bench will become reference corpora for both benchmarking and large-scale pre-training in academic and industrial contexts (Hunyuan3D et al., 3 Feb 2026).
