- The paper introduces a three-stage pipeline that reconstructs segmented 3D meshes, synthesizes articulation videos, and optimizes joint parameters from a single image.
- It employs mask-guided segmentation and amodal completion techniques to overcome occlusion issues and achieve detailed part decomposition.
- Experiments and user studies show that DreamArt outperforms baseline methods, producing physically plausible, interactable assets suited to AR/VR and embodied AI applications.
DreamArt: Generating Interactable Articulated Objects from a Single Image
Introduction
The paper introduces a novel framework, DreamArt, which focuses on generating interactable articulated 3D assets from single-view images. This task, crucial for applications in Embodied AI and AR/VR, involves challenges such as recovering precise part structures and accurately modeling articulation patterns. Existing image-to-3D methods generally overlook part decomposition and articulation, while neural reconstruction methods require extensive data, limiting scalability.
DreamArt employs a three-stage pipeline to overcome these issues: reconstruction of segmented 3D meshes, synthesis of articulation videos using amodal images and masks, and optimization of joint parameters for realistic articulations.
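To make the data flow concrete, here is a schematic of the pipeline as plain Python stubs. Every function name and signature below is a hypothetical placeholder chosen for illustration, not DreamArt's actual API.

```python
# Schematic of the three-stage pipeline; all names are hypothetical stand-ins.

def reconstruct_parts(image):
    """Stage 1: single image -> complete movable/base part meshes,
    plus amodal images revealing the appearance of occluded parts."""
    ...

def synthesize_articulation_video(image, part_masks, amodal_images):
    """Stage 2: prompt a fine-tuned video diffusion model with movable-part
    masks and amodal images to obtain a plausible articulation video."""
    ...

def optimize_articulation(video, part_meshes):
    """Stage 3: fit joint types and parameters via differentiable
    rendering, then refine textures for cross-part consistency."""
    ...

def dreamart(image):
    parts, masks, amodals = reconstruct_parts(image)
    video = synthesize_articulation_video(image, masks, amodals)
    return optimize_articulation(video, parts)
```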
Methodology
Part-Aware 3D Object Generation
The initial stage generates a complete, segmented mesh from a single image by combining advanced image-to-3D models with mask-prompted segmentation. This process yields complete movable and base part meshes, along with amodal images that reveal the appearance of occluded parts.
Mask-Guided 3D Segmentation: This step integrates clustering techniques with segmentation masks to overcome granularity ambiguities and occlusions, refining the mesh segmentation into distinct movable and base parts.
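As a concrete illustration of the mask-to-mesh step, the sketch below assigns each mesh face the 2D part label it most often projects onto, by voting across rendered views. The inputs (per-view face-ID maps from a renderer, aligned with part masks from a SAM-style segmenter) and the plain majority vote are simplifying assumptions; the paper additionally uses clustering to resolve granularity ambiguities.

```python
# Simplified sketch: label mesh faces by majority vote over 2D part masks.
# Inputs here are synthetic toy data standing in for real renders and masks.
import numpy as np

def label_faces_by_mask_voting(face_id_maps, part_masks, num_faces, num_parts):
    """Assign each face the part label it most frequently projects onto."""
    votes = np.zeros((num_faces, num_parts), dtype=np.int64)
    for face_ids, mask in zip(face_id_maps, part_masks):
        visible = face_ids >= 0                # -1 marks background pixels
        np.add.at(votes, (face_ids[visible], mask[visible]), 1)
    return votes.argmax(axis=1)                # unseen faces default to part 0

# Toy usage: two 4x4 views of a 3-face mesh segmented into 2 parts.
rng = np.random.default_rng(0)
face_id_maps = [rng.integers(-1, 3, (4, 4)) for _ in range(2)]
part_masks = [rng.integers(0, 2, (4, 4)) for _ in range(2)]
print(label_faces_by_mask_voting(face_id_maps, part_masks, 3, 2))
```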
Part Amodal Completion: Techniques such as HoloPart complete the geometrically incomplete part meshes, while inpainting models generate amodal images for occluded regions.
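A hedged sketch of producing an amodal image with an off-the-shelf inpainting pipeline from `diffusers`; the checkpoint, file names, and prompt below are illustrative stand-ins, not the paper's exact inpainting setup.

```python
# Generate an amodal image for an occluded part with a generic inpainting
# model. Checkpoint, file names, and prompt are illustrative assumptions.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("object_view.png").convert("RGB")   # hypothetical input view
mask = Image.open("occlusion_mask.png").convert("L")   # white = region to fill

# Describe the hidden part so the model completes it plausibly.
amodal = pipe(prompt="the complete front face of a cabinet drawer",
              image=image, mask_image=mask).images[0]
amodal.save("amodal_part.png")
```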
Figure 1: Method Overview. Our three-stage pipeline first reconstructs complete, segmented part meshes from a single image. Next, it synthesizes plausible articulation videos using amodal images and part masks as prompts. Finally, it optimizes joint parameters and refines texture maps for enhanced realism.
Articulation Video Synthesis
A key component of DreamArt is its approach to learning articulation patterns. By fine-tuning a video diffusion model, it synthesizes videos depicting articulation, utilizing movable part masks and amodal images as prompts to guide plausible movement synthesis. This strategy mitigates the limitations of manually defined motion cues and addresses occlusions effectively.
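One common way such prompts can enter a video diffusion backbone is channel-wise concatenation with the noisy latent. The sketch below shows only that tensor plumbing; the shapes and the conditioning scheme itself are assumptions for illustration, not the paper's confirmed architecture.

```python
# Tensor-plumbing sketch: condition a video diffusion model on a movable-part
# mask and an encoded amodal image by concatenating along the channel axis.
import torch

B, C, T, H, W = 1, 4, 16, 32, 32                  # batch, latent ch., frames, h, w
noisy_latent = torch.randn(B, C, T, H, W)

part_mask = torch.zeros(B, 1, T, H, W)            # 1 inside the movable part
part_mask[..., 8:24, 8:24] = 1.0

amodal_latent = torch.randn(B, C, 1, H, W)        # encoded amodal image
amodal_latent = amodal_latent.expand(-1, -1, T, -1, -1)  # broadcast over frames

cond_input = torch.cat([noisy_latent, part_mask, amodal_latent], dim=1)
print(cond_input.shape)                           # torch.Size([1, 9, 16, 32, 32])
```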
Figure 2: Qualitative comparison of synthesized articulation videos. We present qualitative results on both in-domain and in-the-wild data. Our method consistently outperforms the baselines by producing clearer and more plausible articulation, particularly in multi-part object scenarios.
Joint Parameter Optimization and Texture Refinement
Following video synthesis, joint parameters are optimized using a dual quaternion representation for articulated movements. The model estimates joint types and motions, applying differentiable rendering techniques to ensure precise and plausible articulation. Texture refinement ensures that the appearance is coherent and realistic across parts.
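To illustrate the joint-fitting idea, the sketch below recovers a revolute joint's axis, pivot, and angle by gradient descent on synthetic 3D point correspondences. It stands in for the paper's formulation, which parameterizes motions as dual quaternions and supervises them through differentiable rendering against the synthesized video rather than point correspondences.

```python
# Hedged sketch: fit a revolute joint (axis, pivot, angle) by gradient descent
# on synthetic correspondences, as a stand-in for the paper's formulation.
import torch

def rotate(points, axis, pivot, angle):
    """Rodrigues rotation of points about an axis through `pivot`."""
    a = axis / axis.norm()
    p = points - pivot
    cos, sin = torch.cos(angle), torch.sin(angle)
    return (p * cos
            + torch.cross(a.expand_as(p), p, dim=-1) * sin
            + a * (p @ a).unsqueeze(-1) * (1 - cos)) + pivot

# Synthetic ground truth: a door-like part swung ~60 degrees about z.
src = torch.randn(200, 3)
tgt = rotate(src, torch.tensor([0.0, 0.0, 1.0]),
             torch.tensor([1.0, 0.0, 0.0]), torch.tensor(1.047))

axis = torch.tensor([0.3, 0.3, 1.0], requires_grad=True)
pivot = torch.zeros(3, requires_grad=True)
angle = torch.tensor(0.3, requires_grad=True)
opt = torch.optim.Adam([axis, pivot, angle], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    loss = ((rotate(src, axis, pivot, angle) - tgt) ** 2).mean()
    loss.backward()
    opt.step()

print(f"angle={angle.item():.3f} rad, axis={(axis / axis.norm()).detach()}")
```

A dual-quaternion parameterization, as used in the paper, encodes rotation and translation jointly, so revolute and prismatic joints fall out of one unified representation rather than needing separate fits.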
Experimental Evaluation
The method was evaluated on benchmarks and through user studies, demonstrating superior quality in articulated asset generation. DreamArt generalizes across object categories and produces physically plausible, articulated assets, outperforming baseline methods.
Figure 3: Visualizations of asset synthesis. Our method produces clearer renderings with more plausible articulations than the baselines, especially under novel views.
Figure 4: PartRM performs well on in-domain data but generalizes poorly to in-the-wild data.
Figure 5: Multi-view renderings of articulations.
Conclusion
DreamArt presents a scalable solution for generating high-quality articulated objects from single images, addressing both part-level segmentation and articulation modeling. Although its reliance on existing image-to-3D models occasionally yields implausible reconstructions, DreamArt's generalization across object categories and physically coherent articulation modeling mark significant progress in articulated asset generation. Future work could focus on resolving viewpoint ambiguity and extending articulation optimization to multi-view settings.