Overview of PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
The research paper introduces PASTA, a framework that addresses limitations of existing sketch-based 3D shape generation methods. This work specifically targets the integration of user sketches and text descriptions to improve generation fidelity and user control over the resulting 3D models. Unlike approaches that rely solely on sketches or solely on text, PASTA leverages text embeddings to enrich the semantic representation of sketches, thereby overcoming their inherent ambiguities.
The core innovation in this work is the introduction of ISG-Net, comprising two distinct Graph Convolutional Network (GCN) architectures: IndivGCN and PartGCN. Through this design, PASTA effectively processes fine-grained details and enhances structural consistency across object parts, a significant improvement over previous methodologies that struggle with part-level coherence.
Methodological Advancements and Results
PASTA's methodology is grounded in the combination of sketch analysis and semantics derived from text descriptions using a vision-language model (VLM). This dual-modality approach compensates for the lack of semantic depth in sketch-only methods. Within ISG-Net, IndivGCN refines fine-grained details while PartGCN aggregates part-level feature information, and together they help ensure the structural integrity of generated 3D models.
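The division of labor between the two branches can be illustrated with a minimal sketch. This is not PASTA's actual implementation: the graph construction, feature dimensions, part assignments, and weight matrices below are all invented for illustration; only the general idea (per-node graph convolution for detail, per-part pooling for structural consistency) follows the paper's description.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One generic graph-convolution step: add self-loops, normalize by
    degree, aggregate neighbor features, apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])            # self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalization
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)

# Toy graph: 4 sketch-feature nodes on a chain; the first two are
# assigned to one part, the last two to another (hypothetical parts).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 8))       # node features
W = np.random.default_rng(1).normal(size=(8, 8))       # layer weights

# "IndivGCN"-style pass: refine per-node (fine-grained) features.
H = gcn_layer(A, X, W)

# "PartGCN"-style pass: mean-pool node features within each part,
# yielding one feature per part to enforce part-level consistency.
parts = np.array([0, 0, 1, 1])
P = np.stack([(parts == p).astype(float) for p in range(2)])
P = P / P.sum(axis=1, keepdims=True)
part_feats = P @ H                                     # (2, 8)
```

In a real model both branches would be trained jointly and their outputs fused; here the pooling matrix P simply averages node features per part.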
A key technical element is PASTA's reliance on Gaussian mixture models (GMMs) for representing 3D shapes. This choice facilitates part-wise editing capabilities, allowing for dynamic interactions akin to what advanced users might expect in contemporary design and virtual reality applications.
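To make the part-wise editing idea concrete, the following toy sketch represents a shape as one 3D Gaussian per part and edits a single part without disturbing the others. The two-part "chair" (seat and back), the parameter values, and the helper names are all hypothetical; PASTA's actual GMM parameterization is more elaborate.

```python
import numpy as np

# A shape as a per-part Gaussian mixture: mean = part center,
# covariance = part extent. Values below are invented for illustration.
shape = {
    "seat": {"mean": np.array([0.0, 0.5, 0.0]),
             "cov":  np.diag([0.4, 0.05, 0.4])},
    "back": {"mean": np.array([0.0, 1.0, -0.35]),
             "cov":  np.diag([0.4, 0.5, 0.05])},
}

def edit_part(shape, part, translate=None, scale=None):
    """Part-wise edit: move or resize one Gaussian, leave the rest."""
    g = shape[part]
    if translate is not None:
        g["mean"] = g["mean"] + np.asarray(translate)
    if scale is not None:
        g["cov"] = g["cov"] * np.asarray(scale)
    return shape

def sample_points(shape, n_per_part=256, seed=0):
    """Sample a point cloud from the mixture, e.g. for visualization."""
    rng = np.random.default_rng(seed)
    return np.concatenate([
        rng.multivariate_normal(g["mean"], g["cov"], n_per_part)
        for g in shape.values()
    ])

# Raise the chair back by 0.2 without touching the seat.
edit_part(shape, "back", translate=[0.0, 0.2, 0.0])
pts = sample_points(shape)
```

Because each part is a separate mixture component, an edit is a local parameter update rather than a full re-generation, which is what makes interactive, iterative workflows practical.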
Quantitatively, PASTA outperforms existing sketch-to-3D generation methods, as evidenced by lower Chamfer Distance (CD), Earth Mover’s Distance (EMD), and Fréchet Inception Distance (FID) scores across various datasets, including AmateurSketch-3D and ProSketch-3D. These metrics objectively demonstrate PASTA's enhanced capacity to generate more precise, detailed, and realistic 3D models in comparison to other state-of-the-art methods.
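For reference, Chamfer Distance, the first of these metrics, can be computed as below. This is a common symmetric squared-distance variant; papers differ on squaring and averaging conventions, so the exact formula PASTA's evaluation uses may differ.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer Distance between point clouds P (N,3) and
    Q (M,3): mean squared distance from each point to its nearest
    neighbor in the other cloud, summed over both directions."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
assert chamfer_distance(P, Q) == 0.0  # identical clouds
```

Lower is better: a generated shape whose points lie close to the ground-truth surface yields a small CD, which is why PASTA's lower scores indicate higher geometric fidelity.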
Implications and Future Directions
From a practical standpoint, PASTA holds significant potential for applications in the fields of design, gaming, and augmented/virtual reality. Its intuitive editing capabilities and fidelity in capturing design intent from sketches can revolutionize workflows in these industries. The part-level editing feature is particularly promising for designers requiring iterative and granular control over their creations.
Theoretically, this research contributes to the dialogue on multimodal integration in AI, showcasing how nuanced text conditions can effectively compensate for ambiguities in visual data. This opens potential avenues for future research, particularly in enhancing the integration of textual semantic information with other forms of conditional signals.
Moving forward, further development could involve extending PASTA's applicability to a broader spectrum of shape categories and supporting more complex topological structures. Additionally, there is an opportunity to explore scalability in real-world image applications, as indicated by experiments involving ControlNet for realistic image conversion, enhancing the model's domain adaptability.
In summary, PASTA presents a significant advancement in the field of sketch-based 3D shape generation. By synergizing text conditions with sketches and employing a sophisticated GCN-based framework, it sets a new benchmark for precision and usability in generating 3D models from abstract input.