- The paper introduces a novel multi-agent framework that generates interpretable, modular Blender code for 3D asset creation across diverse shape categories.
- It leverages BlenderRAG for retrieval-augmented generation, achieving a 5× increase in complex operations and a 26% reduction in error rates.
- The system enables iterative, user-guided asset refinement with automatic visual critique and verification to maintain high fidelity to user prompts.
LL3M: Large Language 3D Modelers — A Multi-Agent LLM System for Interpretable 3D Asset Generation
Introduction and Motivation
LL3M introduces a paradigm shift in 3D asset generation by leveraging a multi-agent system of pretrained LLMs to write interpretable Python code for Blender, rather than relying on traditional representation-centric generative models or direct mesh token prediction. The system reformulates shape generation as a code-writing task, enabling modularity, editability, and seamless integration with artist workflows. This approach circumvents the need for large 3D datasets or finetuning, instead utilizing the abstraction and reusability inherent in code to generalize across diverse shape categories and styles.
Figure 1: Reusable code structure enables generalization across visually distinct shapes via shared high-level code patterns.
System Architecture and Workflow
LL3M is architected as a multi-agent framework, orchestrated by an external controller, with six specialized agents: planner, retrieval, coding, critic, verification, and user feedback. The workflow is divided into three phases (a code-level sketch follows the figures below):
- Initial Creation: The planner agent decomposes the user prompt into subtasks, the retrieval agent accesses BlenderRAG (a RAG database of Blender API documentation), and the coding agent writes and executes modular Blender code for each subtask.
- Auto-Refinement: The critic agent, powered by a VLM, analyzes multi-view renders of the asset, identifies discrepancies, and proposes targeted improvements. The coding agent applies these fixes, and the verification agent ensures their correct implementation.
- User-Guided Refinement: Users can iteratively provide high-level or fine-grained instructions, which are processed by the coding and verification agents to update the asset without full regeneration.
Figure 2: Method overview showing the multi-agent pipeline for initial creation, automatic refinement, and user-guided refinement.
Figure 3: Intermediate results for each phase, demonstrating iterative mesh refinement and targeted edits.
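To make the three-phase flow concrete, here is a minimal orchestration sketch. It is not the paper's implementation: the agent objects and helpers (`decompose`, `query_blender_rag`, `render_views`, and so on) are hypothetical stand-ins for the roles described above.

```python
# Minimal sketch of the three-phase LL3M workflow. Every class and helper
# here (agents.*, blender.*) is a hypothetical stand-in, not the paper's
# actual AutoGen implementation.

def create_asset(prompt, agents, blender, max_critique_rounds=3):
    # Phase 1: initial creation. Plan subtasks, retrieve API docs,
    # write and execute one modular code block per subtask.
    for task in agents.planner.decompose(prompt):
        docs = agents.retrieval.query_blender_rag(task)
        blender.execute(agents.coder.write_code(task, docs))

    # Phase 2: auto-refinement. A VLM critic inspects multi-view renders
    # and proposes targeted fixes until none remain (or we hit the cap).
    for _ in range(max_critique_rounds):
        critiques = agents.critic.analyze(prompt, blender.render_views())
        if not critiques:
            break
        blender.execute(agents.coder.write_fix(critiques))
        agents.verifier.check(critiques, blender.render_views())

    # Phase 3: user-guided refinement. Each instruction becomes an
    # incremental edit to the existing code, never a full regeneration.
    while (instruction := agents.user.next_instruction()) is not None:
        blender.execute(agents.coder.write_edit(instruction, blender.code()))
        agents.verifier.check([instruction], blender.render_views())
```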
Key Technical Innovations
Code as a Generative Medium
LL3M's central innovation is representing 3D assets as modular, human-readable Blender code. This makes every asset directly editable, lets high-level code patterns be reused across shape categories, and keeps the output legible within standard artist workflows.
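As an illustration (not code from the paper), the snippet below shows the kind of modular, self-documenting `bpy` function the coding agent might emit; the part name and proportions are invented. The same high-level pattern, a parameterized part placed in a loop, transfers across visually distinct shapes, which is the reuse Figure 1 highlights.

```python
import bpy

def add_tapered_leg(location, height=0.8, radius=0.04, taper=0.6):
    """Reusable part: a tapered cylindrical leg placed at `location`."""
    bpy.ops.mesh.primitive_cone_add(
        vertices=24,
        radius1=radius,            # base radius
        radius2=radius * taper,    # narrower top produces the taper
        depth=height,
        location=(location[0], location[1], height / 2),
    )
    leg = bpy.context.active_object
    leg.name = f"Leg_{location[0]:.2f}_{location[1]:.2f}"
    return leg

# The same high-level pattern serves a stool, a table, or a bench:
# only the footprint and proportions change.
for x, y in [(-0.3, -0.3), (0.3, -0.3), (-0.3, 0.3), (0.3, 0.3)]:
    add_tapered_leg((x, y))
```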
Retrieval-Augmented Generation (BlenderRAG)
BlenderRAG injects version-specific Blender API documentation into the agent workflow, enabling the coding agent to utilize advanced Blender constructs (B-meshes, geometry nodes, shader nodes) and reducing execution errors. Quantitative analysis shows a 5× increase in complex operations and a 26% reduction in error rate with BlenderRAG.
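The paper does not publish BlenderRAG's retrieval code, so the following is only a plausible sketch of the mechanism: embed the documentation corpus once, then inject the top-k pages most relevant to each subtask into the coding agent's prompt. `embed` stands in for any sentence-embedding model and is not an API named in the paper.

```python
# Hypothetical sketch of a BlenderRAG-style retrieval step over the
# Blender 4.4 documentation corpus.
import numpy as np

def build_index(doc_texts, embed):
    # Embed every documentation page once; shape (n_docs, dim).
    return np.stack([embed(t) for t in doc_texts])

def retrieve(query, doc_texts, index, embed, k=5):
    q = embed(query)
    # Cosine similarity against every documentation page.
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    return [doc_texts[i] for i in top]

def augmented_prompt(subtask, doc_texts, index, embed):
    docs = retrieve(subtask, doc_texts, index, embed)
    context = "\n\n".join(docs)
    return (f"Relevant Blender 4.4 API documentation:\n{context}\n\n"
            f"Write Blender Python code for: {subtask}")
```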
Figure 5: BlenderRAG enables the use of complex Blender functions, improving mesh quality and code sophistication.
Multi-Agent Collaboration and Context Sharing
Agents share a global context, allowing for coherent coordination and preservation of asset state across phases. The orchestrator manages agent order and communication, ensuring that edits are incremental and targeted rather than wholesale regenerations.
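As a sketch of what such a global context might hold (the field names are assumptions, not the paper's schema), consider a single mutable record that every agent reads from and appends to:

```python
from dataclasses import dataclass, field

@dataclass
class SharedContext:
    """Hypothetical global state passed between agents by the orchestrator."""
    user_prompt: str
    plan: list[str] = field(default_factory=list)         # planner output
    code_blocks: list[str] = field(default_factory=list)  # one per subtask/edit
    critiques: list[str] = field(default_factory=list)    # open critic findings
    history: list[str] = field(default_factory=list)      # agent-by-agent log

    def current_code(self) -> str:
        # Later agents edit incrementally on top of this accumulated script,
        # which is what keeps refinements targeted rather than wholesale.
        return "\n\n".join(self.code_blocks)
```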
Figure 6: Context sharing across agents preserves and improves assets through iterative refinement.
Visual Critique and Verification
The critic and verification agents leverage VLMs to provide automatic visual feedback and ensure that code modifications align with user intent and prompt specifications. This loop enables high fidelity to input and robust error correction.
Figure 7: Verification agent ensures that all critiques are addressed, requiring multiple iterations if necessary.
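The paper does not specify how the multi-view renders are produced; the snippet below is one straightforward way to do it in Blender Python, circling tracked cameras around the asset and rendering a still from each. The helper name and camera placement are illustrative, not the paper's code.

```python
import bpy
import math

def render_turntable_views(out_dir, n_views=4, radius=3.0, height=1.5):
    """Hypothetical helper: render n_views evenly spaced views for the critic VLM."""
    scene = bpy.context.scene
    # An empty at the origin gives every camera something to track.
    target = bpy.data.objects.new("CritiqueTarget", None)
    scene.collection.objects.link(target)

    paths = []
    for i in range(n_views):
        angle = 2 * math.pi * i / n_views
        cam = bpy.data.objects.new(f"CritiqueCam_{i}",
                                   bpy.data.cameras.new(f"CritiqueCam_{i}"))
        scene.collection.objects.link(cam)
        cam.location = (radius * math.cos(angle), radius * math.sin(angle), height)
        track = cam.constraints.new(type='TRACK_TO')
        track.target = target              # keep the asset centered in frame
        track.track_axis = 'TRACK_NEGATIVE_Z'
        track.up_axis = 'UP_Y'
        scene.camera = cam
        scene.render.filepath = f"{out_dir}/view_{i}.png"
        bpy.ops.render.render(write_still=True)
        paths.append(scene.render.filepath)
    return paths  # these images go to the critic VLM alongside the prompt
```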
Empirical Results and Evaluation
Generality and Versatility
LL3M demonstrates the ability to generate assets across a wide range of categories (vehicles, instruments, animals, scenes) with detailed geometry, textures, materials, and hierarchical relationships.
Figure 8: Gallery of results showcasing diverse, detailed, and editable 3D assets generated via Blender code.
Fidelity and Iterative Creation
The system exhibits high fidelity to user prompts, with the auto-refinement phase correcting implausible configurations and the user-guided phase enabling precise, iterative edits. Approximately 59% of user edits are achieved with a single instruction; more complex spatial edits require 3–4 follow-up prompts.
Figure 9: Iterative creation allows for successive, targeted edits while preserving asset identity.
Material and Stylization Edits
LL3M supports both geometric and material modifications, as well as high-level stylization instructions, applied consistently across different initial meshes.
Figure 10: Material editing via procedural shader nodes enables localized appearance changes.
Figure 11: Consistent stylization across assets in response to high-level style prompts.
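As a hedged example of what such a procedural material edit can look like in Blender Python (the object and material names are invented; this is not the paper's code), here is a noise-driven metal material applied to a single named part:

```python
import bpy

def apply_brushed_metal(obj_name):
    """Hypothetical localized appearance edit via procedural shader nodes."""
    obj = bpy.data.objects[obj_name]
    mat = bpy.data.materials.new(name="BrushedMetal")
    mat.use_nodes = True
    nodes, links = mat.node_tree.nodes, mat.node_tree.links

    bsdf = nodes["Principled BSDF"]
    bsdf.inputs["Metallic"].default_value = 1.0

    # A noise texture drives roughness variation procedurally, so the
    # appearance edit remains a few editable nodes, not a baked texture.
    noise = nodes.new("ShaderNodeTexNoise")
    noise.inputs["Scale"].default_value = 40.0
    links.new(noise.outputs["Fac"], bsdf.inputs["Roughness"])

    obj.data.materials.clear()
    obj.data.materials.append(mat)

# e.g. an agent applying the edit only to one part, leaving the rest untouched:
apply_brushed_metal("GuitarBody")
```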
Scene Generation and Hierarchical Structure
The system can generate multi-object scenes with appropriate spatial relationships and logical parenting, supporting scene graph behavior in Blender.
Figure 12: Scene generation with instancing and parenting relationships for hierarchical structure.
Figure 13: Hierarchical scene graph with semantic naming and parent-child relationships.
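In Blender Python, both behaviors reduce to a few lines; the sketch below (with invented object names) shows the parenting and instancing patterns Figures 12 and 13 describe:

```python
import bpy

def parent(child_name, parent_name):
    """Hypothetical helper: parent one named object under another."""
    child = bpy.data.objects[child_name]
    parent_obj = bpy.data.objects[parent_name]
    child.parent = parent_obj
    # Preserve the child's current world position after parenting.
    child.matrix_parent_inverse = parent_obj.matrix_world.inverted()

# Semantic naming plus parenting yields scene-graph behavior:
# moving "Desk" now carries its children with it.
for part in ("Lamp", "Monitor", "Keyboard"):
    parent(part, "Desk")

# Instancing: linked duplicates share one underlying mesh datablock.
chair = bpy.data.objects["Chair"]
for i in range(3):
    dup = chair.copy()          # new object, same mesh data
    dup.name = f"Chair_{i + 1}"
    bpy.context.scene.collection.objects.link(dup)
    dup.location.x += (i + 1) * 0.8
```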
Baseline Comparison and Ablation
Compared to BlenderMCP, LL3M produces higher-quality meshes with richer details and greater alignment to input prompts. Ablation studies confirm the necessity of each agent for optimal asset quality.
Figure 14: Baseline comparison showing superior mesh quality and detail in LL3M outputs.
Figure 15: Ablation study demonstrating progressive improvement with the addition of each agent.
Implementation Details
- Framework: Implemented in AutoGen, with agent orchestration and context sharing (see the sketch after this list).
- Blender Version: 4.4, with BlenderRAG constructed from 1,729 documentation files.
- LLMs: GPT-4o (planner, retrieval), Claude 3.7 Sonnet (coding), Gemini 2.0 Flash (critic, verification).
- Runtime: Initial creation ≈ 4 min, auto-refinement ≈ 6 min, user edits ≈ 38 sec per instruction.
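For orientation, a minimal AutoGen-style wiring of these agents might look like the sketch below. The system messages, model identifiers, and round-robin turn-taking are simplifications and assumptions; the paper's orchestrator controls agent order externally.

```python
import autogen  # pyautogen, 0.2-style API; a sketch, not the paper's config

def make_agent(name, model, system_message):
    # Per-agent llm_config routing mirrors the model list above.
    return autogen.AssistantAgent(
        name=name,
        system_message=system_message,
        llm_config={"config_list": [{"model": model}]},
    )

planner = make_agent("planner", "gpt-4o",
                     "Decompose the user prompt into Blender subtasks.")
retrieval = make_agent("retrieval", "gpt-4o",
                       "Surface relevant Blender 4.4 API documentation.")
coding = make_agent("coding", "claude-3-7-sonnet-20250219",
                    "Write modular Blender Python for each subtask.")
critic = make_agent("critic", "gemini-2.0-flash",
                    "Critique multi-view renders against the prompt.")
verifier = make_agent("verification", "gemini-2.0-flash",
                      "Confirm that every critique has been addressed.")

# Round-robin turn-taking is a simplification of the paper's external
# orchestrator, which decides agent order itself.
chat = autogen.GroupChat(
    agents=[planner, retrieval, coding, critic, verifier],
    messages=[],
    max_round=20,
    speaker_selection_method="round_robin",
)
manager = autogen.GroupChatManager(groupchat=chat)
```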
Limitations and Future Directions
LL3M relies on the accuracy of VLMs for visual feedback, which may miss subtle spatial artifacts. However, user-driven iterative refinement can correct residual errors. The system is adaptable to future Blender versions via updates to BlenderRAG. Potential future work includes integrating more advanced VLMs, expanding multi-modal input capabilities, and optimizing agent collaboration for reduced latency.
Figure 16: Limitations in automatic visual feedback can be overcome by user-supplied corrections.
Conclusion
LL3M establishes a robust framework for interpretable, modular 3D asset generation via multi-agent LLM collaboration. By representing assets as editable Blender code and leveraging retrieval-augmented generation, the system achieves generalization, fidelity, and user-driven co-creation without the constraints of category-specific models or low-level mesh representations. The approach bridges automation and creative control, offering a transparent and extensible solution for 3D modeling in graphics pipelines and artist workflows. Future developments may further enhance multi-modal capabilities, agent reasoning, and integration with evolving graphics software.