Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework (2408.08054v1)

Published 15 Aug 2024 in cs.AI, cs.CL, and cs.SE

Abstract: The conventional BIM authoring process typically requires designers to master complex and tedious modeling commands in order to materialize their design intentions within BIM authoring tools. This additional cognitive burden complicates the design process and hinders the adoption of BIM and model-based design in the AEC (Architecture, Engineering, and Construction) industry. To facilitate the expression of design intentions more intuitively, we propose Text2BIM, an LLM-based multi-agent framework that can generate 3D building models from natural language instructions. This framework orchestrates multiple LLM agents to collaborate and reason, transforming textual user input into imperative code that invokes the BIM authoring tool's APIs, thereby generating editable BIM models with internal layouts, external envelopes, and semantic information directly in the software. Furthermore, a rule-based model checker is introduced into the agentic workflow, utilizing predefined domain knowledge to guide the LLM agents in resolving issues within the generated models and iteratively improving model quality. Extensive experiments were conducted to compare and analyze the performance of three different LLMs under the proposed framework. The evaluation results demonstrate that our approach can effectively generate high-quality, structurally rational building models that are aligned with the abstract concepts specified by user input. Finally, an interactive software prototype was developed to integrate the framework into the BIM authoring software Vectorworks, showcasing the potential of modeling by chatting.

Summary

The paper introduces a multi-agent framework that uses LLMs to transform natural language into detailed BIM models, with average rule-pass rates above 99% for the best-performing LLMs.
The paper outlines a systematic methodology in which specialized LLM roles (Product Owner, Architect, Programmer, and Reviewer) collaborate to refine design requirements and generate code.
The paper demonstrates significant potential for streamlining architectural design processes, while also highlighting challenges in handling complex spatial layouts for future research.

Text2BIM: Generating Building Models Using an LLM-based Multi-Agent Framework

The paper presented by Du et al. introduces Text2BIM, a framework leveraging LLMs to facilitate Building Information Modeling (BIM) by converting natural language inputs into 3D building models. The framework addresses the cognitive burden imposed on designers by complex BIM authoring tools and aims to transform the architectural design process through intuitive model generation.

Methodology

Text2BIM orchestrates multiple LLM agents with distinct roles to process and expand user input, generate architectural plans, write imperative code, and ensure the resulting models align with architectural standards. The agents in the framework include:

Product Owner: Enhances the original user input to create detailed requirements.
Architect: Develops building plans based on defined architectural rules.
Programmer: Translates requirements into executable code using a specified toolset.
Reviewer: Ensures code and model quality through iterative optimization and feedback loops.
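
The hand-off between these roles can be pictured as a simple chain in which each agent's output becomes the next agent's input. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Agent` class, the role instructions, and the `echo_llm` stand-in are all hypothetical, and an LLM is modeled as a plain function from prompt text to response text. The Reviewer is left out here because it operates inside a feedback loop rather than as a linear step.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an LLM is modeled as a function from prompt text to response text.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str         # e.g. "Product Owner"
    instruction: str  # hypothetical role prompt, not taken from the paper
    llm: LLM

    def run(self, message: str) -> str:
        # Prefix the incoming message with the role tag and role instruction.
        prompt = f"[{self.role}] {self.instruction}\n\n{message}"
        return self.llm(prompt)


def build_pipeline(llm: LLM) -> List[Agent]:
    """Create the linear part of the workflow: requirements, plan, code."""
    return [
        Agent("Product Owner", "Expand the user request into detailed requirements.", llm),
        Agent("Architect", "Produce a building plan that follows architectural rules.", llm),
        Agent("Programmer", "Write code that calls the available tool functions.", llm),
    ]


def run_pipeline(user_input: str, agents: List[Agent]) -> List[str]:
    """Feed each agent's output to the next agent and collect all outputs."""
    outputs = []
    message = user_input
    for agent in agents:
        message = agent.run(message)
        outputs.append(message)
    return outputs


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model: reports which role was prompted."""
    first_line = prompt.splitlines()[0]
    role = first_line[1:first_line.index("]")]
    return f"{role} output"


if __name__ == "__main__":
    for step in run_pipeline("A two-story house", build_pipeline(echo_llm)):
        print(step)
```

Swapping `echo_llm` for a function that calls a real model would leave the chaining logic unchanged, which is the main point of keeping the LLM behind a single callable.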

The framework encapsulates the BIM software's API functions in higher-level tool functions, simplifying the interaction between the LLMs and the modeling environment. The generated code creates native BIM models with internal layouts, external envelopes, and semantic data, and these models can be edited further in the authoring software.
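
To illustrate what wrapping low-level API calls in a higher-level tool function might look like, here is a small sketch. Everything in it is hypothetical: `FakeBimApi`, `begin_object`, `set_property`, and `create_wall` are invented for illustration and do not correspond to the Vectorworks API or to the toolset used in the paper. The idea shown is only that one validated, well-documented function can hide several low-level calls from the LLM.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Point = Tuple[float, float]


@dataclass
class FakeBimApi:
    """Hypothetical stand-in for a low-level BIM API; it only records calls."""
    calls: List[tuple] = field(default_factory=list)
    next_id: int = 0

    def begin_object(self, kind: str) -> int:
        self.next_id += 1
        self.calls.append(("begin_object", kind, self.next_id))
        return self.next_id

    def set_property(self, handle: int, name: str, value: Any) -> None:
        self.calls.append(("set_property", handle, name, value))


def create_wall(api: FakeBimApi, start: Point, end: Point, height_mm: float,
                thickness_mm: float = 200.0, story: str = "Ground") -> int:
    """Hypothetical high-level tool: one call creates a fully described wall.

    Inputs are validated up front so a bad argument produces a clear error
    message that could be fed back to the code-writing agent.
    """
    if start == end:
        raise ValueError("wall start and end points must differ")
    if height_mm <= 0 or thickness_mm <= 0:
        raise ValueError("height and thickness must be positive")

    handle = api.begin_object("Wall")
    properties = {
        "start": start,
        "end": end,
        "height_mm": height_mm,
        "thickness_mm": thickness_mm,
        "story": story,
    }
    for name, value in properties.items():
        api.set_property(handle, name, value)
    return handle


if __name__ == "__main__":
    api = FakeBimApi()
    wall = create_wall(api, (0.0, 0.0), (5000.0, 0.0), height_mm=3000.0)
    print(wall, len(api.calls))
```

With a wrapper like this, the generated code needs a single readable call per building element instead of a sequence of low-level operations, which plausibly reduces the room for error in LLM-written code.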

A rule-based model checker evaluates the generated models against domain-specific standards, prompting agents to optimize their outputs iteratively. This approach not only enhances model quality but also incorporates domain expertise into the design process.
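
The check-and-fix cycle can be sketched as a loop that runs every rule, collects the reported issues, and passes them to a fixing step until no issues remain or a round limit is reached. This is an illustrative sketch only: the two rules, the dictionary-based model representation, and `naive_fix` (which stands in for the LLM agents that would revise the code) are assumptions and are not taken from the paper.

```python
from typing import Any, Callable, Dict, List, Tuple

Element = Dict[str, Any]
Model = List[Element]
Rule = Callable[[Model], List[str]]


def rule_min_room_area(model: Model, min_area_m2: float = 4.0) -> List[str]:
    """Illustrative rule: every room must reach a minimum floor area."""
    return [
        f"{e['name']}: area {e['area_m2']} m2 is below {min_area_m2} m2"
        for e in model
        if e["type"] == "room" and e["area_m2"] < min_area_m2
    ]


def rule_room_has_door(model: Model) -> List[str]:
    """Illustrative rule: every room must have at least one door."""
    return [
        f"{e['name']}: room has no door"
        for e in model
        if e["type"] == "room" and e["doors"] < 1
    ]


def check_model(model: Model, rules: List[Rule]) -> List[str]:
    """Run all rules and return the combined list of issue messages."""
    issues: List[str] = []
    for rule in rules:
        issues.extend(rule(model))
    return issues


def optimize(model: Model, rules: List[Rule],
             fix: Callable[[Model, List[str]], Model],
             max_rounds: int = 3) -> Tuple[Model, List[int]]:
    """Alternate checking and fixing; record the issue count of each round."""
    history: List[int] = []
    for _ in range(max_rounds):
        issues = check_model(model, rules)
        history.append(len(issues))
        if not issues:
            break
        model = fix(model, issues)
    return model, history


def naive_fix(model: Model, issues: List[str]) -> Model:
    """Stand-in for the agents: enlarge small rooms and add missing doors."""
    return [
        dict(e, area_m2=max(e["area_m2"], 4.0), doors=max(e["doors"], 1))
        if e["type"] == "room" else dict(e)
        for e in model
    ]


if __name__ == "__main__":
    rooms = [
        {"type": "room", "name": "Bath", "area_m2": 3.0, "doors": 0},
        {"type": "room", "name": "Living", "area_m2": 20.0, "doors": 1},
    ]
    fixed, history = optimize(rooms, [rule_min_room_area, rule_room_has_door], naive_fix)
    print(history)  # [2, 0]
```

Recording the issue count per round makes it easy to see whether the loop converges, and the `max_rounds` cap keeps the process bounded when a fix fails to resolve an issue.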

Results and Evaluation

The framework was tested with prompts covering various architectural scenarios, using three LLMs: GPT-4o, Gemini-1.5-Pro, and Mistral-Large-2. Overall, the generated models were of high quality, with average rule-pass rates of 99.4% for GPT-4o and 99.2% for Mistral-Large-2. Gemini's results were more variable, indicating room for improvement in robustness.

The paper also highlights the importance of the framework's quality optimization loop, which proved effective at iteratively reducing the number of issues in the generated models. However, challenges remain in handling complex architectural layouts and in ensuring that the resulting spaces are spatially sensible.

Implications and Future Directions

This framework offers significant practical advantages in the AEC industry by streamlining the transition from design intention to BIM model. It has the potential to expand beyond early-stage models to encompass detailed engineering designs. Future research could focus on integrating advanced spatial understanding into LLMs, improving automatic conflict resolution, and enhancing user interaction.

In conclusion, Text2BIM represents a notable advance in using artificial intelligence to simplify complex architectural design processes. Integrating LLMs into BIM authoring makes advanced design tools more accessible and efficient, and it opens the way for further work on automated architectural modeling.
