
Generating Physically Stable and Buildable Brick Structures from Text (2505.05469v2)

Published 8 May 2025 in cs.CV

Abstract: We introduce BrickGPT, the first approach for generating physically stable interconnecting brick assembly models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of brick structures, along with their associated captions, and train an autoregressive LLM to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that BrickGPT produces stable, diverse, and aesthetically pleasing brick structures that align closely with the input text prompts. We also develop a text-based brick texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We release our new dataset, StableText2Brick, containing over 47,000 brick structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models at the project website: https://avalovelace1.github.io/BrickGPT/.

Summary

  • The paper introduces LegoGPT, a novel method that generates physically stable and buildable LEGO designs from text using next-brick prediction.
  • It employs the StableText2Lego dataset with over 47,000 stable structures paired with detailed captions for real-world assembly applications.
  • The approach integrates brick-by-brick rejection sampling and physics-aware rollback to validate and enhance design stability during inference.

This paper, "Generating Physically Stable and Buildable LEGO Designs from Text" (2505.05469), introduces LegoGPT (called BrickGPT in the revised abstract above), a method for creating physically stable and buildable LEGO structures directly from natural language text prompts. Traditional 3D generative models often produce designs that are difficult or impossible to realize physically; LegoGPT instead accounts for the constraints of real-world assembly with standard components such as LEGO bricks.

The core idea behind LegoGPT is to repurpose autoregressive LLMs for the task of "next-brick prediction." The problem of generating a LEGO structure is framed as a sequence generation task, where each token in the sequence specifies a particular LEGO brick and its position.

Dataset: StableText2Lego

A key contribution is the creation of a large-scale dataset called StableText2Lego. Training an autoregressive model requires a substantial amount of data, and existing datasets lacked physically stable LEGO designs paired with detailed text descriptions. StableText2Lego contains over 47,000 stable LEGO structures, derived from over 28,000 unique 3D objects across 21 categories from the ShapeNetCore dataset.

The dataset construction process involves several steps:

  1. Mesh-to-LEGO Conversion: Starting from a 3D mesh (e.g., from ShapeNet), the mesh is voxelized into a fixed-size grid ($20 \times 20 \times 20$ in this work). A split-and-remerge legolization algorithm is then applied to determine a brick layout that approximates the voxelized shape using a predefined set of standard LEGO bricks ($1 \times 1$, $1 \times 2$, $1 \times 4$, $1 \times 6$, $1 \times 8$, $2 \times 2$, $2 \times 4$, and $2 \times 6$).
  2. Structural Augmentation: To increase data diversity and the likelihood of obtaining stable structures, multiple different brick layouts are generated for the same 3D object by introducing randomness during the legolization process.
  3. Stability Analysis: Each generated LEGO structure undergoes a physical stability assessment using a method based on a structural force model. This analysis involves formulating and solving a nonlinear program to determine if the forces on each brick can reach static equilibrium without exceeding friction capacity. Structures where any brick has a stability score of 0 (indicating instability) are filtered out.
  4. Caption Generation: For each stable LEGO structure, the paper generates multi-view renderings. These renderings are then provided to a large multimodal model (GPT-4o) with a specific prompt engineered to produce detailed, geometry-focused text descriptions of the LEGO model, omitting color or function information.
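
The voxelization in step 1 can be illustrated with a minimal sketch. It assumes the mesh has already been reduced to a set of sampled surface points, and it scales them uniformly so the longest side of the bounding box spans the grid. The function name `voxelize` and the point-sampling shortcut are assumptions made for illustration; the paper's actual mesh voxelizer may work differently.

```python
from typing import Iterable, Set, Tuple

GRID = 20  # grid resolution per axis, as stated in the text

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], grid: int = GRID) -> Set[Voxel]:
    """Map sampled surface points to occupied cells of a grid^3 voxel grid.

    Hypothetical helper: a real pipeline would voxelize the mesh itself
    (and typically fill the interior), not just bin surface samples.
    """
    pts = list(points)
    if not pts:
        return set()
    mins = [min(p[a] for p in pts) for a in range(3)]
    maxs = [max(p[a] for p in pts) for a in range(3)]
    # Uniform scale: the longest bounding-box side spans the whole grid.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    scale = grid / extent
    occupied: Set[Voxel] = set()
    for p in pts:
        # Clamp so points on the far boundary land in the last cell.
        cell = tuple(min(grid - 1, int((p[a] - mins[a]) * scale)) for a in range(3))
        occupied.add(cell)
    return occupied
```

The resulting set of occupied cells is the kind of input a legolization step would then cover with bricks.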

Each entry in the dataset pairs a stable LEGO structure (represented as a sequence of bricks in a custom text format) with a text caption describing its geometry.
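
As a rough illustration of such a text format, the sketch below parses and serializes lines shaped like "2x4 (5,3,1)", the example given in the method description. The field names (`h`, `w`, `x`, `y`, `z`), the one-brick-per-line layout, and the helper names are assumptions; the exact grammar used by the authors is not specified here.

```python
import re
from typing import List, NamedTuple


class Brick(NamedTuple):
    h: int  # first footprint dimension (assumed meaning)
    w: int  # second footprint dimension (assumed meaning)
    x: int
    y: int
    z: int


# Matches e.g. "2x4 (5,3,1)"; anything else is treated as malformed.
_BRICK_RE = re.compile(r"^(\d+)x(\d+) \((\d+),(\d+),(\d+)\)$")


def parse_brick(line: str) -> Brick:
    """Parse one brick line, raising ValueError on malformed input."""
    m = _BRICK_RE.match(line.strip())
    if m is None:
        raise ValueError(f"malformed brick line: {line!r}")
    return Brick(*(int(g) for g in m.groups()))


def format_brick(b: Brick) -> str:
    """Serialize a brick back into the text form."""
    return f"{b.h}x{b.w} ({b.x},{b.y},{b.z})"


def parse_structure(text: str) -> List[Brick]:
    """Parse a whole structure, assuming one brick per non-empty line."""
    return [parse_brick(ln) for ln in text.splitlines() if ln.strip()]
```

A strict parser like this doubles as the "well-formatted" check applied during generation, since any line it rejects can simply be resampled.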

Method: LegoGPT

LegoGPT leverages a pre-trained LLM (specifically, LLaMA-3.2-1B-Instruct) and fine-tunes it on the StableText2Lego dataset.

  • LEGO Representation: To enable the LLM to process and generate LEGO designs, structures are converted into a simple text format. Each brick is represented by its dimensions and coordinates, e.g., "2x4 (5,3,1)". The bricks are ordered sequentially, typically in a raster-scan manner (bottom-to-top, then within each layer). This format is concise and includes necessary information for 3D reconstruction and validity checks.
  • Model Fine-tuning: The pre-trained LLaMA model is fine-tuned using instruction tuning, training the model to generate the text representation of a LEGO structure when prompted with a text description like "Create a LEGO model of {caption}." Fine-tuning utilizes techniques like LoRA for efficiency. The model learns to predict the sequence of bricks autoregressively: $p(b_1, \dots, b_N \mid \text{prompt}) = \prod_{i=1}^{N} p(b_i \mid b_1, \dots, b_{i-1}, \text{prompt})$.
  • Integrating Physical Stability: A key challenge is ensuring the generated designs are physically stable and buildable. While training on stable data helps, the generative nature of LLMs can still produce invalid or unstable outputs. The paper proposes an inference strategy that combines brick-by-brick validity checks with a physics-aware rollback mechanism:
    • Brick-by-Brick Rejection Sampling: As the model generates each new brick token, basic validity checks are performed. These include ensuring the brick dimensions and position are well-formatted, within the allowed grid bounds, and do not collide with previously placed bricks. If a generated brick is invalid, it is rejected, and the model resamples the next token until a valid one is produced.
    • Physics-Aware Rollback: Crucially, physical stability is not checked after every brick is added. Checking partial structures can be misleading, as many designs are only stable when complete. Instead, stability analysis is applied to the completed structure (or after a block of bricks is added). If the full structure is found to be unstable (i.e., any brick has a stability score of 0 according to the physics analysis), the algorithm identifies the first unstable brick in the sequence. The generated structure is then "rolled back" to the state just before this first unstable brick was added. Generation resumes from this earlier, stable point. This process is repeated until a fully stable structure is generated or a maximum number of rollbacks is reached. The stability analysis relies on solving a nonlinear program involving forces and torques on each brick to find static equilibrium, as described by equations involving forces $F_i^j$ and torques $\tau_i^j$. The stability score $s_i$ for brick $i$ is derived from the maximum dragging force required ($\mathcal{D}_i^{\max}$) relative to the material's friction capacity ($F_T$).
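
The two inference-time mechanisms above can be sketched together in a few lines. This is a toy illustration, not the authors' Algorithm 1: the sampler is any callable standing in for the fine-tuned LLM, `first_floating` is a crude stand-in for the force-based stability solver (it only flags bricks with nothing beneath them), and all names, default limits, and the choice to accept both brick orientations are assumptions.

```python
from typing import Callable, List, Optional, Set, Tuple

GRID = 20
Brick = Tuple[int, int, int, int, int]  # (h, w, x, y, z), assumed layout
_BASE = {(1, 1), (1, 2), (1, 4), (1, 6), (1, 8), (2, 2), (2, 4), (2, 6)}
ALLOWED = _BASE | {(w, h) for h, w in _BASE}  # assumption: both orientations


def cells(b: Brick) -> Set[Tuple[int, int, int]]:
    """Grid cells covered by a one-layer-tall brick."""
    h, w, x, y, z = b
    return {(x + i, y + j, z) for i in range(h) for j in range(w)}


def is_valid(b: Brick, occupied: Set[Tuple[int, int, int]]) -> bool:
    """Brick-level checks: known size, inside the grid, no collision."""
    h, w, x, y, z = b
    in_bounds = x >= 0 and y >= 0 and 0 <= z < GRID and x + h <= GRID and y + w <= GRID
    return (h, w) in ALLOWED and in_bounds and not (cells(b) & occupied)


def first_floating(bricks: List[Brick]) -> Optional[int]:
    """Toy stability proxy (NOT the paper's physics model): index of the
    first brick above ground with nothing directly beneath it, else None."""
    occupied = set().union(*map(cells, bricks))
    for i, b in enumerate(bricks):
        if b[4] > 0 and not any((x, y, z - 1) in occupied for x, y, z in cells(b)):
            return i
    return None


def generate(
    sample_next: Callable[[List[Brick]], Optional[Brick]],
    first_unstable: Callable[[List[Brick]], Optional[int]],
    max_rollbacks: int = 10,
    max_tries: int = 50,
    max_bricks: int = 200,
) -> Tuple[List[Brick], bool]:
    """Return (bricks, is_stable). sample_next returns None to end a design."""
    bricks: List[Brick] = []
    rollbacks = 0
    while True:
        finished = False
        while not finished and len(bricks) < max_bricks:
            occupied = set().union(*map(cells, bricks))
            for _ in range(max_tries):  # rejection sampling for one brick
                cand = sample_next(bricks)
                if cand is None:
                    finished = True
                    break
                if is_valid(cand, occupied):
                    bricks.append(cand)
                    break
            else:
                finished = True  # no valid brick found within max_tries
        bad = first_unstable(bricks)  # stability checked on the full design
        if bad is None or rollbacks >= max_rollbacks:
            return bricks, bad is None
        bricks = bricks[:bad]  # roll back to just before the unstable brick
        rollbacks += 1
```

Passing the stability check in as a callable keeps the loop independent of the solver, so the toy proxy could be swapped for a real force-based analysis without changing the control flow.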

This inference-time validation and rollback strategy increases the proportion of valid and stable designs the model generates; the paper's ablation studies show that validity and stability both drop when either component is removed.

LEGO Texturing and Coloring

Beyond generating geometry, the paper explores adding appearance to the generated LEGO models:

  • UV Texture Generation: For a generated LEGO structure, a mesh is created by merging the visible bricks. A UV map is generated (e.g., via cube projection). A text-based mesh texturing method (FlashTex) is used to generate a texture image based on a text prompt describing the desired appearance.
  • Uniform Brick Color Assignment: Alternatively, each brick can be assigned a uniform color from a standard LEGO palette. This is achieved by converting the LEGO structure to a voxel grid, creating a UV-unwrapped mesh from the voxels, generating a texture using FlashTex based on an appearance prompt, averaging the color across the visible faces of each voxel belonging to a brick, and finally finding the closest standard LEGO color for that brick's average color.
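
The final two steps of the uniform-color route (averaging per brick, then snapping to a palette) can be sketched as follows. The palette entries and their RGB values are placeholders rather than official LEGO colors, squared Euclidean distance in RGB is an assumed metric, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[float, float, float]

# Placeholder palette: illustrative values only, not the official LEGO colors.
PALETTE: Dict[str, RGB] = {
    "red": (200, 30, 30),
    "blue": (30, 60, 200),
    "yellow": (240, 200, 40),
    "white": (240, 240, 240),
    "black": (20, 20, 20),
}


def average_color(samples: List[RGB]) -> RGB:
    """Mean color over the texture samples of a brick's visible voxel faces."""
    n = len(samples)
    return tuple(sum(c[i] for c in samples) / n for i in range(3))


def nearest_palette_color(rgb: RGB, palette: Dict[str, RGB] = PALETTE) -> str:
    """Name of the palette color closest to rgb (squared RGB distance)."""
    return min(
        palette,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])),
    )


def assign_brick_colors(face_colors: Dict[int, List[RGB]]) -> Dict[int, str]:
    """Map each brick index to one palette color from its sampled face colors."""
    return {
        idx: nearest_palette_color(average_color(samples))
        for idx, samples in face_colors.items()
        if samples  # bricks with no visible faces are left unassigned
    }
```

A perceptual color space such as CIELAB would likely give better matches than raw RGB, at the cost of an extra conversion step.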

Implementation Considerations and Applications

Implementing LegoGPT involves several components:

  • Dataset Pipeline: Creating the training data requires setting up the mesh-to-LEGO conversion, structural augmentation, stability analysis, and caption generation pipelines. The stability analysis solves a nonlinear program (Eq. 3) subject to constraints on forces, which requires an optimization solver such as Gurobi.
  • LLM Fine-tuning: Fine-tuning a pre-trained LLM (like LLaMA) on the prepared text dataset is the core training step. This requires significant computational resources (e.g., multiple GPUs like NVIDIA RTX A6000s) and expertise in training LLMs.
  • Inference Engine: The generation process requires the fine-tuned LLM, the brick validity checks (format, bounds, collision), and the physics stability analysis component (again, potentially using a solver like Gurobi). The physics-aware rollback mechanism needs to be implemented to iteratively check stability and backtrack when necessary (Algorithm 1). The choice of parameters like rollback limits and temperature scaling during rejection sampling affects generation time and quality.
  • Post-processing (Optional): For applications involving appearance, pipelines for generating textured meshes or assigning uniform colors are needed, potentially integrating external tools like FlashTex.

The generated, buildable, and stable LEGO designs have direct real-world applications:

  • Manual Assembly: The brick-by-brick generation sequence naturally serves as an intuitive guide for human users to assemble the structure by hand.
  • Automated Robotic Assembly: The stable and structured nature of the designs makes them suitable for automated assembly by robots. The paper demonstrates this using a dual-robot-arm system, which can leverage the generated brick sequence and physical properties to plan and execute assembly steps. This requires additional components like action planning, multi-agent coordination, and robot control systems.

Limitations

Current limitations include being restricted to a $20 \times 20 \times 20$ grid resolution and a fixed set of standard bricks, primarily due to dataset scale and computational constraints. Scaling to larger resolutions, more complex brick types (slopes, tiles), and more diverse objects from larger datasets like Objaverse-XL would be areas for future work to improve generalization and design complexity.
