
StableText2Lego Dataset

Updated 24 July 2025
  • StableText2Lego is a large-scale dataset of over 47,000 physically stable LEGO assemblies arranged on a discretized 20×20×20 grid with precise brick placements.
  • The generation pipeline voxelizes source meshes and applies a split-and-remerge legolization algorithm over common brick types, producing geometrically diverse layouts that are filtered with rigorous physical stability analysis, while the brick-sequence encoding supports sequential autoregressive modeling.
  • Each model is paired with five GPT-4o generated captions focusing on geometric structure, supporting applications in robotic construction, text-conditioned design, and manual LEGO assembly.

StableText2Lego is a large-scale dataset of physically stable LEGO assembly structures, each paired with descriptive textual captions, designed to support research in text-conditioned generative design, robotics, and the physical validation of modular structures. The dataset was introduced to facilitate the generation of buildable, interlocking brick structures from natural language descriptions, ensuring that all generated assemblies are both geometrically diverse and physically stable (Pun et al., 8 May 2025).

1. Dataset Structure and Content

The StableText2Lego dataset comprises over 47,000 distinct LEGO structures, each represented as an assembly of individual bricks positioned on a standard LEGO baseplate. Each brick is encoded by its dimensions and its explicit (x, y, z) placement within a 20×20×20 discretized grid. Bricks are specified in a custom text-based format—"{h}×{w} ({x},{y},{z})"—and are ordered raster-scan style from the lowest to the highest layer.
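The following is a minimal sketch of reading and writing this brick encoding, assuming only the format string shown above; the class and helper names are illustrative and not part of the released codebase.

```python
import re
from dataclasses import dataclass

# Minimal sketch of the brick text encoding described above:
# each line reads "{h}×{w} ({x},{y},{z})". Names are illustrative assumptions.

@dataclass
class Brick:
    h: int  # brick footprint extent along x (studs)
    w: int  # brick footprint extent along y (studs)
    x: int  # grid position within the 20×20×20 workspace
    y: int
    z: int  # layer index, lowest layer first

BRICK_RE = re.compile(r"(\d+)[x×](\d+) \((\d+),(\d+),(\d+)\)")

def parse_structure(text: str) -> list[Brick]:
    """Parse a raster-scan-ordered brick list, one brick per line."""
    bricks = []
    for line in text.strip().splitlines():
        m = BRICK_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"malformed brick line: {line!r}")
        h, w, x, y, z = map(int, m.groups())
        bricks.append(Brick(h, w, x, y, z))
    return bricks

def serialize_structure(bricks: list[Brick]) -> str:
    """Emit bricks in the dataset's text format, lowest layer first."""
    ordered = sorted(bricks, key=lambda b: (b.z, b.y, b.x))  # raster-scan order
    return "\n".join(f"{b.h}×{b.w} ({b.x},{b.y},{b.z})" for b in ordered)
```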

The dataset was constructed by processing more than 28,000 unique 3D models from the ShapeNetCore corpus. For each base object, several structurally distinct LEGO variants were generated to improve diversity, stability, and representation across 21 canonical object categories distinguished by their geometric attributes. The dataset further includes, for every 3D structure, five descriptive captions that deliberately focus on structure and geometry (not color), each generated by rendering the design from 24 viewpoints and prompting GPT-4o for text descriptions.

2. Generation Methodology

The creation pipeline for StableText2Lego involves several sequential stages. First, each ShapeNetCore mesh is voxelized into a 20×20×20 grid, preserving the broad volumetric characteristics of the original object. A legolization procedure—specifically, a split-and-remerge algorithm—populates the occupied voxels with a selection of eight frequently used LEGO brick types (1×1, 1×2, 1×4, 1×6, 1×8, 2×2, 2×4, and 2×6), introducing layout randomness while preserving the global shape.
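As a rough illustration of the legolization idea (not the paper's exact split-and-remerge procedure), the sketch below covers one occupied voxel layer with bricks from the stated library, using a randomized scan order as a stand-in for the layout randomness described above; all names are assumptions.

```python
import random

# Simplified, illustrative legolization pass: cover the occupied cells of one
# layer of a 20x20x20 voxel grid with bricks from the allowed footprint
# library, placing roughly largest footprints first. Brick rotations and the
# paper's re-split/re-merge refinement are omitted for brevity.

FOOTPRINTS = [(2, 6), (1, 8), (2, 4), (1, 6), (2, 2), (1, 4), (1, 2), (1, 1)]

def legolize_layer(occ, z, rng: random.Random):
    """Cover occupied cells of layer z; occ[x][y][z] is a bool."""
    n = len(occ)
    covered = [[False] * n for _ in range(n)]
    bricks = []
    cells = [(x, y) for x in range(n) for y in range(n) if occ[x][y][z]]
    rng.shuffle(cells)  # randomized scan order yields distinct layouts
    for x, y in cells:
        if covered[x][y]:
            continue
        for h, w in FOOTPRINTS:  # roughly largest footprint first
            if x + h <= n and y + w <= n and all(
                occ[i][j][z] and not covered[i][j]
                for i in range(x, x + h) for j in range(y, y + w)
            ):
                for i in range(x, x + h):
                    for j in range(y, y + w):
                        covered[i][j] = True
                bricks.append((h, w, x, y, z))
                break
    return bricks
```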

After initial placement, the arrangement is further diversified: multiple distinct brick layouts per object are created via randomized "split-and-remerge" passes and brick order permutations, increasing structural variety and the likelihood that at least one variant per object is physically sound. Critically, every final structure must pass a series of physical stability evaluations before inclusion in the dataset (see §4).

For generative modeling, the dataset supports an autoregressive design paradigm. In mathematical terms, the probability of generating a brick assembly $B = [b_1, b_2, \ldots, b_N]$ is factorized as

$$p(b_1, b_2, \ldots, b_N) = \prod_{i=1}^{N} p(b_i \mid b_1, \ldots, b_{i-1}),$$

enabling sequential generation via next-brick prediction conditioned on previous steps. The models are trained and fine-tuned using this formulation, leveraging a variant of LLaMA-3.2-1B-Instruct adapted to the structured brick format.
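The factorization lends itself to a generate-one-brick-per-step loop. The sketch below illustrates this with the Hugging Face transformers API; the base model identifier, prompt handling, sampling settings, and stopping convention are assumptions for illustration and do not reproduce the released BrickGPT code.

```python
# Hedged sketch of next-brick autoregressive generation in the text format
# described above. Assumes access to the base model weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"  # base model named in the text
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def generate_structure(prompt: str, max_bricks: int = 200) -> list[str]:
    """Sample p(b_i | b_1, ..., b_{i-1}) one brick (one text line) at a time."""
    bricks: list[str] = []
    for _ in range(max_bricks):
        context = prompt + "\n" + "\n".join(bricks) + "\n"
        inputs = tokenizer(context, return_tensors="pt")
        out = model.generate(
            **inputs,
            max_new_tokens=16,          # one brick line is short
            do_sample=True,
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id,
        )
        new_text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:],
                                    skip_special_tokens=True)
        line = new_text.strip().splitlines()[0] if new_text.strip() else ""
        if not line:                    # treat an empty continuation as the end
            break
        bricks.append(line)
    return bricks
```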

3. Captioning and Textual Representation

Each LEGO structure in the dataset is annotated with five captions to support research in text-to-structure modeling. Caption grounding is achieved by rendering the structure from 24 perspectives and prompting GPT-4o for structural and geometric descriptions. These captions are decoupled from color and stylistic embellishments, focusing strictly on form, symmetry, layer count, prominent features, and spatial orientation (e.g., "A squat rectangular brick object with two side wings and a hollow center"). The textual brick encoding supports straightforward integration with LLM pipelines and aligns with common autoregressive generation workflows.
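For concreteness, a single caption/structure training pair in this encoding might look like the following; this is a hypothetical illustration, not an actual dataset record, and the field names are assumptions.

```python
# Hypothetical example of one caption/structure pair in the text encoding
# described above (field names are illustrative, not the released schema).
example = {
    "caption": "A squat rectangular block, two bricks tall and eight studs "
               "long, with a flat top surface.",
    "structure": "\n".join([
        "2×4 (0,0,0)",   # bottom layer
        "2×4 (0,4,0)",
        "2×4 (0,0,1)",   # top layer, each brick fully supported below
        "2×4 (0,4,1)",
    ]),
}
print(example["structure"])
```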

4. Physical Stability Validation

A distinguishing characteristic of StableText2Lego is the explicit guarantee that every included design is physically stable. The evaluation process enforces that no brick has a zero stability score; designs with structural defects or isolated unstable bricks are excluded. Stability is determined by a detailed physical force analysis, encompassing:

  • Gravity,
  • Vertical support and pressing forces at inter-brick contacts,
  • Shear forces (dragging and pulling),
  • Stability scores, computed by solving a nonlinear equilibrium problem for candidate contact forces and torques.

More precisely, for each brick, the equilibrium conditions enforced are:

  • Translational: $\sum_j F_i^j = 0$,
  • Rotational: $\sum_j \tau_i^j = 0$, where $\tau_i^j = L_i^j \times F_i^j$ and $L_i^j$ is the lever arm for force $F_i^j$.

Dragging force magnitudes are compared against an experimentally measured friction threshold $F_T$; a brick with stability score zero (i.e., $D_i^{\max} \geq F_T$) is unstable and triggers "physics-aware rollback." During dataset generation and autoregressive model inference, any design failing these stability constraints is pruned, and the structure is regenerated from the last stable state.

Validation additionally includes "brick-by-brick" rejection sampling—checking, for every proposed brick, that it is within the grid bounds, properly formatted, and does not physically collide with existing parts.
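A minimal sketch of such per-brick checks, reusing the parsing helper sketched in §1 and assuming the grid conventions above (the released validator may differ), is given below.

```python
# Hedged sketch of brick-by-brick rejection checks: format, bounds, collision.
# Relies on the Brick/parse_structure helpers sketched earlier.
GRID = 20

def collides(brick, occupied: set[tuple[int, int, int]]) -> bool:
    """True if any cell of the brick's footprint is already filled."""
    return any((x, y, brick.z) in occupied
               for x in range(brick.x, brick.x + brick.h)
               for y in range(brick.y, brick.y + brick.w))

def accept_brick(line: str, occupied: set[tuple[int, int, int]]):
    """Return a Brick if the proposed line passes all checks, else None."""
    try:
        brick = parse_structure(line)[0]        # format check
    except ValueError:
        return None
    if not (0 <= brick.x and brick.x + brick.h <= GRID and
            0 <= brick.y and brick.y + brick.w <= GRID and
            0 <= brick.z < GRID):               # bounds check
        return None
    if collides(brick, occupied):               # collision check
        return None
    occupied.update((x, y, brick.z)
                    for x in range(brick.x, brick.x + brick.h)
                    for y in range(brick.y, brick.y + brick.w))
    return brick
```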

5. Applications and Use Cases

StableText2Lego was principally designed to support:

  • Training of text-conditioned generative models (e.g., BrickGPT) capable of producing stable, buildable LEGO assemblies from natural language prompts.
  • Robot assembly: the dataset encodes all assembly stages brick-by-brick, enabling direct transfer of construction sequences to robotic agents. The paper demonstrates successful dual-arm robotic construction experiments, relying on the dataset's inherent physical stability and ordered assembly.
  • Manual construction: step-by-step build sequences allow human assembly using standard LEGO sets.
  • Computer-aided design: the physics-based filters and multi-caption descriptions provide resources for evaluation, structure optimization, and the development of automated design critique tools.

Additional resources included in the release support text-based LEGO texturing, enabling application of UV textures and color gradients for further creative exploration.

6. Dataset Release and Access

The dataset, codebase, and generative models are publicly available at https://avalovelace1.github.io/LegoGPT/, accompanied by detailed documentation, download instructions, and demonstration videos. Researchers and practitioners are provided with all required scripts for generation, captioning, and physical validation, facilitating further experimentation and downstream application development using StableText2Lego (Pun et al., 8 May 2025).

7. Context and Comparative Datasets

StableText2Lego emerges alongside other physically grounded LEGO assembly datasets (notably StableLego (Liu et al., 16 Feb 2024)), which focus on stability analysis of block-stacking assemblies and include both stable and unstable layouts for experimental benchmarking. Unlike datasets that contain both physically feasible and infeasible configurations, StableText2Lego is curated exclusively for buildable, stable designs, emphasizing real-world applicability in both human and robotic construction settings. A plausible implication is that the dataset's strict stability requirements make it especially suited for embodied AI and robotic automation research, as well as text-conditioned generative design tasks requiring guaranteed physical realizability.
