Text Encoded Extrusion (TEE) Techniques

Updated 6 February 2026

Text Encoded Extrusion (TEE) is a methodological framework that encodes geometric and semantic information as discrete text tokens to drive synthesis tasks, including 3D mesh construction and multi-modal classification.
It employs a series of extrusion commands processed by large language models, ensuring topologically consistent, invertible 3D meshes with flexible face counts and controlled reconstruction error.
TEE also fuses textual descriptors with visual data in multi-modal settings, notably improving classification accuracy by overlaying coherent, semantically-rich image patches.

Text Encoded Extrusion (TEE) refers to a suite of methodologies that leverage text-encoded operations or features to drive downstream synthesis or classification tasks. The core concept centers on encoding structural, geometric, or semantic information as sequences of text tokens, which are subsequently fused with visual or geometric generators for tasks such as 3D mesh construction or multi-modal classification. Instances include mesh construction by discrete, symbolic extrusion commands (Christiansen et al., 30 Jan 2026) and the fusion of encoded textual information with images for improved classification (Gallo et al., 2018). Distinct TEE variants appear in both geometric modeling and multi-modal vision, unified by the principle of using a sequence of discrete text representations—either as direct commands or as visual modulations—to deterministically or stochastically guide the synthesis process.

1. Mesh Construction via Text Encoded Extrusion

The TEE paradigm for mesh construction, introduced by Christiansen et al. (30 Jan 2026), encodes the assembly of 3D meshes as sequences of extrusion operations expressed as text tokens. Mesh synthesis is thereby cast as a sequence generation problem: an LLM generates extrusion commands, and executing them one after another builds a topologically consistent 3D manifold.

Token Grammar

A TEE mesh sequence comprises tokens of the following forms:

E⟨i⟩: Apply extrusion operation i (from a set of precomputed, clustered representatives).
P⟨j⟩: Push current mesh state to a database under index j.
gp⟨j⟩, sv⟨v₁ … vₖ⟩: Retrieve the state previously stored under index j, then select vertices v₁ … vₖ (forming a closed loop) as the base patch of the next extrusion.
Re: Mark a previous extrusion as requiring a subsequent state push.

The abstract grammar is:

$\langle\text{TEE}\rangle ::= \langle\text{Command}\rangle^*$

where

$\langle\text{Command}\rangle ::= \text{`E'}\,\langle\text{ExtrusionID}\rangle \mid \text{`P'}\,\langle\text{StateID}\rangle \mid \text{`gp'}\,\langle\text{StateID}\rangle \mid \text{`sv'}\,\langle\text{VertexID}^+\rangle \mid \text{`Re'}$
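The grammar above can be illustrated with a small parser. This is only a sketch: the source does not specify the concrete serialization, so the whitespace-separated layout with integer ids, the function name `parse_tee`, and the `(opcode, args)` tuple format are all assumptions made for illustration.

```python
from typing import List, Tuple

Command = Tuple[str, Tuple[int, ...]]

# Number of integer arguments each opcode takes; None means "one or more".
# The textual layout (space-separated, integer ids) is an assumption.
ARITY = {"E": 1, "P": 1, "gp": 1, "sv": None, "Re": 0}


def parse_tee(stream: str) -> List[Command]:
    """Parse a whitespace-separated TEE string into (opcode, args) pairs."""
    words = stream.split()
    commands: List[Command] = []
    pos = 0
    while pos < len(words):
        op = words[pos]
        if op not in ARITY:
            raise ValueError(f"unknown command {op!r} at word {pos}")
        pos += 1
        start = pos
        # Consume the run of integer arguments that follows the opcode.
        while pos < len(words) and words[pos].isdigit():
            pos += 1
        args = tuple(int(w) for w in words[start:pos])
        expected = ARITY[op]
        if expected is None:
            if not args:
                raise ValueError("'sv' needs at least one vertex id")
        elif len(args) != expected:
            raise ValueError(f"{op!r} expects {expected} argument(s), got {len(args)}")
        commands.append((op, args))
    return commands


example = parse_tee("E 12 P 0 gp 0 sv 3 4 5 6 E 7 Re")
```

Keeping the arity table separate from the loop means a change to the command set only touches `ARITY`.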

2. Parameterization and Execution of Extrusions

A single extrusion, $E_i$, is characterized by the tuple $E_i = (B_i, U_i, \Delta_i)$, where:

$B_i$ encodes the triangle connectivity in the base patch.
$U_i = \{(u_i^k, v_i^k)\}_{k=1}^{n_i}$ provides harmonic map coordinates (boundary mapped to unit circle).
$\Delta_i = \{\delta_i^k \in \mathbb{R}^3\}_{k=1}^{n_i}$ gives displacement vectors from the 2D parametric domain back to 3D.

Boundary displacements decompose as $\delta_i^k = s_i^k R(\theta_i^k) d_i^k$, where $s_i^k$ is a step size, $d_i^k$ a unit direction, and $R(\theta_i^k)$ a local planar rotation about the boundary normal.
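As a worked illustration of $\delta = s\,R(\theta)\,d$, the sketch below rotates a unit direction about a normal with Rodrigues' formula and scales the result by the step size. The function names and the choice of Rodrigues' formula to realize $R(\theta)$ are assumptions; the source only states that $R$ is a planar rotation about the boundary normal.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def rotate_about_axis(d: Sequence[float], axis: Sequence[float], theta: float) -> Vec3:
    """Rodrigues' formula: rotate d by theta (radians) about a unit-length axis."""
    ax, ay, az = axis
    dx, dy, dz = d
    cross = (ay * dz - az * dy, az * dx - ax * dz, ax * dy - ay * dx)  # axis x d
    dot = ax * dx + ay * dy + az * dz
    c, s = math.cos(theta), math.sin(theta)
    return tuple(
        d_k * c + cr_k * s + a_k * dot * (1.0 - c)
        for d_k, cr_k, a_k in zip(d, cross, axis)
    )


def boundary_displacement(step: float, theta: float,
                          direction: Sequence[float], normal: Sequence[float]) -> Vec3:
    """delta = s * R(theta) * d, with R rotating about `normal` (assumed unit length)."""
    rotated = rotate_about_axis(direction, normal, theta)
    return tuple(step * r for r in rotated)


# Quarter turn about the z axis: the x direction becomes the y direction,
# then the step size 0.5 scales it.
delta = boundary_displacement(0.5, math.pi / 2, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Because the rotation preserves length, the norm of `delta` equals the step size whenever `direction` is a unit vector.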

The application of an extrusion involves:

Parameterizing the selected face loop in 2D via a harmonic map;
Interpolating new vertex positions from 2D barycentric coordinates using $B_i$ and $\Delta_i$;
Mapping new vertices to 3D via local frame transport;
Inserting a new face loop and updating mesh connectivity.
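The first step needs a boundary condition that places the selected loop on the unit circle. The sketch below shows one common choice, spacing vertices by chord length. That spacing rule and the name `boundary_to_unit_circle` are assumptions; the source states only that the boundary is mapped to the unit circle. Solving for interior coordinates of the harmonic map is not shown.

```python
import math
from typing import List, Sequence, Tuple


def boundary_to_unit_circle(loop: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Map a closed 3D vertex loop to the unit circle, spacing by chord length."""
    n = len(loop)
    # Length of each boundary edge, including the closing edge back to vertex 0.
    lengths = [math.dist(loop[k], loop[(k + 1) % n]) for k in range(n)]
    total = sum(lengths)
    if total == 0.0:
        raise ValueError("degenerate loop: all vertices coincide")
    uv = []
    travelled = 0.0
    for k in range(n):
        # The angle grows in proportion to the arc length travelled so far.
        angle = 2.0 * math.pi * travelled / total
        uv.append((math.cos(angle), math.sin(angle)))
        travelled += lengths[k]
    return uv


# A unit square loop lands on four equally spaced points of the circle.
square_uv = boundary_to_unit_circle([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)])
```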

3. TEE-Guided Mesh Decomposition and Reconstruction

Mesh decomposition operates on genus-0 quadrilateral meshes (FEQ meshes) free from self-intersections. The process constructs a directed acyclic graph (DAG) capturing extrusion dependencies, reverses the topological sort, and records a compressed sequence of extrusion operations:

Decomposition Pseudocode

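def decompose_to_extrusions(mesh, find_loops, is_leaf, patch_area, collapse):
    # The four callables are hypothetical mesh helpers, not defined in the source.
    loops = set(find_loops(mesh))  # all non-self-intersecting face loops
    extrusions = []
    while loops:
        leaves = [loop for loop in loops if is_leaf(mesh, loop)]
        if not leaves:
            raise ValueError("no leaf loop left to collapse")
        # Collapse the leaf whose base patch has the smallest area.
        target = min(leaves, key=lambda loop: patch_area(mesh, loop))
        mesh, extrusion = collapse(mesh, target)  # records the extrusion E
        extrusions.append(extrusion)
        loops = set(find_loops(mesh))  # target is gone; remaining loops may change
    # Collapses are recorded in removal order, so reverse to get construction order.
    return extrusions[::-1]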

Reconstruction processes the TEE token stream to reassemble the mesh, managing state snapshots for handling mesh edits or branching structures.
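One property of state snapshots can be checked before any geometry is touched: a gp command can only retrieve a state that an earlier P command stored. The sketch below scans a command list for violations. The `(opcode, args)` tuple format and the function name are assumptions for illustration, and the check says nothing about how the executor itself manages states.

```python
from typing import Iterable, List, Tuple

Command = Tuple[str, Tuple[int, ...]]


def unresolved_retrievals(commands: Iterable[Command]) -> List[int]:
    """Return positions of 'gp' commands whose state id was not pushed earlier."""
    pushed = set()
    bad = []
    for pos, (op, args) in enumerate(commands):
        if op == "P":
            pushed.add(args[0])
        elif op == "gp" and args[0] not in pushed:
            bad.append(pos)
    return bad


valid = [("E", (1,)), ("P", (0,)), ("E", (2,)), ("gp", (0,)), ("sv", (4, 5, 6, 7)), ("E", (3,))]
broken = [("E", (1,)), ("gp", (0,)), ("P", (0,))]
```

Here `unresolved_retrievals(valid)` is empty, while `broken` is flagged at position 1 because state 0 is retrieved before it is pushed.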

4. LLM Architecture and Training Regime in TEE

The mesh TEE system employs Llama 3.2 1B with no architectural modifications; only the causal decoding head is trained. Input and target sequences are the TEE command streams for each mesh feature, and the loss is the standard next-token cross-entropy:

$\mathcal{L} = -\sum_{t=1}^{T} \log p_\theta(\mathrm{token}_t \mid \mathrm{token}_{1:t-1})$
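As a small numeric illustration of this loss, the sketch below sums the negative log-probabilities that a model assigned to the correct token at each step. The probabilities are made up for the example, and `sequence_nll` is a hypothetical helper name.

```python
import math
from typing import Sequence


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Sum of -log p over the probabilities the model gave to the true tokens."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs)


# Three tokens predicted with probability 1/2, 1/4 and 1/8:
# loss = ln 2 + ln 4 + ln 8 = 6 ln 2.
loss = sequence_nll([0.5, 0.25, 0.125])
```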

Dataset sources include "Hexalab" FEQ meshes, quad-remeshed DFAUST upper-body, MANO-hand models, and skeleton-based extractions. All decomposed extrusions are clustered via K-means on their $\Delta_i$ fields to obtain $K=20{,}000$ unique representatives.
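The clustering step can be sketched with plain Lloyd iterations. The code assumes each $\Delta_i$ field has already been flattened into a fixed-length vector, which the source does not describe, and it uses a tiny $K$ and toy data. It illustrates K-means in general, not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centers: Sequence[Vector]) -> int:
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd iterations; returns (centers, label per point)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the updated centers.
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy "displacement vectors": two tight groups far apart.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(toy, k=2)
```

After clustering, replacing an extrusion by its representative amounts to calling `nearest` on its vector, which is the source of the reconstruction error that shrinks as $K$ grows.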

Training sequences are conditioned on a special base-patch token, guiding the model toward contextually relevant extrusions for each mesh part.

5. Guarantees: Manifoldness and Flexible Face Count

Each TEE extrusion is designed as an invertible, topology-preserving operation on a disk-like base patch. Consequently, any composition of extrusions yields a genus-0 manifold with quadrilateral faces and no self-intersections, and no two extrusions overlap in a manner that violates 2-manifoldness. The process supports arbitrary face counts: each extrusion can append as many quads as its loop length, so meshes can grow arbitrarily large or complex. In contrast, directly autoregressive polygon-list synthesis suffers from attention bottlenecks and lacks topological guarantees.
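Two necessary conditions for a closed genus-0 manifold are easy to test on a face list: every edge is shared by exactly two faces, and the Euler characteristic $V - E + F$ equals 2. The sketch below checks both, assuming a closed, connected mesh given as vertex-index tuples. It does not detect self-intersections or non-manifold vertices, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def edge_counts(faces: Sequence[Sequence[int]]) -> Counter:
    """Count how many faces use each undirected edge."""
    counts: Counter = Counter()
    for face in faces:
        for a, b in zip(face, list(face[1:]) + [face[0]]):
            counts[(min(a, b), max(a, b))] += 1
    return counts


def passes_genus0_checks(faces: Sequence[Sequence[int]]) -> bool:
    """Necessary (not sufficient) tests for a closed genus-0 manifold."""
    counts = edge_counts(faces)
    if any(n != 2 for n in counts.values()):
        return False  # a boundary edge or an edge shared by 3+ faces
    vertices = {v for face in faces for v in face}
    return len(vertices) - len(counts) + len(faces) == 2


# A cube as six quads: V = 8, E = 12, F = 6, so V - E + F = 2.
cube = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
cube_ok = passes_genus0_checks(cube)
```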

6. Quantitative and Qualitative Evaluation in Mesh Synthesis

Performance metrics reported:

With $K=1{,}000$ extrusion clusters: average per-vertex deviation ≈ 1.2 cm on a 1 m object.
At $K=20{,}000$: reconstruction error falls below 0.5 mm, which is visually indistinguishable from the original.

On MANO hands, TEE achieves FID ≈ 13.2 (versus MeshXL's FID ≈ 66.4), indicating superior generative quality, since lower FID is better. Sampling the LLM at temperatures from $T=0.5$ to $T=1.5$ and top-$k$ values from $k=5$ to $k=100$ controls the fidelity-novelty trade-off, producing diverse and compositional shapes (e.g., hybrid creatures reusing anatomical features).
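The role of temperature and top-$k$ can be seen in a minimal sampler over a vector of logits. This is a generic sketch of the standard technique, not the authors' decoding code. The logits are invented, and `sample_top_k` is a hypothetical name.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(logits: Sequence[float], temperature: float, k: int,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token index from temperature-scaled logits restricted to the top k."""
    if temperature <= 0.0 or k < 1:
        raise ValueError("temperature must be > 0 and k >= 1")
    rng = rng or random.Random()
    # Keep the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 1.5]
greedy = sample_top_k(toy_logits, temperature=1.0, k=1)  # k=1 always picks the argmax
```

Lower temperatures and smaller $k$ concentrate probability on the most likely extrusions (fidelity), while higher values admit rarer tokens (novelty).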

TEE also enables mesh editing: by starting from a user-defined base patch, extrusions can auto-complete new geometric features, supporting in-place modifications and feature addition.

7. Multi-Modal Fusion via Text Encoded Extrusion

A related but distinct application of TEE arises in multi-modal fusion for classification (Gallo et al., 2018). Here, a textual description is encoded by a CNN into a three-channel $W_t \times H_t$ patch, which is then "extruded" onto the RGB image as a super-pixel block at a fixed spatial location. The fused image is the input to a standard CNN classifier (AlexNet or GoogLeNet). The approach allows direct CNN-based learning of joint visual–textual features, yielding consistent performance gains over text-only, image-only, early-fusion, and late-fusion baselines.
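The overlay step itself is a simple paste of the text patch into the image at a fixed position. The sketch below uses nested lists as a stand-in for image arrays. The text-encoding CNN that produces the patch is not shown, and the function name and the top-left placement in the example are assumptions.

```python
from typing import List

Image = List[List[List[int]]]  # rows x columns x 3 channel values


def overlay_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`."""
    height, width = len(image), len(image[0])
    p_height, p_width = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + p_height > height or left + p_width > width:
        raise ValueError("patch does not fit inside the image")
    fused = [[list(pixel) for pixel in row] for row in image]  # deep copy
    for r in range(p_height):
        for c in range(p_width):
            fused[top + r][left + c] = list(patch[r][c])
    return fused


# A 4x4 black image and a 2x2 red "text patch" placed in the top-left corner.
black = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]
red_patch = [[[255, 0, 0] for _ in range(2)] for _ in range(2)]
fused_image = overlay_patch(black, red_patch, top=0, left=0)
```

The pixels under the patch are overwritten rather than blended, which is why a fixed placement can hide salient image content.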

Key results include:

Ferramenta (e-commerce): TEE-fused models attain up to 95.45% top-1 accuracy, surpassing image-only and text-only models.
UPMC Food-101: TEE-fused models reach up to 83.37% top-1 accuracy.

The fusion mechanism provides visually similar patches for same-class items, indicating coherent semantic encoding by the text CNN.

8. Limitations and Research Directions

For mesh synthesis, TEE is currently limited to genus-0 FEQ meshes and relies on robust clustering of lossless extrusions for high fidelity. A plausible implication is that extending TEE to high-genus or non-quadrilateral meshes would require new decomposition strategies or command grammars.

For image–text fusion, fixed spatial patch placement can occlude salient image content; adaptive overlay mechanisms or joint-stream architectures could alleviate this. Handling long text, multiple captions, and modality translation remains an open problem.

Both TEE paradigms exemplify structured, invertible, symbolic control over complex synthesis tasks, with ongoing research poised to extend these capabilities to broader domains and richer output classes (Christiansen et al., 30 Jan 2026; Gallo et al., 2018).

References

Christiansen et al. (2026). Learning to Build Shapes by Extrusion.

Gallo et al. (2018). Image and Encoded Text Fusion for Multi-Modal Classification.