NaTex: 3D Texturing & Numeric Extraction
- NaTex refers to two distinct frameworks: one for 3D-native texture generation using a geometry-aware latent diffusion pipeline, and another for automated numeric attribute extraction in e-commerce.
- The texture generation framework reconceptualizes color as a continuous 3D field, achieving lossless alignment and superior multi-view consistency through a geometry-controlled diffusion process.
- The numeric extraction system employs automated alias expansion and a multi-task NER/CRF model, delivering significant F1-score gains across multilingual e-commerce datasets.
NaTex is used here as shorthand for two unrelated frameworks: “NaTex: Seamless Texture Generation as Latent Color Diffusion” (Lai et al., 20 Nov 2025) and “LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes” (Mehta et al., 2021). Despite the shared shorthand, their domains, methods, and objectives are distinct: the former introduces a 3D-native pipeline for generative mesh texturing, while the latter addresses scalable, language-agnostic extraction of numeric attributes from messy product text using distant supervision. Both automate their pipelines end to end and report gains over prior baselines on large-scale tasks.
1. Definition and Overview
NaTex in (Lai et al., 20 Nov 2025) designates a native 3D texture generation framework that predicts surface color directly in 3D space, employing latent color diffusion using a geometry-aware variational autoencoder (VAE) and a multi-control diffusion transformer (DiT). By conceptualizing texture as a continuous dense color point cloud rather than as a collection of 2D baked maps, NaTex achieves state-of-the-art performance in texture coherence, boundary alignment, and multi-view consistency.
NaTex (LaTeX-Numeric) in (Mehta et al., 2021) describes a fully automated, multi-task, language-agnostic extraction pipeline for numeric (and non-numeric) product attributes from unstructured text. Its core contributions include an automated unit alias-generation algorithm and a mask-aware multi-task NER/CRF training scheme that robustly addresses missing annotation and heterogeneity in real-world e-commerce datasets.
2. NaTex for Native 3D Texture Generation
Traditional mesh texturing pipelines synthesize multi-view 2D images through rendering or Multi-View Diffusion (MVD), followed by UV or mesh “baking.” These approaches face persistent challenges: inpainting of occluded regions, drift at geometric seams, and content or color inconsistency across views. NaTex (Lai et al., 20 Nov 2025) instead treats the entire mesh texture as a dense 3D RGB field: each surface sample yields a position-color pair, and the generative objective reduces to learning a function that encodes the RGB color at any surface or near-surface point.
This framing enables lossless alignment between mesh surface detail and color, direct handling of occluded regions (no view-based holes), and unified consistency in texture appearance across all observed viewpoints.
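The idea of texture as a function from 3D position to color can be illustrated with a toy lookup. This is only a sketch: the nearest-neighbor rule stands in for the learned decoder, and the names `make_color_field` and `color_at` are hypothetical, not taken from the paper.

```python
import numpy as np

def make_color_field(points, colors):
    """Return f(x) -> RGB that looks up the color of the nearest surface sample."""
    points = np.asarray(points, dtype=float)   # (N, 3) surface sample positions
    colors = np.asarray(colors, dtype=float)   # (N, 3) RGB values in [0, 1]

    def color_at(queries):
        queries = np.atleast_2d(np.asarray(queries, dtype=float))  # (M, 3)
        # Pairwise squared distances between queries and surface samples.
        d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        return colors[d2.argmin(axis=1)]       # (M, 3)

    return color_at

# Toy "surface": two samples, one red and one blue.
surface_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
surface_colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
color_field = make_color_field(surface_points, surface_colors)

# A near-surface query slightly off the second sample.
rgb = color_field([[0.9, 0.05, 0.0]])
```

Because the field is queried by 3D position rather than by pixel in a rendered view, a point hidden from every camera is colored the same way as a visible one.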
3. Geometry-Aware Latent Color Diffusion Architecture
The core pipeline comprises:
- Geometry-Aware Color Point-Cloud VAE: A dual-branch architecture encodes the colored point cloud into color latents and geometry latents that share the same token structure. Cross-attention lets geometry guide the color encoding, tying geometric cues to color prediction. The decoder reconstructs the continuous color field by attending to these latents from queries sampled at arbitrary 3D coordinates.
- Multi-Control Diffusion Transformer (DiT): Texture generation operates in latent space, where the DiT learns to denoise (reverse diffuse) color latents conditioned on geometry latents, positional embeddings (RoPE), and optional image or color controls. Conditioning tokens include:
  - Patch tokens from DINOv2-Giant image embeddings,
  - RoPE-embedded per-token positions,
  - Concatenated geometry latents,
  - Optionally, initial color latents for downstream refinement.
Loss objectives combine a VAE regularization term, a color regression loss over surface and near-surface points, and a UDF-based (unsigned distance field) geometry consistency penalty. The DiT is trained with a flow-matching or rectified-flow loss plus an illumination-invariance constraint, in which a noise sample provides the invariant supervision.
- Native Geometry Control: RoPE assigns explicit spatial context to each latent; geometry and color latents are paired per query point, enabling sub-face boundary alignment. Geometry control is enforced at every generation step, producing sharp color transitions precisely at mesh seams without bleeding or drift.
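To make the training objective of the DiT more concrete, the following is a minimal sketch of a generic rectified-flow loss on toy latents. It assumes the common straight-path convention (interpolate linearly from data to noise and regress the constant velocity); the illumination-invariance term is omitted, and `rectified_flow_loss` and `zero_model` are hypothetical names, not the paper's code.

```python
import numpy as np

def rectified_flow_loss(velocity_fn, x0, cond, rng):
    """Mean squared error between predicted and target velocity on a straight path."""
    eps = rng.standard_normal(x0.shape)          # Gaussian noise sample
    t = rng.uniform(size=(x0.shape[0], 1, 1))    # one time value per batch element
    x_t = (1.0 - t) * x0 + t * eps               # linear interpolation from data to noise
    target = eps - x0                            # constant velocity along that line
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
color_latents = rng.standard_normal((2, 8, 4))     # (batch, tokens, channels), toy sizes
geometry_latents = rng.standard_normal((2, 8, 4))  # conditioning, same token structure

# Hypothetical stand-in for the DiT: ignores its inputs and predicts zero velocity.
def zero_model(x_t, t, cond):
    return np.zeros_like(x_t)

loss = rectified_flow_loss(zero_model, color_latents, geometry_latents, rng)
```

In a real model, `velocity_fn` would be the transformer, and `cond` would carry the geometry latents, positions, and image tokens listed above.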
4. NaTex for Automated Numeric Attribute Extraction
NaTex (“LaTeX-Numeric”) (Mehta et al., 2021) targets the extraction of structured attribute values (e.g., RAM, screen size, weight) from unstructured product text at e-commerce scale, absent any hand-labeled data. Its pipeline involves:
- Distant Supervision with Automated Alias Expansion: Each product provides text and structured attribute values (with canonical units). For every attribute with a non-missing value, every substring of the text matching “[value] [optional space] [unit/alias]” is labeled. Since products often lack complete attribute annotation, many true mentions remain unlabeled (the Missing-PA issue).
- Alias Generation Algorithm: The system automatically infers an expanded alias list for each attribute using data-and-value suffix mining (alias_dw), generic numeric-pattern suffixes (alias_bp), and semantic embeddings (cosine similarity in GloVe or fastText). Only candidate units whose cosine similarity to the canonical unit exceeds a threshold are retained. Certain attributes marked as “exclusive-alias” assign BIO tags to any “number + alias” match in text, regardless of structured label presence.
- Multi-Attribute, Multi-Task NER (MAMT-NER): Character-CNN, word-embedding, and BiLSTM layers are shared between attributes; each attribute receives an independent CRF tagger head. The loss for a given attribute in a given sample is masked out if that attribute's value is missing. The total learning objective averages the active losses over non-missing attributes only.
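The mask-aware averaging in MAMT-NER can be sketched as follows. The per-attribute loss values are made up for illustration, and normalizing over the whole batch (rather than per sample) is an assumption; `masked_multitask_loss` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def masked_multitask_loss(per_attribute_loss, is_present):
    """Average per-attribute losses over (sample, attribute) pairs that have a label."""
    losses = np.asarray(per_attribute_loss, dtype=float)  # (samples, attributes)
    mask = np.asarray(is_present, dtype=float)            # 1 = value present, 0 = missing
    active = mask.sum()
    if active == 0:
        return 0.0                                         # nothing to learn from
    return float((losses * mask).sum() / active)

# Two products, three attributes; illustrative CRF losses per head.
losses = [[0.5, 2.0, 1.0],
          [1.5, 0.0, 3.0]]
present = [[1, 0, 1],
           [1, 0, 0]]
total = masked_multitask_loss(losses, present)  # (0.5 + 1.0 + 1.5) / 3
```

Masking prevents a head from being penalized for "missing" a mention when the product simply had no structured value for that attribute.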
COMPARISON TABLE: Extraction Pipeline Components (from Mehta et al., 2021)
| Step | Purpose | Automation |
|---|---|---|
| Distant Supervision | Generate noisy pseudo-labels | Full |
| Alias Generation (auto-alias) | Expand unit/alias list per attribute | Full |
| MAMT-NER Architecture | Mask-aware, multi-task training | Full |
| CRF Tagger (per attribute) | Sequence labeling per attribute | Full |
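The distant-supervision step can be approximated with a regular expression that finds “value [optional space] alias” and projects the match onto tokens as BIO tags. This is a simplified sketch under stated assumptions: whitespace tokenization, a single attribute, and a hypothetical `bio_tags` helper; the paper's exact matching rules may differ.

```python
import re

def bio_tags(text, value, aliases):
    """Tag whitespace tokens with B/I/O where 'value [space] alias' occurs in text."""
    tokens = text.split()
    tags = ["O"] * len(tokens)
    # Character span of each token, so regex matches can be mapped back to tokens.
    offsets, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    # Longest aliases first so "gb" is tried before a shorter alias such as "g".
    unit = "|".join(re.escape(a) for a in sorted(aliases, key=len, reverse=True))
    # The lookbehind stops "8" from matching inside a longer number like "128".
    pattern = re.compile(rf"(?<![\w.]){re.escape(value)}\s?(?:{unit})\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        inside = [i for i, (s, e) in enumerate(offsets) if s < m.end() and e > m.start()]
        for k, i in enumerate(inside):
            tags[i] = "B" if k == 0 else "I"
    return list(zip(tokens, tags))

title = "Acme Phone 8 GB RAM 128GB storage"
tagged = bio_tags(title, "8", ["gb", "gigabyte", "g"])
```

Here only the tokens `8` and `GB` receive `B` and `I`; `128GB` stays `O` because the structured value is 8, which illustrates why the labels are noisy when structured values are missing or incomplete.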
5. Quantitative Results and Evaluation
NaTex (Texture Generation) (Lai et al., 20 Nov 2025)
NaTex demonstrates superior performance to prior work in both texture reconstruction and image-conditioned synthesis. For example, with 24,576 latent tokens (dimension 64), the VAE achieves PSNR 30.86 and SSIM* 0.987 across six orthographic views. In image-conditioned texture generation, NaTex attains cFID 21.96, CMMD 2.055, CLIP 0.908, and LPIPS 0.102, improving over Paint3D, TexGen, Hunyuan3D-2, RomanTex, and MaterialMVP on all but one metric.
Ablation studies show that removing RoPE or uncoupling the geometry branch sharply degrades boundary precision. The system also supports high-fidelity one-step sampling at test time due to precise, native geometry conditioning.
VAE reconstruction quality by latent size (tokens × channel dimension):
| Latent Size | PSNR↑ | PSNR*↑ | SSIM*↑ | LPIPS*↓ |
|---|---|---|---|---|
| 6144×64 | 28.74 | 31.70 | 0.980 | 0.0492 |
| 12288×64 | 29.95 | 33.19 | 0.984 | 0.0445 |
| 24576×64 | 30.86 | 34.30 | 0.987 | 0.0411 |
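For readers unfamiliar with the PSNR column, the standard definition for images scaled to [0, 1] is sketched below. How the paper renders views or computes the starred variants is not specified here, so this shows only the generic metric; `psnr` is a hypothetical helper name.

```python
import numpy as np

def psnr(reference, prediction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for arrays with values in [0, max_value]."""
    reference = np.asarray(reference, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    mse = np.mean((reference - prediction) ** 2)
    if mse == 0:
        return float("inf")                     # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy 4x4 RGB image and a copy with a uniform error of 0.1 per channel.
reference = np.zeros((4, 4, 3))
prediction = reference + 0.1
score = psnr(reference, prediction)             # MSE is 0.01
```

Since the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.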
Image-conditioned texture generation, compared with prior methods:
| Method | cFID↓ | CMMD↓ | CLIP↑ | LPIPS↓ |
|---|---|---|---|---|
| Paint3D | 26.86 | 2.400 | 0.887 | 0.126 |
| TexGen | 28.23 | 2.447 | 0.882 | 0.133 |
| Hunyuan3D-2 | 26.43 | 2.318 | 0.889 | 0.126 |
| RomanTex | 24.78 | 2.191 | 0.891 | 0.121 |
| MaterialMVP | 24.78 | 2.191 | 0.921 | 0.121 |
| NaTex | 21.96 | 2.055 | 0.908 | 0.102 |
NaTex (LaTeX-Numeric) (Mehta et al., 2021)
NaTex achieves a 20.2% F1-score gain in numeric attribute extraction from auto-aliasing (relative to a canonical-only baseline). Incorporating the MAMT-NER multi-task framework yields an additional 9.2% (CNN-BiLSTM) or 3.5% (BERT) improvement. Non-numeric (textual) attributes also benefit, with an average 7.4% increase. The gains are language-agnostic: on three Romance languages, the pipeline outperforms the baseline by 13.9%. The table reports F1 relative to the baseline, which is normalized to 100.
| Configuration | Numeric F1 (relative, baseline = 100) | Non-numeric F1 (relative) | Romance-language F1 (relative) |
|---|---|---|---|
| Baseline (canonical/MAST) | 100.0 | 100.0 | 100.0 |
| Auto-aliasing (MAST) | 120.2 | – | 106.0 |
| Auto-aliasing + MAMT-NER | 131.2 | 107.4 | 113.9 |
6. Downstream Applications and Extensions
NaTex (Texture Generation) extends natively to:
- Material Generation: Supplement color channels with metallic/roughness, producing full PBR texture maps in a single pass.
- Texture Refinement & Inpainting: Consumes flawed/partial textures as control latents and rapidly denoises or inpaints with geometric alignment.
- Part Segmentation/Texturing: Accepts 2D or 3D segmentation masks as image control tokens and generates part-aligned 3D color or label fields, operating in both zero-shot and few-shot settings.
- Generalization: Architecture and codebase remain unaltered for downstream tasks; varying only conditioning signals (image, geometry, partial color) enables rapid adaptation.
NaTex (LaTeX-Numeric) is validated on 20 numeric attributes across five e-commerce product categories, three English marketplaces, and three Romance languages. The plug-and-play approach allows immediate scaling to non-English settings (via fastText embeddings) and rapid onboarding of new attributes without manual list curation.
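Onboarding without manual list curation rests on the embedding-based alias filter described in Section 4. The sketch below shows the cosine-similarity test on hand-made toy vectors; real use would load GloVe or fastText vectors instead, and the threshold of 0.7 and the helper name `filter_aliases` are assumptions for illustration.

```python
import numpy as np

def filter_aliases(canonical, candidates, embeddings, threshold=0.7):
    """Keep candidates whose embedding is close to the canonical unit's embedding."""
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    anchor = embeddings[canonical]
    # Candidates without an embedding are dropped rather than guessed at.
    return [c for c in candidates
            if c in embeddings and cosine(anchor, embeddings[c]) > threshold]

# Toy 3-dimensional vectors standing in for pretrained word embeddings.
toy_embeddings = {
    "gb":       np.array([1.0, 0.1, 0.0]),
    "gigabyte": np.array([0.9, 0.2, 0.0]),
    "inch":     np.array([0.0, 1.0, 0.2]),
}
kept = filter_aliases("gb", ["gigabyte", "inch", "pcs"], toy_embeddings)
```

With these vectors only `gigabyte` survives: `inch` points in a different direction and `pcs` has no embedding. Swapping in embeddings for another language changes nothing else in the procedure, which is what makes the approach language-agnostic.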
7. Significance and Limitations
Both NaTex frameworks advance the state of the art in their application domains. The 3D-native approach to texture generation addresses long-standing mesh UV alignment and occlusion problems, while multi-task, fully automatic attribute extraction enables robust e-commerce data mining. Notably, NaTex (LaTeX-Numeric) does not perform unit conversions for extracted numeric values at inference (e.g., "5 pounds" to "2.27 kg"); this is deferred to future work. Both systems are fully automated: no hand annotation or alias list curation is required.
The methodologies are representative of emerging trends in vision and language: geometry-aware latent conditioning, 3D spatial priors, multi-control diffusion, and language-agnostic automated supervision. These advances facilitate reliable, explainable deployment of generative and extraction models at web scale, informing a broad range of practical applications in 3D content creation and structured product analysis.
References:
- "NaTex: Seamless Texture Generation as Latent Color Diffusion" (Lai et al., 20 Nov 2025)
- "LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes" (Mehta et al., 2021)