GOATex: Occlusion-Aware 3D Mesh Texturing
- GOATex is a framework that partitions 3D meshes into visibility layers using hit-level analysis to generate seamless textures for both visible and occluded surfaces.
- It employs a novel two-stage visibility control with residual face clustering and normal flipping to guide a diffusion-based synthesis process without model fine-tuning.
- The method integrates soft UV blending and multi-view consistency to produce coherent, artifact-free textures, outperforming conventional render–project–inpaint pipelines.
GOATex is an occlusion-aware, diffusion-based framework for 3D mesh texturing that systematically addresses the generation of seamless, high-fidelity textures for both the exterior and previously inaccessible interior regions of complex meshes. Unlike conventional pipelines, which perform poorly on occluded or interior surfaces and result in visually implausible artifacts and texture seams, GOATex implements a novel geometric decomposition via “hit levels,” enabling progressive, structurally consistent texturing with principled soft blending across visibility layers. The approach operates fully zero-shot—requiring no fine-tuning of diffusion models—and supports fine-grained, prompt-driven control of layered appearance, yielding superior results for both exterior and interior regions (Kim et al., 28 Nov 2025).
1. Problem Formulation and Limitations of Prior Approaches
The core problem addressed by GOATex is to synthesize a color texture map $T$ for a UV-parameterized 3D mesh $\mathcal{M}$ such that both exterior (directly visible) and interior (occluded) surfaces exhibit plausible, seam-free appearance under arbitrary viewpoints. Existing project-and-inpaint pipelines, which rely on multi-view images and heuristic texel filling (e.g., Voronoi propagation, smooth extrapolation), struggle to texture occluded regions, often leaving interiors semantically implausible and surfaces marred by visible seams and artifacts.
GOATex introduces a geometry- and occlusion-aware pipeline. The method fundamentally departs from standard render–project–inpaint methods by introducing explicit mesh decomposition into visibility layers, conditioning texturing with hit-level information, and blending the resulting multi-stage textures in UV space.
2. Hit Levels and Visibility Layer Decomposition
The cornerstone of GOATex is the assignment of hit levels to mesh faces via multi-view ray casting. The mesh's face set $\mathcal{F}$ is oversegmented into "superfaces" (groups of connected, low-curvature faces, typically extracted with Xatlas). For a collection of camera viewpoints and a dense set of rays $\mathcal{R}$, the ordered intersections of each ray with the mesh faces are recorded. For each face $f$ and intersection order $k$ (the $k$-th surface hit along a ray), a directional influence

$$I_k(f) = \sum_{r \in \mathcal{R}} \mathbb{1}\!\left[\text{the $k$-th hit of } r \text{ is } f\right]\,\bigl|\langle \mathbf{n}_f, \mathbf{d}_r \rangle\bigr|$$

is computed, where $\mathbf{n}_f$ is the face's unit normal and $\mathbf{d}_r$ the ray's direction. Each superface $S$ is assigned a hit level by majority influence,

$$h(S) = \arg\max_k \sum_{f \in S} I_k(f),$$

with all of its constituent faces inheriting $h(f) = h(S)$. This yields an ordered partition of $\mathcal{F}$ into up to $K$ visibility layers,

$$\mathcal{F} = \mathcal{L}_1 \cup \mathcal{L}_2 \cup \cdots \cup \mathcal{L}_K, \qquad \mathcal{L}_k = \{\, f \in \mathcal{F} : h(f) = k \,\},$$

representing increasingly deeper, less externally visible mesh regions.
The hit-level assignment method ensures geometric coherence of layers, supporting well-defined, interpretable partitions for progressive texturing.
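A minimal sketch of this hit-level computation is given below. It assumes a trimesh mesh and a precomputed per-face superface label array; trimesh's ray casting stands in for the Open3D/PyTorch3D stack used in the actual implementation, and names such as `assign_hit_levels` and `superface_labels` are illustrative rather than the authors' API.

```python
# Hedged sketch of hit-level assignment; not the reference implementation.
import numpy as np
import trimesh


def assign_hit_levels(mesh: trimesh.Trimesh, superface_labels: np.ndarray,
                      ray_origins: np.ndarray, ray_dirs: np.ndarray,
                      max_level: int = 8) -> np.ndarray:
    """Return a 0-indexed hit level per face (the paper's layer k = index + 1)."""
    # All ray/triangle intersections, not just the closest hit per ray.
    tri_idx, ray_idx, locations = mesh.ray.intersects_id(
        ray_origins, ray_dirs, multiple_hits=True, return_locations=True)

    # Accumulate directional influence I_k(f): for every ray whose k-th ordered
    # hit is face f, add |<n_f, d_r>| (ray directions assumed unit length).
    influence = np.zeros((max_level, len(mesh.faces)))
    for r in np.unique(ray_idx):
        sel = np.where(ray_idx == r)[0]
        dists = (locations[sel] - ray_origins[r]) @ ray_dirs[r]   # order along the ray
        for k, j in enumerate(sel[np.argsort(dists)][:max_level]):
            f = tri_idx[j]
            influence[k, f] += abs(mesh.face_normals[f] @ ray_dirs[r])

    # Majority influence per superface: the superface takes the hit order with
    # the largest accumulated influence, and its faces inherit that level.
    hit_level = np.zeros(len(mesh.faces), dtype=int)
    for s in np.unique(superface_labels):
        faces = np.where(superface_labels == s)[0]
        hit_level[faces] = influence[:, faces].sum(axis=1).argmax()
    return hit_level
```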
3. Two-Stage Visibility Control for Progressive Layer Revelation
A naïve per-layer rendering quickly degenerates into sparse or fragmented geometry in deeper layers, impairing diffusion-based texture synthesis. GOATex employs two successive visibility control mechanisms:
- Residual Face Clustering: At each layer $k$, the set of faces to newly texture is the residual set $\mathcal{F}_k = \bigcup_{j \ge k} \mathcal{L}_j$ (all faces not yet finalized), maintaining a contiguous shell for conditioning the diffusion model.
- Normal Flipping with Backface Culling: For layers deeper in the mesh, direct rendering of the residual faces alone can appear thin and fragmentary. GOATex instead renders the full mesh but flips the normals of faces that are already textured (those in $\mathcal{L}_j$ for $j < k$), making them invisible under backface culling and revealing the newly available interior regions. The active set at each stage is

$$\mathcal{A}_k = \mathcal{F}_k \cup \widetilde{\mathcal{F}}_{<k},$$

where $\widetilde{\mathcal{F}}_{<k}$ denotes the already-textured faces with flipped normals. Each pixel in the rendered depth maps is marked with a binary mask $M_k^v$, indicating whether it corresponds to a newly active region for layer $k$; this mask designates the region to be conditioned in the diffusion model.
This two-pronged mechanism preserves geometric coherence in each intermediate depth map, supporting both semantic and spatial structure across progressive layers.
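Under the same assumptions (a 0-indexed `hit_level` array per face, as in the sketch above), the two mechanisms can be expressed as a per-layer geometry selection; `select_layer_geometry` and its return convention are hypothetical, not the paper's code.

```python
# Hedged sketch of the two-stage visibility control for layer k.
import numpy as np
import trimesh


def select_layer_geometry(mesh: trimesh.Trimesh, hit_level: np.ndarray, k: int):
    """Build the geometry rendered when texturing visibility layer k."""
    residual = hit_level >= k        # residual face clustering: faces not yet finalized
    textured = hit_level < k         # faces textured at earlier, shallower layers

    faces = np.asarray(mesh.faces).copy()
    # Normal flipping: reverse the winding of already-textured faces so that
    # backface culling hides them and exposes the newly available interior.
    faces[textured] = faces[textured][:, ::-1]

    # Keeping the full face set (residual + flipped) makes every intermediate
    # depth map a contiguous, geometrically coherent shell rather than a sparse fragment.
    layer_mesh = trimesh.Trimesh(vertices=mesh.vertices, faces=faces, process=False)

    # Rasterizing `residual` per view would give the binary mask M_k^v that
    # marks the newly active region conditioned in the diffusion model.
    return layer_mesh, residual
```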
4. Diffusion-Based Texturing with Multi-View Consistency
At each visibility layer $k$, GOATex synthesizes textures using a depth-conditioned variant of Stable Diffusion 1.5 augmented with ControlNet. For each camera view $v$, the input consists of the rendered depth map and the mask $M_k^v$. Text prompts are enriched with detailed view and region cues. Each stylized output image is unprojected onto the mesh UV atlas, yielding a partial layer texture $T_k$.
To enforce cross-view consistency, GOATex utilizes multi-view diffusion (MVD) sampling analogous to SyncMVD, synchronizing features during denoising to ensure the same surface region renders coherently from different directions. Notably, the diffusion backbone is not fine-tuned; all synthesis is performed zero-shot.
Distinct prompts can be supplied for exterior and interior layers, enabling separate artistic or semantic control of layered appearance.
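For a single view, the depth-conditioned generation step can be approximated with the Hugging Face diffusers library as below; this sketch omits the SyncMVD-style synchronization and mask guidance described above, uses the public SD 1.5 and depth-ControlNet checkpoints as stand-ins, and the prompt and file path are placeholders.

```python
# Hedged single-view stand-in for GOATex's synchronized multi-view sampling.
import torch
from PIL import Image
from diffusers import (ControlNetModel, StableDiffusionControlNetPipeline,
                       UniPCMultistepScheduler)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

depth_map = Image.open("layer2_view07_depth.png")   # rendered from the layer geometry

styled_view = pipe(
    prompt="weathered leather car interior, dashboard with analog dials",  # example prompt
    image=depth_map,                  # ControlNet depth conditioning
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
# styled_view would then be unprojected into the layer's partial UV texture T_k.
```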
5. Soft UV-Space Blending for Seamless Texture Integration
Layerwise texture maps may overlap in UV space, with boundaries corresponding to the geometry's occlusion transitions. Rather than hard overwriting, GOATex computes a soft UV blend based on view-dependent visibility confidence.
For each texel $u$, a per-layer, per-view confidence weight $w_k^v(u)$ is computed from the view-dependent visibility of layer $k$ at $u$, aggregated across views as

$$w_k(u) = \sum_{v} w_k^v(u),$$

and masked by texel validity $m_k(u) \in \{0,1\}$ (whether layer $k$ actually covers $u$). Final normalized layer weights are assigned via a softmax over the valid layers,

$$\alpha_k(u) = \frac{m_k(u)\,\exp\!\big(w_k(u)\big)}{\sum_{j} m_j(u)\,\exp\!\big(w_j(u)\big)},$$

and the composite texture is

$$T(u) = \sum_{k} \alpha_k(u)\, T_k(u).$$
This methodology yields physically plausible, seamless transitions across visibility layers, eliminating hard seams and suppressing layer priority artifacts.
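A numpy sketch of this blend is shown below, assuming the per-layer textures, view-aggregated confidences, and validity masks have already been rasterized into a shared UV atlas; the array names are illustrative.

```python
# Hedged sketch of soft UV-space blending across visibility layers.
import numpy as np


def blend_layers(layer_textures: np.ndarray,   # (K, H, W, 3) partial textures T_k
                 confidences: np.ndarray,      # (K, H, W) view-aggregated weights w_k
                 validity: np.ndarray) -> np.ndarray:  # (K, H, W) binary masks m_k
    """Softmax-blend per-layer UV textures into one composite texture map."""
    # Invalid texels get -inf so the softmax assigns them zero weight.
    logits = np.where(validity > 0, confidences, -np.inf)
    max_logit = logits.max(axis=0, keepdims=True)
    max_logit = np.where(np.isfinite(max_logit), max_logit, 0.0)  # numerical stability
    weights = np.exp(logits - max_logit)
    weights /= np.clip(weights.sum(axis=0, keepdims=True), 1e-8, None)
    # Composite texture T(u) = sum_k alpha_k(u) * T_k(u).
    return (weights[..., None] * layer_textures).sum(axis=0)
```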
6. Implementation and Performance
GOATex employs Stable Diffusion 1.5 with a depth ControlNet, rendering per-view inputs and writing the outputs into a UV texture atlas. Hit-level decomposition is computed from 16 hemispherical camera views. Mesh segmentation and ray casting are performed with PyTorch3D and Open3D. For a 10,000-face mesh, texturing inference requires approximately 120 seconds per layer, with hit-level assignment taking around 300 seconds as a one-time cost. The method runs on a single RTX A6000 GPU and is training-free; no model fine-tuning is required.
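The reported settings are collected in one place below for reference; the dictionary keys are hypothetical, while the values restate the implementation details above.

```python
# Hypothetical configuration summary; values follow the reported setup.
GOATEX_SETUP = {
    "base_model": "Stable Diffusion 1.5",
    "conditioning": "depth ControlNet",
    "num_camera_views": 16,             # hemispherical views for hit-level casting
    "geometry_backends": ["PyTorch3D", "Open3D"],
    "gpu": "single NVIDIA RTX A6000",
    "fine_tuning": None,                # zero-shot, training-free
    "sec_per_layer_texturing": 120,     # ~10k-face mesh
    "sec_hit_level_assignment": 300,    # one-time cost
}
```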
7. Empirical Evaluations, Limitations, and Directions
On a dataset of 226 Objaverse meshes with complex interiors, prompts generated via GPT-4o described rich interior/exterior details. Qualitative results show that GOATex achieves coherent, semantically detailed textures on interiors (e.g., interior walls, dashboards), while alternative methods (e.g., TEXTure, SyncMVD) leave interiors blank or riddled with artifacts, and UV-inpainting baselines (Paint3D, TEXGen) yield overly uniform or repetitive patterns.
In A/B user studies over 400 meshes, human raters preferred GOATex in over 70% of cases compared to all baselines; GPT-4.1 judges agreed in over 60% of cases. Ablation studies show cumulative contributions of key pipeline stages:
- Hit-Level Assignment only (no superfaces): 75%
- + Superface Construction: 80%
- + Soft UV Merging: 86%
- + Residual Face Clustering: 81%
- + Normal Flipping & Backface Culling: 91%
The texturing runtime scales linearly with mesh face count and the number of hit levels.
GOATex’s limitations are primarily geometric: hit levels are defined purely by ray-based geometry, causing some semantically unified regions (e.g., thin shells, nested cavities) to be split across layers with minor style discontinuity. Dependence on Stable Diffusion 1.5 may limit fine-grained adherence to detailed textual prompts.
Potential extensions include incorporating semantic part-segmentation into hit-level grouping, integrating soft UV-blending directly into the diffusion denoising loop for joint optimization, and extending to support dynamic/deformable meshes and non-diffuse material channels (e.g., roughness, metalness).
8. Summary
GOATex constitutes the first geometry- and occlusion-aware texturing pipeline for 3D meshes that (a) systematically partitions geometry into visibility layers via hit-level analysis, (b) enables progressive, structurally coherent revelation and texturing of both exterior and occluded interior surfaces, and (c) merges textures with a soft UV blend driven by visibility confidence. The approach robustly outperforms previous methods on both qualitative and quantitative metrics for seamless, high-fidelity 3D texturing across complex mesh geometries (Kim et al., 28 Nov 2025).