Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries (2512.13796v1)

Published 15 Dec 2025 in cs.CV

Abstract: Though Gaussian splatting has achieved impressive results in novel view synthesis, it requires millions of primitives to model highly textured scenes, even when the geometry of the scene is simple. We propose a representation that goes beyond point-based rendering and decouples geometry and appearance in order to achieve a compact representation. We use surfels for geometry and a combination of a global neural field and per-primitive colours for appearance. The neural field textures a fixed number of primitives for each pixel, ensuring that the added compute is low. Our representation matches the perceptual quality of 3D Gaussian splatting while using $9.7\times$ fewer primitives and $5.5\times$ less memory on outdoor scenes and using $31\times$ fewer primitives and $3.7\times$ less memory on indoor scenes. Our representation also renders twice as fast as existing textured primitives while improving upon their visual quality.

Summary

The paper introduces nexels, a hybrid explicit/implicit representation decoupling geometry and appearance to enable efficient, real-time novel view synthesis.
It leverages a generalized Gaussian kernel with variable sharpness and a compact neural field for view-dependent, high-frequency texture details.
Quantitative results demonstrate drastic reductions in primitives and memory while achieving perceptual parity and twice the rendering speed compared to prior methods.

Neurally-Textured Surfels for Efficient Real-Time Novel View Synthesis

Motivation and Context

The paper "Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries" (2512.13796) addresses a central limitation in point-based 3D scene representations, particularly those using 3D Gaussian splats (3DGS): the inefficiency in representing scenes with simple geometry but highly textured surfaces. Point-based rendering tightly couples geometric and appearance parameters, necessitating a vast number of primitives for fine appearance detail even on topologically simple surfaces. In contrast, traditional mesh-based approaches decouple geometry from appearance using textures but have not achieved competitive performance in novel view synthesis due to optimization and differentiability issues. This work introduces nexels, a neurally-textured surfel primitive, that decouples geometry and appearance via a hybrid explicit/implicit representation, achieving significant reductions in memory and compute without sacrificing rendering quality or speed.

Representation: Geometry and Appearance Decoupling

Nexels advance the state-of-the-art by separating scene geometry (modeled as sparse surfels) from appearance (encoded using a shared neural field and per-primitive color features). The geometric component leverages a generalized Gaussian kernel, parameterized for variable sharpness through a differentiable gamma parameter, allowing interpolation between smooth Gaussian shapes and sharp quad indicators. This flexibility enables effective modeling of opaque, flat surfaces and surface boundaries.

Figure 1: Kernel shape evolution with the gamma parameter, showing the transition from a Gaussian to a sharp quad indicator.

Appearance is captured by per-primitive colors for coarse rendering, supplemented by a compact neural field based on the Instant-NGP architecture for view-dependent, high-frequency texture details. The neural field is queried only at a sparse, fixed number of the most visually significant primitives per pixel, drastically reducing computational burden compared to per-ray or per-primitive neural texture evaluations in prior works.

Rendering Architecture and Pipeline

Rendering in nexels comprises two passes: a collection pass and a texturing pass. The collection pass performs alpha compositing of all relevant surfels and identifies, using blending weights, the top-K primitives per pixel for texturing. The texturing pass computes intersection positions for these selected primitives and queries the neural field to obtain view-dependent textures. The final pixel color integrates both the non-textured and neural-textured contributions in a manner optimized for speed and differentiability.

This design maintains real-time performance, in large part due to minimizing neural field evaluations and leveraging explicit geometry for fast rasterization. The adaptive density control mechanism further supports efficient scene representation by dynamically splitting and pruning surfels based on blended photometric errors.

Quantitative Evaluation and Results

The experimental protocol is comprehensive, benchmarking nexels on a superset of standard datasets (Mip-NeRF360 indoor/outdoor, Tanks and Temples, and a custom high-frequency texture dataset). Evaluation metrics include PSNR, SSIM, and LPIPS. The perceptual quality (LPIPS) is emphasized due to its stronger alignment with human sensitivity to texture fidelity.

Figure 2: LPIPS vs. number of primitives and memory across datasets; nexels consistently outperform other methods under computation and memory budgets.

Nexels achieve perceptual parity with 3DGS at 9.7× fewer primitives and 5.5× less memory on outdoor scenes, and 31× fewer primitives and 3.7× less memory on indoor scenes. Additionally, nexels render twice as fast as prior textured primitive methods while improving visual quality. For example, on the Mip-NeRF360 outdoor dataset, 400K nexels produce LPIPS comparable to 3.9M Gaussians in 3DGS at an order-of-magnitude reduction in resources.

Figure 3: Comparison on sparse geometries; nexels maintain high visual quality with as few as 40K primitives, outperforming textured splatting baselines.

Figure 4: Low-memory scenario; nexels exhibit superior texture reconstruction and background fidelity at a fraction of the primitive and memory cost.

Practical and Theoretical Implications

The key implication of nexels is that high-quality novel view synthesis does not require dense geometry as previously assumed. Through efficient geometry-appearance decoupling and the selective application of neural textures, nexels deliver scalable scene representations optimized for both memory and compute, paving the way for real-time rendering in resource-constrained settings.

Practically, nexels enable novel view synthesis for large-scale environments and photorealistic applications where memory and render speed are critical, such as AR/VR, telepresence, and robotics. The modularity of the system, with its explicit geometry and shared neural texture, facilitates integration with ray tracing and level-of-detail schemes.

From a theoretical standpoint, nexels demonstrate the utility of hybrid explicit/implicit field compositions and serve as a template for further research on sparse, differentiable rendering systems. The generalized Gaussian kernel presents new opportunities for surfel and primitive modeling, potentially extending to adaptive hierarchical schemes.

Limitations and Future Directions

Despite their strengths, nexels are currently reliant on GPU tensor core acceleration for real-time neural texture computation, limiting low-end and mobile device viability. The neural field can introduce noise in unseen regions, in contrast to the blurring artifacts typical of point-based methods. Moreover, motion blur and depth-of-field effects remain unresolved, affecting reconstruction of fine details in photometric datasets.

Figure 5: Artifacts in unseen regions and under motion blur — neural field noise and reconstructive failures affecting text.

Figure 6: Geometric limitations; missing thin structures and incorrect background optimization at low primitive counts.

Future work may address these limitations through level-of-detail hierarchies, improved neural architectures for texture synthesis, and support for complex imaging conditions. The top-K selection scheme for textures may be extended to more sophisticated blending models and dynamic primitive selection.

Conclusion

Nexels present a resource-efficient, real-time solution for novel view synthesis, leveraging neurally-textured surfels to decouple geometry and appearance. Experimental results establish strong performance at drastic reductions in primitive count and memory, with implications for scalable, photorealistic scene modeling. Nexels suggest a paradigm shift towards sparse, hybrid representations within differentiable rendering, with promising avenues for extension in large-scale and domain-adaptive scenarios.

PDF Markdown

Whiteboard

Generate a whiteboard explanation of this paper.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Glossary

off on

Practical Applications

off on

Conceptual Simplification

off on

Explain it Like I'm 14

What is this paper about?

This paper introduces “nexels,” a new way to build and render 3D scenes from photos so you can see them from new viewpoints in real time. The big idea is to separate a scene’s shape (what blocks light) from its look (the colors and tiny details), so the system can show sharp, detailed images using far fewer tiny pieces and less memory than popular methods, while staying fast.

What were the researchers trying to find out?

They asked simple questions:

Can we make new views of a 3D scene look just as good while using far fewer tiny pieces (“primitives”) and less memory?
Can we keep the rendering fast enough for real-time use (like games or VR)?
Can we capture fine details (like text or patterns) without needing millions of pieces?
Is there a better way to split shape and appearance so each can be handled efficiently?

How did they do it?

Think of building a 3D scene like making a mosaic:

Old approach (3D Gaussian splatting): paint the scene with millions of soft, blurry dots. Each dot carries both shape and color. To show fine details (like the tiny notes on sheet music), you need a lot of dots—even if the surface itself is flat—so it gets heavy and slow.
Nexels’ approach: split the job into two: 1) Geometry (shape): use tiny, flat tiles called “surfels” placed in 3D space. Nexels make these tiles act more like crisp squares when needed, so edges and flat surfaces look sharp. 2) Appearance (texture): instead of storing a separate image for every little tile, they use one shared “neural texture”—a small neural network that, given a 3D position, returns the right color. It’s like a smart paintbrush that knows what color to use anywhere in the scene.

Here’s how rendering works, in everyday terms:

Step 1: Quick rough pass. For each pixel on the screen, the system blends the most important tiles it sees along the camera ray (like stacking semi-transparent stickers). While doing this, it also picks only a few top tiles per pixel (for example, K=2) that really matter for what you’ll see.
Step 2: Add detail only where it helps. For those few important tiles, it asks the shared neural texture for exact colors at their positions. Then it blends these detailed colors into the rough image. This way, even if many tiles overlap, the system only does the expensive “ask the neural net” step for a couple of them per pixel—keeping everything fast.

A few more simple pieces:

“Primitive” = one tiny piece of the 3D representation (a tile in this system).
“Neural field/texture” = a small learned function that returns color for any 3D point.
“Surfels” = small, flat surface elements placed in 3D; here they can behave more like crisp quads to make sharp edges.
“Real time” = roughly 30+ frames per second (FPS).

Training the system:

They start from a rough 3D point cloud (from a tool like COLMAP) and optimize:
- Where to place tiles and how big they are (shape).
- The neural texture’s parameters (appearance).
They also automatically split tiles where more detail is needed and remove tiles that don’t help (so the model stays compact).

What did they discover?

The main results show big efficiency gains without losing visual quality:

Similar quality with far fewer pieces:
- Outdoors: about 9.7× fewer primitives and 5.5× less memory than standard 3D Gaussian splatting, with similar perceptual quality.
- Indoors: about 31× fewer primitives and 3.7× less memory, also with similar quality.
Faster than other “textured” methods:
- Nexels render more than twice as fast as a popular neural-texture method (NeST-Splatting), and faster than per-tile image-texture methods (like BBSplat), while often looking better.
- Average rendering speed in their tests: around 50 FPS, versus about 23 FPS for NeST-Splatting and about 20 FPS for BBSplat.
Works especially well on highly detailed textures:
- They built a new dataset with lots of fine patterns and text. Nexels captured these details well without needing a huge number of pieces.
Stays strong under tight budgets:
- Even when limited to very few pieces or small memory (like ~40–50 MB), nexels kept better visual quality than methods that don’t separate shape and appearance.

Why these results matter:

“LPIPS” (a measure of how similar images look to humans; lower is better) was consistently as good or better with far fewer primitives.
The system stays real-time even with detailed textures because it only asks the neural texture for a handful of important tile hits per pixel.

Why does this matter?

Better visuals on less hardware: Because nexels need far fewer pieces and less memory, devices like laptops, phones, or VR headsets can show high-quality views without heavy hardware.
Real-time apps: Games, VR/AR, virtual tours, and robotics need fast, high-quality rendering. Nexels provide both speed and detail.
Scales to complex scenes: Separating shape (where tiles go) from appearance (the smart shared texture) makes it easier to handle big scenes with lots of fine patterns without ballooning memory.
Cleaner modeling of surfaces: The tiles can act like crisp squares for sharp edges and flat surfaces, improving the look of real-world objects like tables, signs, or walls.

In short, the paper shows a practical way to render realistic, detailed new views of real scenes in real time using much less data. By letting a small neural texture paint only what matters and letting simple tiles represent shape, nexels hit a sweet spot of quality, speed, and efficiency.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise list of what remains missing, uncertain, or unexplored in the paper that future work could address:

Fixed top-K texturing per pixel: no analysis of how K should be chosen, adapted per scene/pixel, or scheduled during training; no error bounds on the approximation introduced by limiting textured interactions to the top-K contributors.
Differentiability of the top-K selection: the hard, per-pixel top-K choice is non-differentiable and may bias gradients; no exploration of soft top-k, Gumbel-top-k, or alternative continuous relaxations, nor analysis of training stability and convergence under this selection.
Near-opaque compositing and depth ordering: the method retains alpha compositing with depth sorting (as in 2DGS) while approaching quad-like, near-binary opacities; order-dependent artifacts and failure modes on interpenetrating geometry or high overdraw are not analyzed; no comparison against exact z-buffered rasterization.
Anti-aliasing of the neural field queries: the proposed depth-based down-weighting is isotropic and heuristic; no treatment of anisotropic pixel footprints (e.g., oblique viewing angles, varying pixel footprint ellipses) or multi-sample integration for the hash grid; no quantitative anti-aliasing ablation.
View-dependent effects in the neural texture: the neural field is conditioned only on world position x and outputs SH coefficients; it does not take viewing direction or surface normal as input, limiting the expressivity for specularities, anisotropy, or complex BRDFs; no evaluation on strongly view-dependent materials.
Transparency, translucency, and participating media: the top-K opaque-like selection and surfel model are not designed for multi-layer transparent media, subsurface scattering, or volumetric effects; behavior on foliage, thin layered structures, tinted glass, or semi-transparent surfaces is not evaluated.
Surfel planarity and complex geometry: representing geometry as local surfels may struggle with fine, highly curved, or intricate structures; the method’s robustness to thin geometries (wires, fences) and high-curvature regions is not systematically assessed.
Kernel parameterization (gamma) behavior: no ablation on the impact of the generalized Gaussian gamma on reconstruction quality, stability, and training dynamics; potential gradient pathologies near sharp transitions are not discussed.
Capacity scaling of the neural field: the trade-offs among hash-grid capacity (T, L, F), collisions, memory, and quality are not studied; lack of guidance on selecting grid sizes for large, complex scenes, and no analysis of out-of-distribution failure due to hash collisions.
World-space anchoring of textures: using only x for texturing can cause “texture swimming” during geometry updates and may not align well with local surface parameterizations; potential benefits of incorporating local (u, v), normals, or learned UVs are not explored.
Temporal stability: no evaluation of flicker or temporal consistency in rendered video (especially as the top-K set changes across frames); no stabilization strategies (e.g., temporal smoothing of K-selection or texture features).
Numerical robustness of the texture loss: the loss divides by the sum of blending weights; stability when the sum is near zero or when pixels have minimal top-K contributions is unspecified; clamping or safeguards are not described.
Dependence on COLMAP initialization: the approach relies on COLMAP for sparse points and accurate poses; robustness to pose errors, poor reconstructions in low-texture regions, and sparse capture conditions is not tested.
Densification/pruning strategy: the heuristics and hyperparameters (e.g., split rate, opacity threshold, target primitive cap P) are fixed; sensitivity analyses, convergence guarantees, and automated scheduling/termination criteria are not provided.
Generalization, few-shot, and sparse-view regimes: all experiments are dense-pose, per-scene optimization; performance with few views, wide-baseline setups, or under pose uncertainty remains unexplored.
Relighting and material decomposition: the representation bakes in scene illumination; no factorization into intrinsic properties (albedo, normals, BRDF); relighting and editable materials are unsupported.
Large-scale and high-resolution rendering: scalability to city-scale scenes, multi-room indoor environments, 4K–8K rendering, and out-of-core or multi-GPU training/inference is not addressed; no analysis of memory-bandwidth bottlenecks.
Runtime portability: performance is reported on an RTX 6000 Ada; portability to consumer GPUs, mobile/embedded hardware, and non-CUDA backends (e.g., Metal, DirectX) is not evaluated.
Comprehensive ablations: missing ablations on (i) K, (ii) gamma, (iii) anti-aliasing filter, (iv) loss weights, and (v) MLP/grid capacities; lack of clarity on which components most contribute to quality and speed.
Fairness and breadth of baselines: no quantitative comparison with strong mesh-based texturing pipelines after baking (e.g., modern differentiable meshing, VMesh, BakedSDF variants) at similar memory/quality targets; no user studies or more perceptual metrics beyond LPIPS.
Robustness to illumination changes and dynamics: only static scenes with fixed lighting are considered; handling of dynamic objects, non-rigid motion, or varying lighting conditions is not studied.
Quality guarantees for top-K approximation: no theoretical bounds or empirical analysis of reconstruction error as a function of K, overdraw, scene opacity distribution, or noise; no adaptive K scheduling to meet target error budgets.
Failure cases and diagnostics: the paper does not document qualitative failure modes (e.g., disocclusion tearing, edge halos, ghosting at high-contrast textures) or provide diagnostic tools/metrics to detect and mitigate them.
Dataset scope: the custom dataset focuses on high-frequency textures but may be biased (materials, lighting, motion); no public details on capture diversity, licensing, or standardized splits to facilitate broad benchmarking.

View Paper Prompt View All Prompts

Glossary

2D Gaussian splatting (2DGS): A point-based rendering technique where each primitive is a 2D Gaussian surface element (surfel) embedded in 3D, enabling differentiable rasterization. "Each surfel in 2D Gaussian splatting (2DGS) is a 2D Gaussian in 3D space"
3D Gaussian splats (3DGS): A representation that models a scene with many 3D Gaussian primitives, each encoding geometry and appearance, rendered in real time via rasterization-like splatting. "Point-based representations like 3D Gaussian splats (3DGS) merge these roles"
Adaptive density control: A training procedure that dynamically prunes and densifies primitives to meet quality and budget targets. "Adaptive Density Control."
Alpha compositing: The process of blending semi-transparent samples along a ray using their opacities and accumulated transmittance. "which are then alpha-composited and passed through a feed-forward network."
Alpha textures: Per-primitive textures storing opacity values, often increasing the number of blended fragments and computational cost. "whose alpha textures further increase overdraw"
Beta splats: Non-Gaussian splatting primitives using beta kernels that can achieve high quality with fewer parameters. "Beta splats have been particularly effective at achieving higher rendering quality with fewer parameters."
COLMAP: A structure-from-motion and multi-view stereo pipeline used to initialize 3D points and camera poses. "we sample the point cloud output of COLMAP for initialization"
Depth buffering: A rasterization mechanism that uses a depth buffer to resolve visibility, fetching textures only for visible fragments. "in standard mesh rasterization with depth buffering"
Densification: The process of splitting or adding primitives during training to better cover underfit regions of the scene. "We perform a densification and pruning step every $100$ iterations"
Differentiable rendering: Rendering techniques that are differentiable with respect to scene parameters, enabling gradient-based optimization from images. "the dominant strategy is differentiable rendering"
D-SSIM: A differentiable form of the Structural Similarity Index used as a perceptual loss during training. "D-SSIM loss"
Empty space skipping: Acceleration that avoids sampling and computation in empty regions during volumetric or field-based rendering. "Later works use empty space skipping and other acceleration techniques"
Farthest point sampling: A heuristic to select well-spread points by iteratively picking the farthest new point from those already selected. "we use farthest point sampling to reduce the initial point cloud"
Fragment buffer: A per-pixel data structure that stores multiple candidate fragments (e.g., IDs, weights, depths) for compositing. "Inspired by fragment buffer techniques"
Generalized Gaussian kernel: A parametric kernel that interpolates between Gaussian-like and rectangle-like shapes to model near-opaque, sharp-edged surfels. "We use a generalized Gaussian kernel in order to model near-opaque primitives."
Gaussian splatting: Rendering with explicit Gaussian primitives projected and blended to form images, offering real-time performance without volumetric integration. "Though Gaussian splatting has achieved impressive results in novel view synthesis,"
Instant-NGP: A neural field architecture combining a multiresolution hash-grid with a tiny MLP for fast, high-capacity function approximation. "Instant-NGP is a neural field architecture composed of a multiresolution hash-grid"
LPIPS: A learned perceptual image similarity metric used to evaluate visual fidelity beyond pixel-wise errors. "We evaluate the LPIPS $\downarrow$ across multiple settings"
Mesh rasterization: The process of converting surface meshes to screen-space fragments for rendering, typically with a depth buffer and texture mapping. "standard mesh rasterization with depth buffering"
Mip-NeRF360: A multi-view dataset for 360-degree scenes used to benchmark novel view synthesis methods. "the Mip-NeRF360 dataset"
Multi-layer perceptron (MLP): A feed-forward neural network with one or more hidden layers used to map features to outputs (e.g., radiance). "a tiny multi-layer perceptron (MLP) network"
Multiresolution hash-grid: A set of hash-tabled feature grids at multiple scales that provides spatial features for neural field queries. "a multiresolution hash-grid $\mathcal{H}$ "
Neural field: A continuous function represented by a neural network that maps coordinates (e.g., 3D positions) to quantities like density or color. "A neural field implicitly represents a quantity over a region, such as 3D space, through neural network queries."
Neural radiance fields (NeRFs): Neural fields that map 3D positions and view directions to volumetric density and radiance for photorealistic novel view synthesis. "These neural radiance fields (NeRFs) have slow rendering speed"
Overdraw: Excessive fragment shading/compositing due to many overlapping contributions, which can reduce rendering speed. "whose alpha textures further increase overdraw"
Photometric loss: An image-space objective comparing rendered and ground-truth pixels, often combining L1 and perceptual terms. "we compute a photometric loss between the prediction and ground truth"
PSNR: Peak Signal-to-Noise Ratio, a pixel-wise fidelity metric commonly used to assess reconstruction quality. "We evaluate photometric quality with the standard PSNR, SSIM, and LPIPS metrics"
Silhouette gradients: Gradients of the loss with respect to the projected outlines of surfaces, important for optimizing discrete meshes. "complications in computing silhouette gradients"
Spherical harmonics: A set of basis functions on the sphere used to compactly model view-dependent radiance. "parameterized by channel-wise spherical harmonics coefficients"
Surfel: A small, oriented surface element (with position, normal, and extent) used as a primitive in point-based rendering. "We represent the geometry of nexels using a set of $N$ surfels"
Tile-based rasterizer: A GPU rendering scheme that processes screen-space tiles to improve locality and performance of compositing. "The primitives are alpha-composited in a tile-based rasterizer."
Transmittance: The accumulated fraction of light not yet absorbed or occluded along a ray up to a given depth. "we compute the alpha $\alpha_i$ and transmittance values $T_i$ "
View-dependent radiance: Appearance that changes with viewing direction, often modeled via learned functions or spherical harmonics. "outputs a view-dependent radiance"
Volumetric rendering: Rendering by integrating emitted and absorbed radiance along rays through a continuous volume defined by density and color fields. "for volumetric rendering"

View Paper Prompt View All Prompts

Practical Applications

Immediate Applications

Below are practical, deployable applications that can leverage Nexels today, based on the paper’s demonstrated performance (real-time rendering at 40–60 FPS on commodity GPUs), memory reduction (3–5× lower than prior textured primitives; 10–30× fewer primitives than standard splatting), and training workflows (COLMAP initialization, 30k iterations, adaptive density control).

Real-time scene viewers for scanning-based 3D experiences
- Sectors: software, XR (AR/VR), real estate, cultural heritage, e-commerce
- What: Convert multi-view photos of real spaces or objects into interactive, photorealistic 3D viewers with sharp edges and high-frequency textures (e.g., text, patterns) that run at real-time speeds on desktop GPUs and can be adapted for web (WebGPU) viewers.
- Tools/products/workflows:
- Capture with a phone or DSLR; calibrate and reconstruct with COLMAP
- “Nexelify” pipeline (initialization via farthest point sampling → 30k-iteration training → export runtime package)
- Viewer plugins for Unreal/Unity or a standalone WebGPU viewer using the two-pass rasterization and K-limited neural texture fetch
- Assumptions/dependencies: Good camera calibration and coverage; opaque surfaces; GPU availability (desktop-class recommended); current training takes ~1–5 hours per scene
Asset optimization for game and VFX pipelines
- Sectors: gaming, film/VFX, software tools
- What: Replace heavy point-based or mesh+texture assets with Nexels to retain fine detail in textures while drastically reducing primitive count and memory footprint, improving load times and runtime performance in interactive scenes.
- Tools/products/workflows:
- DCC tool plugin (Blender/Maya) to ingest point clouds or multi-view photos and export nexel assets
- Runtime renderer module that composites non-textured radiance and per-pixel K textured intersections
- LOD management via P (primitive budget) and K (per-pixel texture samples)
- Assumptions/dependencies: Static or quasi-static scenes; integration with game engines’ rendering backends; pipeline support for spherical harmonics and the Instant-NGP field
Photorealistic product visualization from sparse captures
- Sectors: e-commerce, marketing, manufacturing
- What: Turn a small set of product photos into an interactive viewer that preserves detailed textures (labels, stitching, fine materials), enabling zoomed inspection without heavy textures or millions of splats.
- Tools/products/workflows:
- Cloud service to upload images and generate a nexel viewer
- Embeddable widget for product pages using K=2 neural texture sampling for consistent performance
- Assumptions/dependencies: Accurate camera poses; consistent lighting across captures; GPU inference on the server or client
Digital twin walkthroughs with efficient streaming
- Sectors: architecture, facilities management, real estate, digital twins
- What: Publish interactive tours of buildings or worksites with reduced memory and high texture fidelity, enabling faster streaming and lower storage costs.
- Tools/products/workflows:
- Progressive scene delivery (hash-grid first; incremental primitive batches up to budget P)
- On-device viewer with fixed compute budget via K-limited texturing
- Assumptions/dependencies: Stationary scenes; adequate coverage of edges and texture-rich surfaces; hash-grid size tuned for device constraints
Academic benchmarking and course modules in differentiable rendering
- Sectors: academia (graphics, vision)
- What: Use the provided dataset and pipeline to teach decoupled geometry/appearance, surfel-based differentiable rasterization, and neural field texturing. Benchmark against 3DGS/2DGS and textured primitives with reproducible settings.
- Tools/products/workflows:
- Lab assignments around kernel gamma tuning (quad-like indicators), anti-aliasing via level down-weighting, and adaptive density control
- Shared codebase for Instant-NGP-backed textures with K-select buffers
- Assumptions/dependencies: Access to GPUs; course materials integrate PyTorch/CUDA and COLMAP
Low-memory 3D content for web
- Sectors: web software, media publishing
- What: Publish 3D scenes that remain visually detailed under strict memory budgets (e.g., 40–60 MB), outperforming point-based baselines on LPIPS at comparable sizes.
- Tools/products/workflows:
- Export pipeline targeting WebGPU with compact hash-grid configurations (e.g., T=2¹⁹⁾ and K=2
- Assumptions/dependencies: Performance depends on GPU class and browser WebGPU support; opaque scenes preferred
Robotics simulation environments with crisp occlusions
- Sectors: robotics, autonomy, simulation
- What: Sim environments that need sharp boundaries and high-frequency textures for realistic sensor simulation (e.g., camera-based navigation, OCR of signage) using sparse primitives.
- Tools/products/workflows:
- Import Nexels into simulators for camera realistic rendering; adjust gamma to enforce quad-like opacity for near-opaque surfaces
- Assumptions/dependencies: Static environment assumptions; integration with sim engine; lighting approximated via spherical harmonics
Compliance-friendly remote site documentation
- Sectors: construction, insurance, auditing
- What: Efficiently capture and publish photorealistic reconstructions for audits and claims, reducing storage/compute costs while maintaining detail important for documentation.
- Tools/products/workflows:
- Repeatable capture → COLMAP → Nexels export with fixed rendering budget policies (K, P)
- Assumptions/dependencies: Privacy/consent for scans; rigorous capture protocols; data handling aligned with organizational policies

Long-Term Applications

These applications are plausible extensions that require further research, scaling, or engineering (e.g., mobile acceleration, dynamic content, standardized tooling).

Mobile XR and web-scale deployment
- Sectors: XR, mobile software, browsers
- What: Real-time Nexels on smartphones/AR headsets and in browsers with WebGPU, enabling on-device capture-to-view pipelines.
- Tools/products/workflows:
- Hardware-friendly kernels; shader code generation for two-pass K-select rendering; hash-grid compression/quantization; streamable neural fields
- Assumptions/dependencies: Mobile GPU acceleration; efficient hash-grid access; memory and battery constraints; broader WebGPU adoption
Dynamic and relit scenes (time-varying geometry and appearance)
- Sectors: gaming, VFX, telepresence
- What: Extend Nexels to handle moving objects, changing lighting, and relightable materials by augmenting neural fields with temporal and BRDF parameters.
- Tools/products/workflows:
- Multi-field conditioning (time, lighting), material parameter inference, per-primitive dynamic updates
- Assumptions/dependencies: New training objectives and data; higher compute; robust handling of motion blur and specularities
Standardization and interchange formats for neurally textured primitives
- Sectors: software standards, content platforms, policy
- What: Define an open format for Nexels (geometry kernel, SH radiance, Instant-NGP textures, K-select buffers) to enable cross-tool interoperability and archival of digital heritage assets.
- Tools/products/workflows:
- Schema extensions to glTF/USD for neural fields and differentiable rasterization metadata
- Assumptions/dependencies: Community consensus; reference implementations; licensing/IP for scanned spaces and neural field weights
Cloud services for capture-to-3D at scale
- Sectors: SaaS, media platforms, e-commerce, real estate
- What: Automated pipelines that ingest user photos, produce Nexels, and host viewers with progressive streaming and device-aware K/P scheduling.
- Tools/products/workflows:
- Multi-tenant training orchestration; quality controls; content moderation and privacy safeguards
- Assumptions/dependencies: Cost-effective GPU fleets; robust SLAs; user-friendly capture guidance
Generative content: edit and synthesize textures with neural fields
- Sectors: creative tools, design, marketing
- What: Use the shared neural field to enable prompt-based texture edits (e.g., replace labels, recolor patterns) while preserving geometry and occlusion cues.
- Tools/products/workflows:
- Inverse rendering workflows; texture GANs conditioned on Instant-NGP features; guardrails for IP compliance
- Assumptions/dependencies: Reliable disentanglement of geometry vs. appearance; content authenticity policies
Robotics perception and planning with unified render-and-sense representations
- Sectors: robotics, autonomous systems
- What: Joint simulation and training using Nexels for photorealistic visuals alongside occupancy/semantic layers; faster domain transfer due to sharper edges and realistic textures.
- Tools/products/workflows:
- Hybrid maps (nexels + voxel semantics); differentiable sensor models; curriculum learning with progressive texture fidelity
- Assumptions/dependencies: Integration with perception stacks; bridging radiometric realism and physical accuracy
Energy-efficient 3D media pipelines and sustainability policy
- Sectors: policy, cloud providers, media platforms
- What: Promote lower-memory, lower-overdraw 3D representations to reduce compute and storage footprints across platforms, aligning with sustainability targets.
- Tools/products/workflows:
- Platform guidelines (K and P budgets); eco-labels for 3D assets; reporting frameworks on energy savings
- Assumptions/dependencies: Verified lifecycle analyses; stakeholder buy-in; standards for measuring rendering energy
High-fidelity telepresence with multi-camera capture
- Sectors: communications, XR, enterprise collaboration
- What: Real-time novel view synthesis for live telepresence using Nexels from synchronized multi-camera rigs; sparse geometry with detailed textures for natural remote presence.
- Tools/products/workflows:
- Low-latency training/inference; streaming hash-grid updates; view-dependent SH radiance blending across cameras
- Assumptions/dependencies: Robust multi-view calibration; low-latency GPUs; dynamic handling of moving participants

Cross-cutting assumptions and dependencies

Scene properties: Best for mostly opaque, static surfaces; handling transparency, complex specularities, or dynamic scenes needs extensions.
Data requirements: Multi-view, calibrated images (e.g., via COLMAP) with sufficient coverage; consistent lighting improves quality.
Compute: Current real-time claims rely on desktop-class GPUs; mobile/web deployment needs further optimization and possibly hardware support.
Quality/performance controls: K (per-pixel textured intersections) and P (primitive budget) must be tuned to device constraints and desired fidelity.
Integration: Engine/runtime support for spherical harmonics, two-pass rasterization, hash-grid neural fields, and differentiable surfel rendering.
IP and privacy: Scanning real spaces/assets requires permissions; neural textures may encode sensitive details—policy and compliance workflows should be in place.

Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries (2512.13796v1)

Summary

Neurally-Textured Surfels for Efficient Real-Time Novel View Synthesis

Motivation and Context

Representation: Geometry and Appearance Decoupling

Rendering Architecture and Pipeline

Quantitative Evaluation and Results

Practical and Theoretical Implications

Limitations and Future Directions

Conclusion

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What were the researchers trying to find out?

How did they do it?

What did they discover?

Why does this matter?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Glossary

Practical Applications

Immediate Applications

Long-Term Applications

Cross-cutting assumptions and dependencies

Open Problems

Continue Learning

Authors (8)

Collections

Tweets

Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries (2512.13796v1)

Sponsor

Summary

Neurally-Textured Surfels for Efficient Real-Time Novel View Synthesis

Motivation and Context

Representation: Geometry and Appearance Decoupling

Rendering Architecture and Pipeline

Quantitative Evaluation and Results

Practical and Theoretical Implications

Limitations and Future Directions

Conclusion

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

What is this paper about?

What were the researchers trying to find out?

How did they do it?

What did they discover?

Why does this matter?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Glossary

Practical Applications

Immediate Applications

Long-Term Applications

Cross-cutting assumptions and dependencies

Open Problems

Continue Learning

Related Papers

Authors (8)

Collections

Tweets