SAGE Framework in XR

Updated 14 November 2025

SAGE is a semantic-driven framework that uses adaptive Gaussian splatting to dynamically adjust scene details for immersive XR experiences.
It employs neural segmentation to assign context-aware LOD, balancing visual fidelity with reduced memory and computational overhead.
The framework supports interactive XR tasks like real-time manipulation and autonomous robotics, demonstrating significant performance gains in resource-constrained environments.

SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality is a framework designed for three-dimensional scene visualization with a focus on interactive eXtended Reality (XR) applications. Its primary innovation is the dynamic adaptation of the Level of Detail (LOD) for different scene components, selectively chosen via semantic segmentation. This approach leverages 3D Gaussian Splatting (3DGS) to maintain user-perceptible visual quality while efficiently reducing memory footprint and computational overhead, a trade-off critical for high-performance XR environments (Schiavo et al., 20 Mar 2025).

1. Background: 3D Gaussian Splatting in XR

3D Gaussian Splatting is a recent rendering technique that represents a scene as a set of anisotropic Gaussians distributed in 3D space. Each Gaussian encodes spatial location, orientation, size, color, and opacity, allowing compact and continuous scene rendering that is highly parallelizable and avoids mesh tessellation artifacts. In XR, this paradigm supports lifelike experiences across robotics, virtual/augmented reality, and interactive simulation. However, rendering a complex scene at high fidelity with a uniform Gaussian density can quickly exceed device memory and processing limits, which motivates principled LOD management.
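To make the per-Gaussian parameters concrete, the following is a minimal sketch of one primitive and its covariance. It assumes the factorization commonly used in 3DGS, $\Sigma = R S S^\top R^\top$, with $R$ a rotation built from a quaternion and $S$ a diagonal scale matrix. The class and field names are hypothetical and are not taken from SAGE.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z)


@dataclass
class Gaussian3D:
    """Hypothetical container for one splat; field names are illustrative."""
    mean: Vec3       # spatial location
    rotation: Quat   # orientation as a quaternion (normalized on use)
    scale: Vec3      # per-axis standard deviations
    color: Vec3      # RGB in [0, 1]
    opacity: float   # alpha in [0, 1]


def quat_to_matrix(q: Quat) -> List[List[float]]:
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> List[List[float]]:
    """Return Sigma = R S S^T R^T, where S = diag(scale)."""
    R = quat_to_matrix(g.rotation)
    var = [s * s for s in g.scale]  # diagonal of S S^T
    return [
        [sum(R[i][k] * var[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

Storing rotation and scale separately, rather than the covariance itself, keeps $\Sigma$ symmetric and positive semi-definite by construction, so an LOD step that rescales a Gaussian only needs to touch `scale`.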

2. Semantic Segmentation–Driven Adaptive LOD

SAGE introduces an explicit semantic segmentation pipeline as a precursor to adaptive LOD assignment. The scene is first segmented into object regions according to their semantic class (e.g., furniture, walls, manipulable objects). Each segment is then mapped to a context-aware LOD, dynamically adjusting the number and parameters of Gaussians representing that entity. High-salience objects (e.g., user-touched elements or objects in focus) can be rendered at maximal fidelity, while background or less relevant items are downsampled or represented more coarsely.

This semantic-driven approach enables SAGE to:

Assign computational budget where it most impacts visual experience.
Avoid wasting resources on out-of-focus or peripheral elements.
React adaptively to user attention and task context in real time, an essential requirement for immersive XR.

3. Integration of Segmentation, LOD Control, and Rendering

The SAGE framework is composed of three tightly coupled modules:

Semantic Segmentation Module: Applies a neural network (typically an encoder-decoder architecture trained on RGB-D or multi-view imagery) to generate high-confidence semantic masks across the scene.
LOD Controller: Maps each semantic label or object class to a target LOD using pre-defined heuristics or learned mappings. This may be a function $f(s_i)$, where $s_i$ is a semantic score for region $i$, that outputs the appropriate Gaussian density or scale for splatting.
Adaptive Renderer: Synthesizes the final scene from the hierarchically organized Gaussians, applying downsampling, merging, or pruning operations as dictated by the LOD controller. Renderer execution is orchestrated to maintain the chosen LODs on a per-object basis, maximizing throughput and visual realism within memory/GPU constraints.
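One way to picture the LOD controller is as a monotone map from a semantic score to the fraction of Gaussians retained in each region. The sketch below is an illustration under assumptions, not the authors' method: the linear form of $f$, the `min_keep` floor, and the opacity-based pruning rule are all hypothetical choices, and each region is reduced to a list of opacities for brevity.

```python
import math
from typing import Dict, List


def keep_fraction(score: float, min_keep: float = 0.1, max_keep: float = 1.0) -> float:
    """Hypothetical f(s_i): linearly map a score in [0, 1] to a keep fraction."""
    s = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return min_keep + (max_keep - min_keep) * s


def prune_by_opacity(opacities: List[float], fraction: float) -> List[int]:
    """Return the sorted indices of the most opaque Gaussians to retain."""
    if not opacities:
        return []
    n_keep = min(len(opacities), max(1, math.ceil(len(opacities) * fraction)))
    ranked = sorted(range(len(opacities)), key=lambda i: opacities[i], reverse=True)
    return sorted(ranked[:n_keep])


def apply_lod(regions: Dict[str, List[float]], scores: Dict[str, float]) -> Dict[str, List[int]]:
    """Select retained Gaussian indices per region; unknown regions score 0."""
    return {
        name: prune_by_opacity(opacities, keep_fraction(scores.get(name, 0.0)))
        for name, opacities in regions.items()
    }


# Example: a manipulable object keeps every Gaussian, a wall keeps only a few.
regions = {"mug": [0.9, 0.2, 0.8, 0.1], "wall": [0.1 * k for k in range(1, 11)]}
kept = apply_lod(regions, {"mug": 1.0, "wall": 0.0})
```

Pruning by opacity is only one possible reduction rule; merging neighboring Gaussians or enlarging their scale, as the renderer description mentions, would slot into the same per-region interface.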

The system keeps rendered image quality at or above a prescribed threshold (e.g., as measured by PSNR or SSIM) despite aggressive resource optimization.
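As an illustration of such a quality gate, the sketch below computes PSNR with its standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, over flattened pixel values and compares it with a threshold. The 30 dB default and the function names are assumptions chosen for the example; the source does not state which threshold SAGE uses.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def meets_quality(reference: Sequence[float], rendered: Sequence[float],
                  threshold_db: float = 30.0) -> bool:
    """True if the reduced-LOD render stays at or above the PSNR threshold."""
    return psnr(reference, rendered) >= threshold_db
```

A controller could call `meets_quality` after each LOD reduction and revert the last step when the check fails, which gives a simple feedback loop between pruning and image quality.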

4. Practical Optimization and Resource Scaling

SAGE delivers substantial efficiency gains by linking the density and scale of Gaussian primitives directly to semantic relevance rather than spatial uniformity. The main resource-saving mechanisms include:

Reducing the total Gaussian count for objects outside user attention or at greater scene depth.
Adapting Gaussian size (i.e., covariance) to enable coarser representation without introducing visual discontinuities.
Optionally leveraging dynamic streaming for large-scale scenes, loading high-LOD details only on demand.
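A rough calculation shows how per-class Gaussian reduction translates into memory savings. The sketch below assumes 59 float32 parameters per Gaussian (3 position, 3 scale, 4 rotation, 1 opacity, 48 spherical-harmonic color coefficients), a layout commonly used in 3DGS implementations with degree-3 harmonics. The class names, counts, and keep fractions are hypothetical and are not results from the SAGE paper.

```python
from typing import Dict, Tuple

# Assumption: 3 position + 3 scale + 4 rotation + 1 opacity + 48 SH coefficients.
FLOATS_PER_GAUSSIAN = 59
BYTES_PER_FLOAT = 4  # float32


def gaussian_bytes(count: int) -> int:
    """Memory needed to store `count` Gaussians under the assumed layout."""
    return count * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT


def budget_report(counts: Dict[str, int],
                  keep_fractions: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Return (bytes_before, bytes_after) per semantic class."""
    report = {}
    for label, count in counts.items():
        kept = int(count * keep_fractions.get(label, 1.0))  # default: keep all
        report[label] = (gaussian_bytes(count), gaussian_bytes(kept))
    return report


def total_savings(report: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of memory saved across all classes."""
    before = sum(b for b, _ in report.values())
    after = sum(a for _, a in report.values())
    return 1.0 - after / before if before else 0.0


# Illustrative scene: background walls are reduced, the focused object is not.
report = budget_report({"wall": 1000, "mug": 1000}, {"wall": 0.25, "mug": 1.0})
savings = total_savings(report)
```

Because storage scales linearly with the Gaussian count, the overall saving is dominated by whichever classes hold the most primitives, which is why reducing large background regions tends to matter more than trimming small foreground objects.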

Experimental evaluations show decreased memory and computational requirements while maintaining target visual quality, demonstrating the suitability of SAGE for interactive XR devices constrained by GPU memory or compute budgets. This adaptive policy is particularly impactful for mobile and embedded XR platforms.

5. Applications and Experimental Results

SAGE’s architecture is optimized for highly interactive XR scenarios:

Real-time manipulation tasks, where user attention shifts rapidly among labeled scene elements.
Remote collaboration, prioritizing fidelity for user-shared or discussed objects while compressing the remainder.
Autonomous robotics, where semantic classes indicating navigational hazards or manipulable items automatically trigger higher LOD.

Empirical results indicate that SAGE reduces overhead without sacrificing user-perceived visual fidelity. Quantitative metrics such as frame rate, memory consumption, and image quality (measured by PSNR and SSIM) support SAGE's effectiveness as an optimization layer for practical XR pipelines.

6. Impact, Limitations, and Future Directions

The semantic-driven LOD concept represents a significant shift from geometry-centric scene simplification to a perceptual and task-oriented framework. However, SAGE's effectiveness is contingent on the accuracy and speed of the semantic segmentation stage, and on robust mappings between semantic class and LOD assignment. The framework's design also presumes that semantic importance can reliably proxy for perceptual importance; scenes where visual salience is independent of semantic labels may require additional attention-based reasoning.

Future work can explore:

Joint optimization of segmentation and LOD mapping, possibly via reinforcement learning or with user feedback in the loop.
Adapting the underlying splatting parameters in response to predicted user gaze or task objectives.
Extension to time-varying scenes, incorporating semantic tracking across frames for temporally coherent LOD control.

SAGE enables a flexible, semantically grounded approach to resource allocation in real-time 3D scene rendering, representing a step toward scalable, user-centric extended reality platforms (Schiavo et al., 20 Mar 2025).

References

Schiavo et al. (2025). SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality.
