RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting
Abstract: 3D Gaussian Splatting (3DGS) enables real-time novel view synthesis with high visual quality. However, existing methods struggle with semi-transparent specular surfaces that exhibit both complex reflections and clear transmission, often producing blurry reflections or overly occluded transmission. To address this, we present RT-Splatting, a framework that disentangles each Gaussian's geometric occupancy from its optical opacity. This factorization yields a unified surface-volume scene representation with a single set of Gaussian primitives. Our hybrid renderer interprets this representation both as a surface to capture high-frequency reflections and as a volume to preserve clear transmission. To mitigate the ambiguity in jointly optimizing reflection and transmission, we introduce Specular-Aware Gradient Gating, which suppresses misleading gradients from highly specular regions into the transmission branch, effectively reducing distracting floaters. Experiments on challenging semi-transparent scenes show that RT-Splatting achieves state-of-the-art performance, delivering high-fidelity reflections and clear transmission with real-time rendering. Moreover, our factorization naturally enables flexible scene editing. The project page is available at https://sjj118.github.io/RT-Splatting.
Explain it Like I'm 14
Overview
This paper introduces RT-Splatting, a new way to make computer-generated 3D scenes look realistic when they contain thin, see-through but shiny things, like windows or clear plastic. The method lets you see sharp reflections on the surface and also see clearly through the surface at the same time, and it runs fast enough for real-time use.
What questions does the paper ask?
- How can we render scenes with glass-like surfaces so that reflections look sharp but the background seen through the glass stays clear?
- How can we avoid common visual mistakes, like blurry reflections or fake "floaters" (made-up blobs) that appear behind the glass?
- Can we do all of this quickly (in real time) and with one unified scene representation, instead of stitching multiple models together?
How does it work? (Simple explanation)
First, some background: modern "Gaussian Splatting" represents a 3D scene using lots of soft, fuzzy 3D dots (Gaussians). When viewed from a camera, these dots are blended to make an image. This is fast and usually looks great, but it struggles with semi-transparent, shiny surfaces where you need both reflection (what's bouncing off the surface) and transmission (what you see through it).
RT-Splatting has three core ideas:
- Split "being there" from "blocking light"
- Think of each fuzzy dot as doing two jobs:
- Geometric occupancy: is there a surface here that can reflect light? (Like a window's surface, which is present for reflections even though you can see through it)
- Optical opacity: how much does it actually block or absorb light passing through? (Most clear glass blocks very little)
- By learning these two properties separately, the system can treat glass as a real surface for reflections, while still letting background light pass through.
- A hybrid two-step render, like "sketch then color"
- Step 1 (Surface/Reflection pass): The system first figures out where the camera would "touch" the surface and gathers surface details (like its direction/normal and shininess/roughness). This is similar to sketching the outline and important notes into special image layers called G-buffers. It then computes the shiny reflection using a learned shading function (so mirror-like highlights look crisp).
- Step 2 (Volume/Transmission pass): In parallel, it adds up the light coming from the background behind the glass, making sure the glass doesn't wrongly block it (thanks to the split between occupancy and opacity).
- Finally, it mixes the reflection and the see-through background. A learned "attenuation" factor dims the see-through part more when reflections are strong, matching how we perceive glass in real life.
- Smarter learning in tricky shiny regions
- When training the model, shiny spots are hard to get perfect. The remaining mistakes can mislead the "see-through" part, causing it to invent floaters behind the glass.
- RT-Splatting adds Specular-Aware Gradient Gating, which is like a teacher saying: "In very shiny, complicated areas, don't let the see-through part overreact to errors." It measures how complex the reflection is in a small patch; if it's very complex, it turns down the learning signal for the transmission branch there. This cuts down fake floaters and keeps the background crisp.
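To make the occupancy-opacity split concrete, here is a minimal Python sketch of one camera ray, assuming depth-sorted Gaussians and a front-to-back pass. It assumes the surface pass weights each Gaussian by occupancy alone (o·g, where g is the Gaussian's falloff at the hit point) while the transmission pass uses the effective opacity o·α·g; the `Hit` class, its fields, and the example numbers are hypothetical, and this is not the authors' rasterizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Hit:
    """One Gaussian crossed by a camera ray (illustrative fields, not the paper's layout)."""
    response: float   # Gaussian falloff g at the intersection, in [0, 1]
    occupancy: float  # geometric occupancy o: chance the ray interacts with the Gaussian
    opacity: float    # optical opacity alpha: chance light is blocked once it interacts
    color: Vec3       # radiance accumulated by the transmission (volume) pass
    normal: Vec3      # surface attribute written to the G-buffer by the surface pass


def render_ray(hits: List[Hit]) -> Tuple[Vec3, Vec3, List[float]]:
    """Front-to-back pass over depth-sorted hits.

    Returns (transmitted color, first-surface normal, per-hit first-surface weights).
    The blended normal is not renormalized.
    """
    c_trans = [0.0, 0.0, 0.0]
    normal = [0.0, 0.0, 0.0]
    weights: List[float] = []
    vol_t = 1.0  # fraction of light not yet blocked (volume pass)
    miss = 1.0   # probability that no surface has been hit yet (surface pass)
    for h in hits:
        a_eff = h.occupancy * h.opacity * h.response  # effective opacity o * alpha * g
        w = miss * h.occupancy * h.response           # probability this is the first surface
        for k in range(3):
            c_trans[k] += vol_t * a_eff * h.color[k]
            normal[k] += w * h.normal[k]
        weights.append(w)
        vol_t *= 1.0 - a_eff
        miss *= 1.0 - h.occupancy * h.response
    return (c_trans[0], c_trans[1], c_trans[2]), (normal[0], normal[1], normal[2]), weights


# Nearly clear glass (high occupancy, low optical opacity) in front of a red backdrop.
glass = Hit(1.0, 0.95, 0.05, (0.5, 0.5, 0.5), (0.0, 0.0, 1.0))
backdrop = Hit(1.0, 1.0, 1.0, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
c, n, w = render_ray([glass, backdrop])
# w is about [0.95, 0.05] and c[0] is about 0.976: the glass owns the G-buffer,
# yet most of the backdrop's red still reaches the camera.

# Same glass with a single opacity (alpha = 1): c_opaque[0] is about 0.525,
# of which only about 0.05 comes from the backdrop.
c_opaque, _, _ = render_ray([Hit(1.0, 0.95, 1.0, glass.color, glass.normal), backdrop])
```

With a single opacity, the glass hides most of the backdrop; with the factorization, it dominates the reflection G-buffer while roughly 95% of the backdrop's light still passes through.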
There's also a light-touch helper: a transparency mask from a pre-trained segmenter. It gently guides the learning so the system doesn't create "ghost" surfaces that don't affect the image but would confuse the model.
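The mask term can be sketched as a plain binary cross-entropy between a rendered per-pixel opacity map and a 0/1 target derived from the SAM2 transparency mask. Which opacity map is rendered and how the mask becomes a target are not spelled out here, so both are treated as given inputs; the weight `lambda_mask` is a hypothetical small value meant to keep the term a gentle regularizer rather than a hard segmentation constraint.

```python
import math
from typing import Sequence


def bce_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Mean binary cross-entropy between per-pixel probabilities and 0/1 targets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def total_loss(photometric: float, opacity_map: Sequence[float],
               mask_target: Sequence[float], lambda_mask: float = 0.1) -> float:
    """Photometric loss plus a weak mask term (lambda_mask is a hypothetical weight)."""
    return photometric + lambda_mask * bce_loss(opacity_map, mask_target)


# Two pixels whose predictions mostly agree with the mask: loss is -ln(0.9), about 0.105.
example = bce_loss([0.9, 0.1], [1.0, 0.0])
```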
What did they find?
The authors tested RT-Splatting on real scenes with car windows, plastic films, and other thin transparent surfaces. Compared to other fast methods:
- Reflections are sharper and more realistic (no smearing).
- The background seen through glass is clearer (fewer fake floaters and less unwanted blocking).
- It works in tough situations where the only way to see part of the scene (like a car's interior) is through glass.
- It still runs in real time and trains in a reasonable amount of time.
- Each part of their design matters: removing the occupancy-opacity split, the reflection/transmission mixing, the gating, or the material scattering pieces makes results worse in measurable ways.
A bonus: because the method separates reflection and transmission, you can edit scenes easily: make glass more or less shiny, change its tint, reduce reflections, or adjust roughness, without breaking the rest of the image.
Why does it matter?
This work makes it much easier to render everyday scenes with windows, screens, or clear plastics in a realistic and fast way. That's important for:
- AR/VR and games: believable glass and shiny surfaces at real-time speeds.
- Film and virtual production: reliable, editable reflections and see-through details without heavy manual tricks.
- Robotics and autonomous systems: clearer views through windows or screens while still understanding reflective cues.
Limitations and future steps: RT-Splatting focuses on thin, nearly flat transparent surfaces where light mostly goes straight through (like typical window glass). It doesn't yet handle strong bending of light (refraction) in thicker materials like solid glass sculptures or water, or multiple internal bounces. Future work could extend it to those harder cases.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a focused list of concrete gaps and open problems that remain unresolved and could guide follow-up research.
Modeling and physical fidelity
- No refraction or multi-bounce light transport: extend the model to thick refractive media (e.g., glass blocks, water) with ray bending, internal reflections, and multi-bounce effects; quantify performance vs. refractive index and surface curvature.
- Single first-hit surface assumption: support multiple stacked or coated layers (e.g., double-pane windows, clear coat + paint) via multi-layer G-buffers and multi-hit deferred shading.
- Heuristic blend instead of Fresnel: the learned attenuation β replaces physically based Fresnel blending to accommodate tone mapping; investigate training in linear HDR/RAW space, explicit camera response modeling, or hybrid physics-learned blending to restore physical correctness and energy conservation.
- Subsurface transport simplification: the mixture Csub = T·Ctrans + (1 − T)·Cscatter ignores path length, thickness, and angle dependence; incorporate thickness estimates, Beer-Lambert absorption, angle-dependent transmittance, and wavelength-dependent tinting/dispersion.
- Participating media not modeled: extend the forward pass to heterogeneous volumetric media (fog/smoke) and mixed surface-volume scenes with semi-transparent interfaces.
- Material coherence: Cscatter and T are per-Gaussian and may overfit; add material-space priors, spatial coherence, or shared material embeddings with constraints (e.g., energy conservation, roughness/IOR consistency).
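The subsurface mixture and the attenuation factor can be written out in a few lines of Python. Csub = T·Ctrans + (1 − T)·Cscatter follows the text; the final combination C = Cspec + β·Csub is an assumption about how β "directly modulates" the subsurface-transport component, and Schlick's approximation stands in for the Fresnel reference blend that β replaces. All function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def mix(a: RGB, b: RGB, t: float) -> RGB:
    """Per-channel t * a + (1 - t) * b."""
    return (t * a[0] + (1.0 - t) * b[0],
            t * a[1] + (1.0 - t) * b[1],
            t * a[2] + (1.0 - t) * b[2])


def subsurface(c_trans: RGB, c_scatter: RGB, T: float) -> RGB:
    """Csub = T * Ctrans + (1 - T) * Cscatter, with T the transmissivity."""
    return mix(c_trans, c_scatter, T)


def learned_blend(c_spec: RGB, c_sub: RGB, beta: float) -> RGB:
    """Assumed final composite C = Cspec + beta * Csub, with beta in [0, 1] per pixel."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (c_spec[0] + beta * c_sub[0],
            c_spec[1] + beta * c_sub[1],
            c_spec[2] + beta * c_sub[2])


def schlick_fresnel(cos_theta: float, f0: float = 0.04) -> float:
    """Schlick's approximation to Fresnel reflectance; f0 = 0.04 corresponds to IOR 1.5."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def fresnel_blend(c_spec: RGB, c_sub: RGB, cos_theta: float, f0: float = 0.04) -> RGB:
    """Physically motivated reference: F * Cspec + (1 - F) * Csub."""
    return mix(c_spec, c_sub, schlick_fresnel(cos_theta, f0))


c_sub = subsurface((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), T=0.75)  # about (0.75, 0.0, 0.25)
c = learned_blend((0.2, 0.2, 0.2), c_sub, beta=0.5)           # about (0.575, 0.2, 0.325)
```

The Fresnel weight depends only on the view angle, whereas β is predicted per pixel; that freedom is what lets the learned term follow how strongly highlights suppress transmission in tone-mapped captures.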
Optimization and learning dynamics
- Identifiability of occupancy-opacity factorization: the product o·α is underdetermined; develop mask-free regularizers or learned priors (e.g., sparsity, smoothness across surfels, depth-aware constraints) to avoid "ghost" geometries without external masks.
- Dependence on external masks (SAM2): quantify robustness to mask noise, prompts, and failure cases; explore end-to-end, self-supervised transparent-region discovery to remove reliance on external segmentation.
- Gradient gating design: gating uses local variance of predicted Cspec over 3×3 patches; evaluate alternative complexity signals (image gradients, frequency-domain measures, roughness/normal variance, predictive uncertainty) and schedules; analyze early-training misgating when Cspec is inaccurate.
- Scope of gating: gradients are gated only for Ctrans; study gating for occupancy/opacity, normals, and material attributes to prevent leakage into other branches; provide convergence analyses and diagnostics for when gating harms background learning.
- Normal sensitivity: reflections rely on accurate surfel normals; characterize sensitivity and integrate normal priors/supervision or robust normal refinement to mitigate artifacts from noisy surfel orientations.
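Specular-Aware Gradient Gating can be sketched in three steps: measure the local complexity of the predicted Cspec as its variance over 3×3 patches, map that variance to a gate g in [0, 1], and let only a g-scaled gradient reach Ctrans. The last step corresponds to the stop-gradient form g·Ctrans + (1 − g)·sg(Ctrans), whose forward value is unchanged; in PyTorch it could be written `g * c_trans + (1 - g) * c_trans.detach()`. The exponential mapping and its temperature `tau` are hypothetical, border windows are simply clipped at the image edge (our choice), and this pure-Python version applies the gate to an explicit gradient array instead of running autodiff.

```python
import math
from typing import List

Image = List[List[float]]


def local_variance(img: Image, radius: int = 1) -> Image:
    """Variance over a (2r+1)x(2r+1) window (3x3 for r=1); windows are clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            out[y][x] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return out


def gate_from_variance(var: Image, tau: float = 0.01) -> Image:
    """Hypothetical mapping: flat specular -> gate near 1, complex specular -> gate near 0."""
    return [[math.exp(-v / tau) for v in row] for row in var]


def gated_transmission_grad(grad_c_trans: Image, gate: Image) -> Image:
    """Gradient that reaches Ctrans under g*Ctrans + (1 - g)*sg(Ctrans): scaled by g."""
    return [[g * d for g, d in zip(g_row, d_row)] for g_row, d_row in zip(gate, grad_c_trans)]


# A flat specular layer leaves transmission gradients untouched;
# a high-contrast checkerboard (complex reflection) nearly blocks them.
flat = [[0.3] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
grad_flat = gated_transmission_grad(ones, gate_from_variance(local_variance(flat)))
grad_checker = gated_transmission_grad(ones, gate_from_variance(local_variance(checker)))
```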
Scope, generality, and failure modes
- Thin-surface assumption: determine failure boundaries for curved or moderately refractive thin surfaces (e.g., car windshields); create controlled benchmarks varying curvature/IOR to profile degradation.
- Near-field reflections: the method uses a learned shading network rather than explicit reflection tracing; investigate lightweight ray-traced or environment-Gaussian hybrids that capture near-field reflection paths while remaining real-time and compatible with transparency.
- Mirrors and opaque reflectors: study the degenerate case with zero transmission (pure mirrors) and mixed mirror-glass regions; ensure stable behavior and evaluate against mirror-specific baselines.
- Dynamic scenes: extend and evaluate on moving reflectors and moving backgrounds behind glass; enforce temporal consistency and examine gating behavior under motion and rolling shutter.
- Photometric variability: robustness to auto-exposure, white balance, tone mapping, and saturated highlights is unexamined; explore radiometric calibration, exposure-invariant losses, and HDR pipelines.
Evaluation and benchmarking
- Limited and domain-specific datasets: curate benchmarks with controlled reflection/transmission ground truth (synthetic and real), including thick refractive objects, layered panes, near-field reflectors, and participating media.
- Metrics for decomposition quality: beyond PSNR/SSIM/LPIPS, define reflection- and transmission-specific metrics (e.g., reflection fidelity vs. a relit reference, transmission sharpness/contrast behind glass, leakage/bleed-through measures).
- Baseline coverage: include transparent/refractive baselines (e.g., TransparentGS, refractive NeRF variants) in regimes where they apply to contextualize gains and limitations.
Efficiency and system design
- Scalability and resource use: characterize memory/compute as scene complexity and number/extent of semi-transparent surfaces grow; investigate hierarchical/streamed splats, adaptive multi-res G-buffers, and tile-based deferred shading for higher resolutions.
- Pruning policy: pruning by occupancy risks removing visually important low-occupancy elements; design saliency-aware or uncertainty-aware pruning to preserve critical transparent structures.
- Representation portability: assess how the occupancy-opacity factorization extends to 3DGS or alternative primitives and whether deferred/forward hybrid rendering remains stable and efficient.
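The pruning concern can be illustrated with a toy comparison between pruning on occupancy o and pruning on the effective opacity o·α. The threshold, the names, and the three example Gaussians below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    name: str
    occupancy: float  # geometric occupancy o
    opacity: float    # optical opacity alpha


def keep_by_occupancy(gs: List[Gaussian], threshold: float) -> List[str]:
    """Keep Gaussians whose occupancy o reaches the threshold."""
    return [g.name for g in gs if g.occupancy >= threshold]


def keep_by_effective_opacity(gs: List[Gaussian], threshold: float) -> List[str]:
    """Keep Gaussians whose effective opacity o * alpha reaches the threshold."""
    return [g.name for g in gs if g.occupancy * g.opacity >= threshold]


scene = [
    Gaussian("clear_glass", occupancy=0.9, opacity=0.02),   # o*alpha = 0.018
    Gaussian("faint_detail", occupancy=0.03, opacity=1.0),  # low occupancy, fully opaque
    Gaussian("wall", occupancy=1.0, opacity=1.0),
]
kept_occ = keep_by_occupancy(scene, 0.05)           # ["clear_glass", "wall"]
kept_eff = keep_by_effective_opacity(scene, 0.05)   # ["wall"]
```

Occupancy-based pruning keeps the nearly clear glass that an effective-opacity test would discard, but it still drops the low-occupancy detail, which is the risk the pruning item raises.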
Editing and applications
- Physically grounded parameter editing: current edits (roughness, transparency, tint, "remove specular") lack guarantees of realism; estimate interpretable BRDF/BSDF parameters (roughness, IOR, absorption coefficients) to enable consistent, physically based edits.
- Cross-view consistency of layers: quantify and enforce consistency of reflection/transmission decompositions across viewpoints; add cross-view layer-consistency losses or cycle constraints to reduce layer leakage.
Practical Applications
Immediate Applications
Below are deployable use cases that leverage RT-Splatting's unified surface-volume Gaussian representation, hybrid deferred-forward rendering, and specular-aware gradient gating to handle semi-transparent, reflective surfaces in real time.
- Glass-aware 3D capture and viewing for built environments
- Sectors: architecture, real estate, cultural heritage (museums, galleries), digital twins
- What it enables: High-fidelity scans where windows, partitions, and display cases retain sharp reflections while remaining see-through; reliable reconstruction of content visible only through glass (e.g., interiors behind windows)
- Workflow: Capture ~200-300 calibrated views; train RT-Splatting (~0.9 h on an RTX 4090); deploy a real-time viewer (WebGL/desktop) with reflection/transmission toggles and material editing (tint/roughness/transparency)
- Dependencies/assumptions: Thin, semi-transparent surfaces (negligible refraction); static scene; multi-view coverage; GPU for training/inference; SAM2-based mask regularization (optional but recommended)
- VFX and virtual production: capture-through-glass and compositing
- Sectors: film/TV, advertising, post-production
- What it enables: On-set scans with reflective glass (cars, storefronts, office interiors) that preserve reflections without blocking transmission; independent reflection/transmission layers for downstream compositing
- Tools/products/workflows: RT-Splatting ingest in Blender/Unreal; export reflection-only and transmission-only passes; per-pixel attenuation for art-directable balance; scene editing of tint and roughness
- Dependencies/assumptions: Multi-view footage; thin glass; tone mapping may necessitate color management; GPU resources
- Automotive visualization and digital showrooms
- Sectors: automotive, retail, marketing
- What it enables: Realistic real-time car scans with readable interiors through windows; configurable tint/roughness; sales configurators that keep believable reflections without hiding interiors
- Tools/products/workflows: Web 3D viewer with "reflection strength" and "glass tint" sliders; dealership capture kits
- Dependencies/assumptions: Static vehicle; multi-view capture; thin glass modeling
- AR occlusion and realism near glass
- Sectors: AR/VR, retail, navigation
- What it enables: Consistent reflections and see-through behavior for AR content placed near windows and glass displays; improved occlusion where transmission should remain visible
- Tools/products/workflows: Integrations with AR SDKs (ARKit/ARCore/OpenXR); layer-wise blending using RT-Splatting's reflection/transmission decomposition
- Dependencies/assumptions: Environment pre-scan; static glass; device-side or edge rendering
- Robotics data generation in glass-heavy environments
- Sectors: robotics, warehouse/logistics, service robots
- What it enables: Realistic training assets for perception in spaces with glass partitions/cabinets; accurate supervision for background geometry visible only through glass without "floater" artifacts
- Tools/products/workflows: Synthetic-to-real pipelines using RT-Splatting reconstructions; generation of reflection-only/transmission-only supervisory signals
- Dependencies/assumptions: Static capture scene; thin glass; multi-view training set
- Inspection and monitoring through enclosures
- Sectors: industrial, pharma/biotech, energy
- What it enables: Digital twins of equipment behind safety glass or acrylic (control panels, gauges) with legible transmission and truthful reflections
- Tools/products/workflows: Periodic scans for change detection; reflection attenuation to enhance readability during review
- Dependencies/assumptions: Thin transparent cover; multi-view access; static or quasi-static targets
- E-commerce capture of packaged goods
- Sectors: retail/e-commerce, CPG
- What it enables: Real-time product viewers for blister packs and clear cases that separate the product (transmission) from protective reflections; adjustable glare for marketing assets
- Tools/products/workflows: RT-Splatting-based capture kit; web viewer with "remove reflections" toggle; batch rendering of reflection-free thumbnails
- Dependencies/assumptions: Thin packaging; controlled capture; GPU inference for batch pipelines
- Forensic and security review enhancement
- Sectors: security, insurance
- What it enables: Reflection/transmission decomposition from multi-view evidence to reduce glare and reveal content behind glass for analysis; consistent layer export for audit trails
- Tools/products/workflows: "Deglare" viewer using the transmission component; configurable attenuation to preserve evidentiary integrity
- Dependencies/assumptions: Multi-view recordings; thin glass; ethical/legal compliance; static or re-enactable scenes
- Photogrammetry through glass for mapping and cultural heritage
- Sectors: GIS/mapping, heritage digitization
- What it enables: Robust reconstructions of exhibits and interiors seen only through display cases or windows; fewer manual masks and less cleanup
- Tools/products/workflows: Replace hand-crafted transparent-object segmentation with mask regularization and gradient gating; publish to web viewers
- Dependencies/assumptions: Multi-view capture; thin covers; static scenes; GPU training
- Photography/post-processing of glare
- Sectors: prosumer photography, media
- What it enables: From a short handheld capture, export reflection-removed renders of subjects behind glass; retain optional reflection layer for stylization
- Tools/products/workflows: Mobile/desktop app offering "reflection-free" and "reflection-only" rerenders from a brief sweep of images
- Dependencies/assumptions: Requires multi-view (not single image); thin glass; device or cloud compute
Long-Term Applications
These applications are feasible with further research and engineering (e.g., modeling refraction/multi-bounce, handling dynamics, mobile deployment, or large-scale operations).
- Thick refractive media and multi-bounce transport
- Sectors: underwater inspection, optics, medical imaging, product design
- What it could enable: Accurate rendering/reconstruction of solid glass objects, water tanks, lenses, and curved acrylic; support for refraction and internal scattering beyond thin-surface approximation
- Dependencies/assumptions: Extend RT-Splatting to refractive paths and multi-bounce light; stable optimization with added ambiguity
- Live, on-device AR capture and adaptation
- Sectors: AR/VR, mobile
- What it could enable: On-the-fly reconstruction around glass with reflection/transmission handling directly on phones/headsets
- Dependencies/assumptions: Model compression, hardware acceleration (mobile NPUs/GPUs), fast incremental training/updates
- Dynamic scenes with changing reflections and moving actors
- Sectors: events, sports broadcasting, retail
- What it could enable: Real-time updates as people or lighting move behind/around glass; temporally consistent reflection/transmission layers
- Dependencies/assumptions: Deformable/dynamic GS or hybrid video radiance fields; temporal priors; streaming training
- Autonomous driving: interior understanding through windows
- Sectors: automotive autonomy, ADAS
- What it could enable: Better scene priors for occupants/objects visible through car windows; improved hazard prediction and intent understanding
- Dependencies/assumptions: Robustness to motion, weather, and polarization; fusion with LiDAR/radar; safety and privacy compliance
- Single-image or sparse-view reflection removal via distillation
- Sectors: consumer imaging, journalism, medical imaging through viewports
- What it could enable: Train supervised or distilled models from RT-Splatting decompositions to perform deglare from minimal inputs
- Dependencies/assumptions: Large curated datasets of decomposed pairs; generalization beyond the training capture settings
- Standardized transparency-aware digital twin pipelines
- Sectors: AEC/BIM, smart buildings, manufacturing
- What it could enable: Native support for glass-aware capture/edit/render in CAD/BIM software and facility twins; material-aware editing at scale
- Dependencies/assumptions: SDKs/APIs for RT-Splatting integration; asset standards for storing reflection/transmission layers and factorized opacities
- Advanced robotic manipulation of transparent/reflective objects
- Sectors: logistics, lab automation, household robotics
- What it could enable: Perception stacks trained with transparency-aware renders and extended physics models for grasping glassware or glossy items
- Dependencies/assumptions: Incorporate refraction and contact shading; tactile/vision fusion; domain randomization with transparency controls
- Cloud streaming and edge rendering for large venues
- Sectors: tourism, retail, entertainment
- What it could enable: Interactive, glass-heavy venues streamed with accurate reflections/transmission to lightweight clients
- Dependencies/assumptions: Server-side GPU pools; content delivery at 30-60 FPS; memory- and bandwidth-aware splat representations
- Governance and ethics for "see-through" reconstructions
- Sectors: policy, compliance, privacy
- What it could enable: Guidelines for scanning private interiors visible through windows; watermarking and disclosure when reflection/transmission are manipulated
- Dependencies/assumptions: Legal frameworks; provenance tooling; user-consent capture workflows
Notes on Feasibility, Assumptions, and Dependencies (cross-cutting)
- Thin-surface approximation: Current method assumes semi-transparent thin surfaces with negligible refraction; not suitable for thick glass, water, or complex internal optics without extensions.
- Data requirements: Multi-view calibrated images; scenes should be mostly static during capture. Background exclusively visible through glass is supported.
- Compute: Training reported ~0.9h on an RTX 4090; real-time rendering ~33 FPS on desktop-class GPUs. Mobile/edge requires optimization.
- Stability aids: Specular-aware gradient gating reduces floaters; SAM2 masks used only as regularization (not hard segmentation).
- Integration: Implemented in PyTorch atop 2DGS; deferred shading pipeline; exportable reflection/transmission layers and material controls (roughness, tint, transmissivity).
- Photometric considerations: Nonlinear camera pipelines (tone-mapping) can affect physically based blends; the learned attenuation term helps match perceptual suppression of transmission under strong highlights.
- Legal/ethical: Applications that "see through" glass (e.g., interiors) must follow privacy laws and consent protocols.
Glossary
- 2D Gaussian Splatting (2DGS): A surface-aligned scene representation using 2D Gaussian "surfels" for accurate geometry and real-time rendering. "2DGS models the scene as a set of 2D Gaussian surfels embedded in 3D space."
- 3D Gaussian Splatting (3DGS): A real-time radiance field method that represents scenes with 3D Gaussian primitives and renders them via rasterization. "3D Gaussian Splatting (3DGS) [18] has revolutionized the field of novel view synthesis"
- alpha blending: A compositing technique that accumulates colors along a ray using per-primitive opacities in front-to-back order. "The final color C for a pixel is then computed by alpha blending the Gaussians in front-to-back order"
- anisotropic 3D Gaussian: A Gaussian with a full covariance (direction-dependent spread) used to model oriented, elongated primitives. "a collection of anisotropic 3D Gaussian primitives"
- attenuation factor: A learned scalar that modulates transmitted/subsurface light based on reflection strength to match perceptual suppression. "output an attenuation factor β ∈ [0,1] that directly modulates the subsurface-transport component."
- backpropagation: The gradient-based procedure for training parameters by propagating losses through the rendering pipeline. "During backpropagation, gradients induced by these residuals can be erroneously routed into the transmission branch"
- binary cross-entropy (BCE): A loss function for supervising binary predictions such as masks or opacities. "We then supervise this opacity map with a binary cross-entropy (BCE) loss"
- cone tracing: A rendering approximation that traces cones instead of rays to aggregate reflected features over a region. "NeRF-Casting [36] performs cone tracing along reflection paths"
- deferred shading: A two-pass rendering pipeline that first writes surface attributes to G-buffers and then shades per pixel. "Deferred shading is a two-pass rendering technique that decouples geometry processing from lighting and material computations."
- effective opacity: The product of geometric occupancy and optical opacity used for volumetric compositing. "defines the effective opacity used for volumetric compositing"
- environment map: An image-based representation of distant illumination used for efficient specular shading. "shades with a learnable environment map"
- first-surface extraction: Identifying the first surface hit along a ray to aggregate correct per-pixel attributes. "our factorization naturally yields a probabilistic formulation for first-surface extraction"
- floaters: Spurious, behind-surface artifacts introduced during optimization that appear as floating geometry. "hallucinates 'floaters' behind the surface."
- Fresnel equations: Physical laws describing angle-dependent reflection/transmission at interfaces; used here as a reference for blending. "A purely physics-based blend using Fresnel equations is often broken in practice"
- G-buffer: Per-pixel buffers storing geometry/material attributes (e.g., normals, albedo) for deferred shading. "collectively called G-buffers."
- geometric occupancy: The learned probability that a ray interacts with a Gaussian as a surface element. "The geometric occupancy o ∈ [0,1] encodes the probability that a ray interacts with the substance of the Gaussian."
- LPIPS: A learned perceptual image similarity metric used for evaluating reconstruction quality. "We report PSNR, SSIM [39], and LPIPS [48]"
- normal consistency loss: A regularizer that aligns rendered normals with depth-derived gradients to stabilize geometry. "we minimize the normal consistency loss L_n to enforce geometric alignment"
- occupancy-opacity factorization: Splitting a Gaussian's role into surface presence (occupancy) and true light attenuation (optical opacity). "Our occupancy-opacity factorization introduces a specific ambiguity"
- optical opacity: The conditional probability that light is absorbed or scattered once a surface interaction occurs. "The optical opacity α ∈ [0, 1] then specifies the conditional probability that the ray is absorbed or scattered once such an interaction occurs."
- rasterization: Projecting and accumulating Gaussian primitives efficiently onto the image plane during rendering. "via rasterization."
- rendering equation: The integral equation governing light transport used in physically based shading. "explicitly evaluates the rendering equation"
- Spherical Harmonics (SH): A basis for compactly representing view-dependent color on Gaussian primitives. "color represented by Spherical Harmonics (SH)."
- Specular-Aware Gradient Gating: A training mechanism that down-weights transmission gradients in highly specular regions to prevent floaters. "we introduce Specular-Aware Gradient Gating"
- stop-gradient: An operator that blocks gradient flow through a tensor during backpropagation to control optimization paths. "Let sg(·) denote the stop-gradient operator."
- subsurface transport: The combined transmitted and internally scattered light component within the material. "into a subsurface-transport component"
- surfel: A surface element (disk-like primitive) representing local surface geometry in 2DGS. "2D Gaussian surfels embedded in 3D space."
- tone-mapping: Nonlinear camera or display mapping that affects physically based blending assumptions. "tone-mapping and other nonlinear camera responses"
- transmissivity: A material property controlling the proportion of light transmitted through a surface. "T dictates the material's transmissivity"
- volumetric rendering: Accumulating radiance and opacity along rays through a volume to form pixel colors. "like standard volumetric rendering"