RetimeGS: Continuous-Time Reconstruction of 4D Gaussian Splatting
Abstract: Temporal retiming, the ability to reconstruct and render dynamic scenes at arbitrary timestamps, is crucial for applications such as slow-motion playback, temporal editing, and post-production. However, most existing 4D Gaussian Splatting (4DGS) methods overfit at discrete frame indices but struggle to represent continuous-time frames, leading to ghosting artifacts when interpolating between timestamps. We identify this limitation as a form of temporal aliasing and propose RetimeGS, a simple yet effective 4DGS representation that explicitly defines the temporal behavior of the 3D Gaussian and mitigates temporal aliasing. To achieve smooth and consistent interpolation, we incorporate optical flow-guided initialization and supervision, triple-rendering supervision, and other targeted strategies. Together, these components enable ghost-free, temporally coherent rendering even under large motions. Experiments on datasets featuring fast motion, non-rigid deformation, and severe occlusions demonstrate that RetimeGS achieves superior quality and coherence over state-of-the-art methods.
Explain it Like I'm 14
What is this paper about?
RetimeGS is a new way to turn videos of moving scenes into a 3D world you can view from any angle and at any moment in time—even the moments that weren’t actually captured by the camera. Think slow motion, smooth speed ramps, or “bullet time” shots where you can fly around a scene in mid-action. The paper focuses on making these in‑between frames look clean and natural, without the blurry “ghosts” that often appear with other methods.
Main goal
Make dynamic 3D scenes look good at any time you want to render them, not just at the exact times the cameras recorded, especially when things move fast, bend, or get hidden.
Key questions
- How can we avoid “ghosting” (faint double images) when creating new frames between recorded ones?
- How can we model motion smoothly over time so the results look consistent and natural?
- How can we handle tricky cases like fast motion, stretchy or bendy objects (non‑rigid), or things that appear/disappear due to occlusion?
How does it work?
To explain the method, think of a moving scene as being built from lots of tiny, soft “paint blobs” in 3D space. This idea is called “Gaussian splatting.” Each blob has a position, size, color, and transparency, and all blobs together form the scene when you “splat” them onto the camera image. In 4D, we also care about time—so the blobs can move and fade in/out as the scene changes.
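To make the "splat and blend" step concrete, here is a minimal sketch of front-to-back alpha compositing for a single pixel. It assumes the blobs covering that pixel have already been projected and sorted nearest-first, and it ignores each blob's spatial footprint. The function name `composite` and the sample values are illustrative, not taken from the paper.

```python
def composite(samples):
    """Front-to-back alpha compositing of depth-sorted (color, alpha) samples.

    `samples` is ordered nearest-first; color is an (r, g, b) tuple and
    alpha is in [0, 1]. Returns the blended color and leftover transmittance.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer blobs
    for color, alpha in samples:
        weight = transmittance * alpha  # how much this blob contributes
        for c in range(3):
            out[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return tuple(out), transmittance


# A half-transparent red blob in front of an opaque blue one.
pixel, remaining = composite([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)])
print(pixel, remaining)  # (0.5, 0.0, 0.5) 0.0
```

In a dynamic scene, the alpha of each blob would additionally depend on time, which is what lets blobs fade in and out.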
The problem with current methods
Many existing systems only learn to match the exact frames they were given (like frame 1 and frame 2), not the moments in between (like frame 1.5). When you ask them to show an in-between moment, you can get “ghosting” (two faint versions of the same object overlapping) because the system essentially memorized the originals and didn’t learn how to blend them smoothly over time. This is a kind of “temporal aliasing”: the motion was sampled too sparsely in time, so what happens between the samples gets misrepresented.
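A toy 1D example can illustrate why ghosting appears. The sketch below compares two ways of producing the halfway frame for a blob that travels from position 0 to position 4: cross-fading two frozen copies versus actually moving one blob. All names and numbers here are hypothetical and chosen only for illustration; this is not the paper's renderer.

```python
import math


def blob(x, center, sigma=0.5):
    """Brightness of a soft 1D blob centered at `center`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def cross_fade(x, t, start=0.0, end=4.0):
    """Blend two frozen copies of the blob: one at `start`, one at `end`."""
    return (1.0 - t) * blob(x, start) + t * blob(x, end)


def move(x, t, start=0.0, end=4.0):
    """Render one blob that has travelled a fraction `t` of the way."""
    return blob(x, start + t * (end - start))


def count_peaks(values):
    """Count strict local maxima in a list of samples."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


xs = [i * 0.5 for i in range(-2, 11)]  # sample positions from -1.0 to 5.0
ghosted = [cross_fade(x, 0.5) for x in xs]
clean = [move(x, 0.5) for x in xs]
print(count_peaks(ghosted), count_peaks(clean))  # prints "2 1"
```

The cross-faded frame shows two half-bright bumps (the ghost), while the moved blob shows a single bump at the midpoint.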
The core idea of RetimeGS
RetimeGS teaches each blob:
- When to appear and disappear (so it doesn’t live too long or too short in time).
- How to move smoothly along a curve between two frames.
- How to share responsibility with neighboring blobs so they jointly cover the time span cleanly.
In simple terms: instead of blobs clinging to the exact frames, RetimeGS makes them fade in, move smoothly, and fade out over the interval between two frames. That way, the system naturally knows what should happen at the in‑between moments.
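The fade-in and fade-out behavior can be sketched as a product of two sigmoids, which is how the paper describes its temporal opacity. The exact parameterization is not given in this summary, so the form below (a rising sigmoid at the interval start, a falling one at the interval end, with a sharpness parameter `gamma`) is an assumption, and the value `gamma=0.1` is arbitrary.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def temporal_opacity(t, t_start, t_end, gamma=0.1):
    """Assumed form: fade in around t_start, fade out around t_end.

    Smaller `gamma` gives sharper transitions (shorter tails).
    """
    fade_in = sigmoid((t - t_start) / gamma)
    fade_out = sigmoid((t_end - t) / gamma)
    return fade_in * fade_out


# A blob responsible for the interval between frame 1 and frame 2.
for t in (0.0, 1.0, 1.5, 2.0, 3.0):
    print(t, round(temporal_opacity(t, 1.0, 2.0), 4))
```

Under this assumed form, a blob is close to fully visible in the middle of its interval and about half visible at the two recorded frames, where the neighboring group of blobs supplies the other half.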
Training tricks that make it work
To make this practical and stable, RetimeGS adds several helpful strategies:
- Optical-flow‑guided motion (smooth curves, not straight lines)
- Optical flow is like drawing tiny arrows on each pixel to show where it moves from one frame to the next.
- The method uses these arrows (forward and backward in time) to guide each blob’s path through 3D space.
- Instead of moving in straight lines, blobs follow a smooth curve (a Catmull–Rom spline—think of bending a flexible ruler through a few guiding points). This helps avoid jerky, piecewise motion.
- Short, controlled “time visibility” for each blob
- Each blob is designed to be active primarily between two neighboring frames, fading in and out with a smooth “sigmoid” shape.
- This keeps blobs from stretching across too much time (which would require perfect tracking) but still ensures they’re present across the gap so the in‑between frames render cleanly.
- Triple rendering supervision
- At a recorded frame (say frame i), the system (1) renders the image using all blobs together, (2) renders using only the blobs responsible for the interval [i−1, i], and (3) renders using only the blobs responsible for [i, i+1].
- All three results are trained to match the real image. This prevents one group from “slacking off” and relying on the other, making sure each group can explain the frame by itself—leading to better in‑between frames.
- Dynamic stretching and smart relocation
- If a region is static (doesn’t move), the same blobs are allowed to “last longer” across multiple frames so we don’t waste extra blobs representing the same thing again and again.
- If some blobs aren’t pulling their weight (too faint or unhelpful), they can be “moved” to busier, harder parts of the scene—like sending extra staff to where the crowd is.
- Flow‑aware initialization
- The system starts with a rough 3D point cloud and uses optical flow to give blobs decent initial motion guesses. Good starting points help training converge faster and more reliably.
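The curved paths mentioned in the list above can be sketched with a Catmull–Rom segment. This summary does not say which variant the paper uses, so the sketch assumes the standard uniform form: four control points, with the curve passing exactly through the two inner ones. The function name and the sample points are illustrative.

```python
def catmull_rom(p0, p1, p2, p3, u):
    """Uniform Catmull-Rom point between p1 (u=0) and p2 (u=1).

    p0 and p3 are the outer control points that shape the curve; each
    point is a tuple of coordinates (for example, x, y, z).
    """
    u2, u3 = u * u, u * u * u
    return tuple(
        0.5 * (
            2.0 * b
            + (c - a) * u
            + (2.0 * a - 5.0 * b + 4.0 * c - d) * u2
            + (3.0 * b - a - 3.0 * c + d) * u3
        )
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


# Four positions of one blob; the path bends upward after p1.
p0, p1, p2, p3 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)
print(catmull_rom(p0, p1, p2, p3, 0.0))  # equals p1
print(catmull_rom(p0, p1, p2, p3, 0.5))  # (1.5625, 0.4375, 0.0)
print(catmull_rom(p0, p1, p2, p3, 1.0))  # equals p2
```

A straight-line interpolation would place the halfway point at (1.5, 0.5, 0.0); the spline instead bends the path according to the neighboring control points.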
What did they find?
- Cleaner in‑between frames: RetimeGS reduces ghosting and blurriness when making new frames between recorded ones, even with fast motion, bending clothes, changing visibility (like hands emerging from sleeves), and complex textures.
- Smoother motion: Using spline curves and flow supervision makes objects move more naturally over time.
- Better numbers and visuals: On a challenging multi-camera dataset, RetimeGS scores higher on common quality measures (PSNR, SSIM) and lower on a perceptual error measure (LPIPS) compared to top alternatives. Qualitative examples show clearer hands, sleeves, and moving props, with fewer double images.
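For readers unfamiliar with the metrics above, PSNR is the simplest to compute: it is a log-scaled version of the mean squared pixel error, and higher is better. The sketch below uses the standard definition on flat lists of pixel values in [0, 1]; it is not the paper's evaluation code, which (per the summary) is applied on foreground masks.

```python
import math


def psnr(image, reference, max_value=1.0):
    """Peak Signal-to-Noise Ratio in decibels between two equal-length images."""
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by 0.1, so the mean squared error is 0.01.
print(psnr([0.5, 0.5], [0.4, 0.6]))
```

With a per-pixel error of 0.1 on a [0, 1] scale, the result is approximately 20 dB.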
Why is this important?
- Films and VFX: It helps create smooth slow motion and speed ramps without the weird artifacts that break the illusion.
- VR and AR: Higher frame-rate rendering without flicker or ghosting can make virtual experiences more comfortable and realistic.
- Content creation: Easier retiming and cleaner motion edits make post‑production faster and more flexible.
Limitations and future directions
- If the video has extremely low frame rates (big jumps between frames), even optical flow struggles to guess motion correctly. In those cases, inferring the “in‑between” becomes very hard, and this method can still falter.
- Future work could focus on better motion cues or learning from additional signals when frames are very sparse.
Summary in one sentence
RetimeGS teaches 3D “blobs of color” how to appear, move smoothly, and disappear over short time spans—guided by motion arrows and smart training tricks—so it can render clean, ghost‑free frames at any moment between the ones a camera actually recorded.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a single, concrete list of what remains missing, uncertain, or unexplored in the paper, phrased to guide future research:
- Dependence on optical flow quality: The method relies heavily on WAFT-estimated bidirectional flows and does not model flow uncertainty, occlusion masks, or confidence weighting, leaving open how to robustly supervise trajectories under noisy, inconsistent, or occluded flow.
- Scene-flow consistency: Flow supervision is per-view 2D and projected to/from 3D, but multi-view consistency of flow is not enforced; integrating multi-view scene flow or epipolar consistency could reduce depth-ambiguity errors.
- Extremely low-FPS regimes: The authors acknowledge failures when inter-frame motions are very large; strategies for multi-start-end interpolation, learned priors, or scene-flow-based constraints for such regimes are unaddressed.
- Sensitivity to camera calibration: Robustness to imperfect intrinsics/extrinsics (no bundle adjustment) is not evaluated; the method may degrade when calibration is noisy or drifted.
- Synchronized capture assumption: Handling of asynchronous cameras or rolling shutter effects is not considered; retiming under real production capture artifacts remains open.
- Sparse-view generalization: The approach is validated with 32–60 cameras; performance with few views (e.g., 4–8 cameras) and the minimal view/time sampling for acceptable interpolation quality are unexplored.
- Long sequences and scalability: Experiments use 17-frame clips; stability, drift, and continuity over long sequences (hundreds/thousands of frames) and memory/time scaling are not reported.
- Training/inference efficiency: There is no analysis of training time, GPU memory, convergence behavior, or inference throughput versus baselines, despite claims targeting VR/high-FPS use cases.
- Continuous-time rendering fidelity: While interpolation within input intervals is targeted, extrapolation beyond the captured range is unsupported; behavior near sequence boundaries (where one sigmoid is clamped to 1) is not analyzed for artifacts.
- Temporal aliasing theory: The work identifies temporal aliasing but provides no formal analysis or guarantees for the proposed sigmoid temporal opacity; trade-offs between temporal bandwidth, bias, and variance are not quantified.
- Choice and sensitivity of temporal kernel: The sigmoid-tail parameter γ is fixed and not learned; its impact on ghosting, temporal smoothness, and motion magnitude is unstudied, and adaptive/learned temporal support is not explored.
- Segment stitching and continuity: Dynamic primitives are defined per-interval with blending across adjacent groups; guarantees of C0/C1 continuity across intervals (in position, opacity, color) and prevention of identity switches or temporal popping are not provided.
- Rotation modeling limits: Quaternion rotation is a low-order polynomial in time; behavior under complex accelerations or non-linear rotations over larger intervals is untested, and higher-order or spline-based rotations are not explored.
- Appearance dynamics and illumination: SH coefficients are time-invariant (except view dependence); time-varying materials, lighting, shadows, and specularities are not modeled, limiting scenes with dynamic illumination.
- Topology and geometry changes: Although visibility changes are handled via temporal opacity, explicit handling of topology changes (splits/merges, cloth self-collisions) and dynamic primitive birth/death policies beyond relocation are not formalized.
- Densification policy in 4D: The paper does not describe or ablate a dynamic densification/splitting schedule (common in 3DGS) adapted for 4D, leaving unclear how to refine primitives when motion/appearance complexity increases.
- Flow-to-trajectory supervision design: The triple rendering and flow normalization (dividing by στ(ti)) are heuristics; a principled compositing model or proof that this preserves energy/opacity across subsets is absent.
- Occlusion-aware supervision: There is no explicit treatment of occluded pixels in flow or RGB losses (e.g., occlusion masks, visibility-aware weighting), which may bias trajectories in heavily occluded regions.
- Initialization dependence: The pipeline depends on VGGT point clouds without BA; the sensitivity to point-cloud noise, outliers, or sparse coverage and whether end-to-end learnable initialization could help is not investigated.
- Static vs dynamic classification: Stretching τl/τr uses SH0 similarity (similarity of the zeroth-order spherical-harmonic, i.e. base-color, coefficient) and near-zero velocity thresholds; the thresholds, failure modes (e.g., slowly moving objects), and sensitivity analyses are not provided.
- Parameter learning of temporal extent: τl/τr are non-optimizable at init and periodically stretched; learning temporal duration end-to-end or with priors (e.g., sparsity or MDL) remains unexplored.
- Effectiveness of dynamic stretching and pruning: The pruning probability rule (1 − 1/(k+1)) lacks ablation on quality–complexity trade-offs, stability, or risk of removing useful primitives.
- Relocation score design: The relocation score s = σ/(τl + τr) is ad hoc; alternative scores that incorporate photometric error, motion magnitude, or uncertainty are not evaluated.
- Robustness across content types: Experiments include human-centric scenes with complex cloth; performance on highly reflective, transparent, or thin-structure scenes, or dense crowds with heavy occlusions, is not assessed.
- Metrics and evaluation breadth: Beyond PSNR/SSIM/LPIPS on foreground masks, temporal metrics (e.g., tLPIPS, temporal warping error), motion continuity, and flicker are not evaluated; user studies for slow-motion/VR comfort are missing.
- Failure case taxonomy: Aside from very low FPS, the paper lacks a systematic analysis of failure modes (e.g., fast rotations, specularity, occlusion cascades) to guide when to prefer this method or integrate priors.
- Integration with learned priors: Combining the explicit 4DGS model with diffusion-based VFI/novel-view priors for extreme motions or visibility gaps is not explored.
- Generalization to monocular or weakly supervised settings: The method assumes dense multi-view RGB; extensions to monocular video, sparse views, or partially calibrated rigs remain open.
- Theoretical and practical bounds: There is no characterization of the maximal inter-frame displacement, acceleration, or scene complexity the method can handle given view/time sampling, leaving practitioners without design guidelines.
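Two of the rules named in the list above are simple enough to write down directly: the relocation score s = σ/(τl + τr) and the pruning probability 1 − 1/(k+1). The sketch below is a hedged illustration only. It assumes σ is a primitive's opacity and τl, τr its temporal extents, that relocation destinations are drawn with probability proportional to the score (as in MCMC-style relocation; the summary does not spell this out), and it leaves `k` as whatever count the paper's rule uses, since its definition is not restated here. The dictionary keys and function names are hypothetical.

```python
import random


def relocation_score(opacity, tau_left, tau_right):
    """s = sigma / (tau_l + tau_r): opaque, short-lived primitives score higher."""
    return opacity / (tau_left + tau_right)


def prune_probability(k):
    """1 - 1/(k+1); the meaning of k is not defined in this summary."""
    return 1.0 - 1.0 / (k + 1)


def sample_relocation_targets(primitives, num_samples, seed=0):
    """Assumed usage: draw destinations with probability proportional to score."""
    rng = random.Random(seed)
    scores = [
        relocation_score(p["opacity"], p["tau_left"], p["tau_right"])
        for p in primitives
    ]
    return rng.choices(primitives, weights=scores, k=num_samples)


primitives = [
    {"name": "short_lived", "opacity": 0.8, "tau_left": 0.5, "tau_right": 0.5},
    {"name": "stretched", "opacity": 0.8, "tau_left": 2.0, "tau_right": 2.0},
]
picks = sample_relocation_targets(primitives, 5)
print([p["name"] for p in picks])
print([prune_probability(k) for k in (0, 1, 3)])  # [0.0, 0.5, 0.75]
```

With equal opacity, the short-lived primitive scores four times higher than the stretched one, which is one way to see why the list calls for alternatives that also account for photometric error or motion magnitude.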
Practical Applications
Immediate Applications
Below are actionable use cases that could be deployed today with the paper’s methods, given typical multi‑view capture setups and offline training workflows.
- Media & Entertainment (Film/TV/VFX): Ghost‑free slow‑motion, speed ramps, and bullet‑time from stage captures
- Potential tool/workflow: “RetimeGS Retimer” plugin for Nuke/Unreal/Unity that ingests synchronized multi‑view footage, estimates optical flow per view (e.g., WAFT), trains RetimeGS, and renders arbitrary timestamps for post-production.
- Why RetimeGS: Continuous-time interpolation with short‑tailed temporal opacity and spline‑based trajectories avoids ghosting under large motion and visibility changes; triple rendering ensures each interval subset explains its frame.
- Dependencies/assumptions: Synchronized multi‑camera capture with good calibration; decent per‑view flow quality; offline GPU training time; works best at moderate capture FPS (e.g., 15–22 FPS).
- Virtual Production Stages (On‑set preview and post): Reliable retiming during and after shoots
- Potential tool/workflow: On‑set preview module that trains with a subset of views for coarse retiming, then refines with full data offline for final shots.
- Why RetimeGS: Robust interpolation across large motions and occlusions reduces the need for reshoots.
- Dependencies/assumptions: Stage capture rigs; pre-calibrated cameras; acceptable training latency.
- XR/VR Playback (Immersive Media): Frame‑rate up‑conversion of volumetric video to 60–120 Hz
- Potential tool/workflow: A 4D Gaussian viewer that samples RetimeGS at the HMD’s native refresh rate to reduce motion judder and simulator sickness.
- Why RetimeGS: Continuous-time 4D assets produce smooth motion for high-FPS HMDs.
- Dependencies/assumptions: Precomputed 4D assets; XR runtime integration; content captured with multiple cameras.
- Sports Broadcasting & Live Events: Free‑viewpoint, slow‑motion replays from multi‑camera arrays
- Potential tool/workflow: Stadium pipeline that reconstructs RetimeGS assets for key moments and renders slow‑motion, viewpoint‑agile replays.
- Why RetimeGS: Handles fast motion, non‑rigid deformation, and visibility changes without double‑images.
- Dependencies/assumptions: Existing multi‑camera infrastructure; offline or near‑offline processing; high‑quality calibration/flow.
- Digital Human/Avatar Asset Creation (Games/Ads): High‑quality 4D capture for animation references and in‑engine playback
- Potential tool/workflow: Export RetimeGS to engine‑friendly formats (e.g., 4DGS, mesh sequences, or point clouds) with editable Catmull–Rom trajectories.
- Why RetimeGS: Temporally coherent, ghost‑free sequences give cleaner references and content‑ready assets.
- Dependencies/assumptions: Multi‑view capture; offline optimization; conversion tools.
- Telepresence (Recorded Sessions): Smooth retime and playback of recorded multi‑view meetings or performances
- Potential tool/workflow: Post‑processed “volumetric meeting” with time‑scrubbing and high‑FPS playback on clients.
- Why RetimeGS: Arbitrary timestamps and coherent interpolation improve viewing comfort and editing flexibility.
- Dependencies/assumptions: Recorded, synchronized views; privacy/consent workflows; not yet live/low‑latency.
- Academic Research & Dataset Generation: Continuous‑time benchmarks for dynamic scene reconstruction and VFI
- Potential tool/workflow: Use RetimeGS to generate dense intermediate frames and ground‑truth flows/trajectories for evaluating 4D methods under large motions.
- Why RetimeGS: Mitigates temporal aliasing, providing cleaner supervision signals.
- Dependencies/assumptions: Access to multi‑view datasets; compute resources.
- Robotics/Autonomy (Data Curation & Simulation): High‑fidelity 4D scenes for training perception under dynamic conditions
- Potential tool/workflow: Use stage or lab captures to create richly annotated, time‑dense 4D sequences simulating dynamic obstacles or human interactions.
- Why RetimeGS: Continuous-time control enables arbitrarily dense temporal sampling.
- Dependencies/assumptions: Multi‑view capture pipeline; domain shift from staged to real-world remains.
- Educational/Biomechanics Labs: Motion analysis with artifact‑free temporal interpolation
- Potential tool/workflow: Lab pipeline that converts multi‑view recordings into continuous‑time 4D assets for teaching and analysis (e.g., gait, sports movement).
- Why RetimeGS: Spline trajectories and flow supervision yield smoother, more accurate motion depiction.
- Dependencies/assumptions: Multi‑camera lab setups; ethical handling of subject data.
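Several of the workflows above (slow motion, speed ramps, frame-rate up-conversion) reduce to choosing which continuous timestamps to render. A minimal sketch of that scheduling step follows, assuming time is measured in units of captured frames and that timestamps must stay inside the captured range, since extrapolation is not supported. The function name and parameters are hypothetical and not part of any RetimeGS tooling.

```python
def retime_timestamps(num_captured_frames, capture_fps, output_fps, speed=1.0):
    """Continuous timestamps (in captured-frame units) for retimed playback.

    speed=1.0 plays in real time, speed=0.5 is 2x slow motion. The result
    never leaves the captured range [0, num_captured_frames - 1].
    """
    step = speed * capture_fps / output_fps  # captured frames per output frame
    last = num_captured_frames - 1
    timestamps = []
    n = 0
    while n * step <= last + 1e-9:  # small tolerance for float rounding
        timestamps.append(min(n * step, last))
        n += 1
    return timestamps


# Three frames captured at 15 FPS, played back at 60 Hz in real time.
print(retime_timestamps(3, 15, 60))
# [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
```

Each returned value would then be passed to the renderer as the time at which to evaluate the 4D scene; a speed ramp could be built by varying `speed` across segments.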
Long‑Term Applications
These require further research, scaling, or productization beyond the current assumptions (e.g., synchronized, dense multi‑view capture and offline training).
- Live, Low‑Latency Telepresence (Communications): Near real‑time continuous‑time 4D streaming
- Potential tool/workflow: Edge/cloud system that incrementally reconstructs RetimeGS or a feed‑forward variant and streams parameters to clients for high‑FPS playback.
- Research gaps/dependencies: Faster training/inference; robust per‑view flows in real time; low‑latency networking; incremental, online optimization.
- Consumer‑Grade Capture (Daily Life): Multi‑phone, asynchronous capture with automatic time alignment
- Potential tool/workflow: Mobile app that fuses unsynchronized videos from several phones, estimates flows/camera poses, and produces a continuous‑time 4D memory.
- Research gaps/dependencies: Stronger correspondence under large temporal offsets; joint sync/calibration; priors for missing views; robust to low FPS and motion blur.
- Motion Editing & Authoring (DCC Tools): Trajectory‑aware time warping, mixing, and motion stylization in 4D
- Potential tool/workflow: DCC plugin exposing Catmull–Rom control handles and temporal opacity curves for non‑destructive retiming and motion edits.
- Research gaps/dependencies: Stable, user‑friendly parameterization; constraints for physical plausibility; interoperability with mesh/rig workflows.
- Hybrid Sensors for Ultra‑Low FPS Capture: Combining RGB with event cameras or IMUs
- Potential tool/workflow: Fusion system using event streams to stabilize trajectories and recover fine motion when frame rates are too sparse for optical flow.
- Research gaps/dependencies: Cross‑modal calibration; event‑to‑RetimeGS fusion models; robust handling of rolling shutter and clock drift.
- Standards & Interoperability (Policy/Industry Consortia): Exchange formats for continuous‑time 4D assets
- Potential tool/workflow: Open specification for 4D Gaussian assets including temporal opacities, spline trajectories, and training metadata to ease interchange across engines and tools.
- Research gaps/dependencies: Community consensus; backward compatibility with existing volumetric/point‑based formats; IP/privacy guidelines.
- Medical/Healthcare (Rehab/Diagnostics): Clinic‑friendly dynamic 3D capture and analysis
- Potential tool/workflow: Compact capture pods producing continuous‑time 4D reconstructions for therapists to analyze movement at arbitrary temporal scales.
- Research gaps/dependencies: Lower‑cost, privacy‑preserving multi‑view hardware; clinical validation; regulatory approval; streamlined workflows.
- Robotics & Autonomous Systems (Online World Models): Continuous‑time dynamic scene understanding from sparse sensors
- Potential tool/workflow: On‑robot 4D scene models that interpolate motion between sparse camera frames for planning and prediction.
- Research gaps/dependencies: Extension from dense multi‑view to monocular or few‑view, online SLAM integration, robust flow under high egomotion.
- E‑commerce/Virtual Try‑On (Marketing/Retail): Photoreal dynamic garments with consistent retiming
- Potential tool/workflow: Capture and retime dynamic clothing on models for interactive, time‑scrubbable product pages.
- Research gaps/dependencies: Scalable, cost‑effective capture; handling thin structures and fabric dynamics; integration with 3D web viewers.
- Cultural Heritage & Museums (Public Engagement): Interactive, time‑scrubbable 4D exhibits
- Potential tool/workflow: Installations where visitors explore dynamic performances in space and time.
- Research gaps/dependencies: Simplified capture pipelines; robust automation for non‑expert operators; long‑term asset preservation formats.
Cross‑Cutting Assumptions and Dependencies
- Data capture: Current method assumes synchronized, calibrated multi‑view videos with sufficient frame rate; quality degrades at extremely low FPS where optical flow fails.
- Computation: Offline training on modern GPUs (e.g., RTX 4090‑class) is assumed; live workflows require further acceleration.
- Algorithms: High‑quality per‑view optical flow (e.g., WAFT) and camera poses (e.g., VGGT) are prerequisites; failure cases include severe occlusions and large inter‑frame gaps.
- Integration: Tooling to export/import 4D Gaussian assets into engines and DCCs is needed for broad adoption; streaming requires client runtimes capable of splat‑based rendering.
- Governance: For telepresence/health/retail, privacy, consent, and data security policies are necessary for deployment.
Glossary
- 3D Gaussian Splatting (3DGS): A real-time 3D scene representation that models surfaces as Gaussian primitives with position, scale, rotation, color, and opacity, rendered via splatting. "The original 3DGS primitives are represented by ."
- 4D Gaussian Splatting (4DGS): An extension of Gaussian splatting to dynamic scenes that vary over time, enabling spatiotemporal rendering. "most existing 4D Gaussian Splatting (4DGS) methods overfit at discrete frame indices"
- Alpha compositing: A blending process that combines ordered, partially transparent layers to produce the final image. "projected, depth-sorted, and alpha-composited to render the final image at time ."
- Anisotropic scale: Direction-dependent scaling parameters that shape a Gaussian’s covariance differently along different axes. "The parameter represents the anisotropic scale"
- Back-projection: Mapping image-plane measurements back into 3D space using camera geometry. "The 2D flows from all views are then back-projected to 3D and averaged"
- Bidirectional optical flow: Forward and backward per-pixel motion fields between adjacent frames used jointly for supervision. "whose parameters are supervised by bidirectional optical flow."
- Bilinear interpolation: A grid-based interpolation method that blends values from four neighboring pixels. "are bilinearly interpolated."
- Bundle adjustment: A joint optimization of camera parameters and 3D structure to minimize reprojection error. "we use VGGT without bundle adjustment"
- Canonical space: A reference configuration in which scene geometry and appearance are defined before being deformed over time. "models scene geometry and appearance within a canonical space"
- Catmull–Rom spline: An interpolating spline defined by control points, used here to model smooth, non-linear 3D trajectories. "we model the trajectory across this interval with a Catmull–Rom spline"
- Control points: Key points that a spline interpolates through, determining the curve’s shape and continuity. "The inner control points, which the spline interpolates exactly, correspond to the positions at and ."
- Covariance: The second-moment matrix of a Gaussian that defines its spatial extent and orientation. "is the time-varying covariance obtained by rotating and scaling the base Gaussian of primitive "
- Deformation fields: Functions that warp points from a canonical space to target configurations over time. "leveraging deformation fields to capture dynamics."
- Depth sorting: Ordering primitives by depth before compositing to ensure correct visibility. "projected, depth-sorted, and alpha-composited"
- Dynamic stretching: Extending a primitive’s temporal support when it represents static content to reduce redundancy. "We illustrate the effectiveness of our dynamic stretching"
- Ghosting artifacts: Unwanted semi-transparent duplicates or overlaps that appear when temporal representations misalign in interpolation. "leading to ghosting artifacts when interpolating between timestamps."
- Hyperparameter: A training-time parameter set by the experimenter that controls model behavior (e.g., smoothness). " is a hyperparameter controlling the smoothness of temporal transitions."
- Linearity bias: The tendency of a method (e.g., flow) to favor linear motion assumptions, potentially misrepresenting curved trajectories. "This design eliminates the linearity bias inherent to optical flow"
- Low-pass filter: A filter that suppresses high-frequency variations; used here to widen temporal support and reduce aliasing. "apply a low-pass filter to the temporal opacity"
- MCMC strategy: A Markov Chain Monte Carlo approach used to stochastically relocate or refine primitives during training. "We adopt the MCMC strategy to our representation."
- Mip-Splatting: A multiscale, alias-reducing extension of Gaussian splatting analogous to mipmapping in rasterization. "Analogous to 3D Mip-Splatting~\cite{yu2024mip}, which addresses the problem of spatial aliasing"
- Monocular 4D reconstruction: Reconstructing dynamic 3D content over time from a single-view video. "they are tailored to monocular 4D reconstruction."
- Non-rigid deformations: Motion or shape changes not captured by rigid-body transformations. "non-rigid deformations"
- Occlusions: Visibility changes where objects become hidden by others along the line of sight. "severe occlusions"
- Optical flow: Per-pixel apparent motion between consecutive frames used for correspondence and supervision. "optical flow-guided initialization and supervision"
- Parametric distributions: Probability distributions described by parameters (e.g., Gaussians) used to model temporal support. "or other parametric distributions such as constant temporal window with Gaussian fall-off at the boundaries"
- Periodic relocation: Regularly moving low-contribution primitives toward regions needing more capacity. "periodic relocation strategy"
- PSNR: Peak Signal-to-Noise Ratio; an image quality metric measuring pixel-wise fidelity. "PSNR (pixel-level error)"
- Pseudo spatial mean: A parameter estimating a primitive’s spatial position at the midpoint between two frames under linear motion. " denotes the pseudo spatial mean"
- Quaternion: A four-parameter representation for 3D rotations avoiding gimbal lock, used for Gaussian orientation. " is a quaternion denoting the rotation."
- Rasterization: Converting geometric primitives into pixel-space contributions; here used to form flow maps. "to rasterize backward and forward flow maps"
- Sampling score: A priority metric combining opacity and temporal duration to guide relocation. "higher sampling scores"
- Spherical harmonics: A set of basis functions on the sphere used to model view-dependent color. "The coefficients correspond to the spherical harmonics used for color representation"
- SSIM: Structural Similarity Index Measure; a perceptual image similarity metric based on luminance, contrast, and structure. "SSIM~\cite{wang2004image} (perceptual similarity based on luminance, contrast, and structure)"
- Temporal aliasing: Artifacts arising when temporal variations are under-sampled, causing misrepresentation between frames. "We identify this limitation as a form of temporal aliasing"
- Temporal opacity: A time-dependent gating function that controls when a primitive appears or fades. "The temporal opacity is formulated as the product of two sigmoid functions"
- Temporal retiming: Reconstructing and rendering dynamic scenes at arbitrary timestamps. "Temporal retiming, the ability to reconstruct and render dynamic scenes at arbitrary timestamps, is crucial"
- Triple rendering: A training strategy that supervises renderings from all primitives and from each temporal subset separately. "triple-rendering supervision"
- VGGT: A learned model used here to estimate per-frame point clouds and align them to the camera coordinate system. "we use VGGT without bundle adjustment"
- WAFT: An off-the-shelf optical flow estimator providing multi-view forward and backward flows. "derived from off-the-shelf WAFT"
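The triple rendering entry above can be illustrated with a small loss sketch: at a recorded frame, the full render and each of the two single-group renders are compared to the same ground-truth image. Everything here is an assumption made for illustration: the equal weighting of the three terms, the L1 error, and the `render` callable (a stand-in for a splatting renderer). The toy renderer below simply averages primitive values per pixel and omits the temporal-opacity normalization used in the paper.

```python
def l1(image, target):
    """Mean absolute error between two equal-length images."""
    return sum(abs(a - b) for a, b in zip(image, target)) / len(target)


def triple_rendering_loss(render, prev_group, next_group, target):
    """Sum of three errors: all primitives, only [i-1, i], only [i, i+1].

    `render` is a hypothetical callable mapping a list of primitives to an image.
    """
    full = render(prev_group + next_group)
    only_prev = render(prev_group)
    only_next = render(next_group)
    return l1(full, target) + l1(only_prev, target) + l1(only_next, target)


def toy_render(primitives, num_pixels=2):
    """Toy stand-in: average the values of the primitives covering each pixel."""
    sums = [0.0] * num_pixels
    counts = [0] * num_pixels
    for pixel, value in primitives:
        sums[pixel] += value
        counts[pixel] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


target = [1.0, 0.5]
group = [(0, 1.0), (1, 0.5)]  # (pixel index, value) pairs
print(triple_rendering_loss(toy_render, group, group, target))  # 0.0
print(triple_rendering_loss(toy_render, group, [], target))     # 0.75
```

In the second call the full render still matches the target, but the empty group cannot explain the frame on its own and is penalized, which is the behavior the strategy is meant to enforce.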
