Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications (2510.13978v1)
Abstract: We present Instant Skinned Gaussian Avatars, a real-time and cross-platform 3D avatar system. Many approaches have been proposed to animate Gaussian Splatting, but they often require camera arrays, long preprocessing times, or high-end GPUs. Some methods attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance but sacrificing visual fidelity. In contrast, our system efficiently animates Gaussian Splatting by leveraging parallel splat-wise processing to dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the entire process takes just around five minutes, with the avatar generation step itself completed in only about 30 seconds. Our system enables users to instantly transform their real-world appearance into a 3D avatar, making it ideal for seamless integration with social media and metaverse applications. Website: https://sites.google.com/view/gaussian-vrm
Explain it Like I'm 14
Overview
This paper introduces Instant Skinned Gaussian Avatars, a fast and easy way to turn a short smartphone scan of a person into a realistic 3D avatar that moves in real time. It works on the web, mobile phones, and VR headsets, and the whole process takes about five minutes, with the actual avatar creation taking roughly 30 seconds.
Key Objectives
The researchers set out to:
- Make high-quality 3D avatars quickly and easily from a single phone scan or video.
- Animate these avatars smoothly in real time without needing expensive computers or long waiting times.
- Keep the avatar looking very realistic, even while it moves.
- Ensure the system works across web browsers, phones, and VR/AR apps.
Methods and Approach (explained simply)
To understand the method, here are the key ideas in everyday language:
- Gaussian splatting: Imagine a 3D picture made from thousands of tiny, soft, semi-transparent dots—like confetti or paint dabs floating in space. Together, they form a detailed scene or person. Each dot is called a “splat.”
- Mesh and skinned mesh: A mesh is like a wireframe mannequin made of connected points. A skinned mesh is that mannequin with an invisible skeleton, like a puppet with bones, so you can pose and animate it.
- Binding splats to the mesh: Think of sticking each confetti dot to the closest point on the mannequin. When the mannequin moves (waves an arm, turns the head), the dots follow naturally.
- Parallel processing: Instead of moving the dots one by one, the system moves lots of them at the same time—like many people working together to quickly shift all the stickers.
- Sorting splats for the viewer: To make the picture look right from any angle, the system reorders the dots every frame so transparency layers look correct—like stacking semi-transparent stickers from back to front (a minimal code sketch of this idea follows this list).
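To make the back-to-front ordering concrete, here is a minimal TypeScript sketch of depth sorting under simple assumptions: splat centers stored as a flat array of world-space coordinates and a known camera position. The paper does not publish its sorting code, so the function and parameter names below are illustrative, not its API.

```typescript
// Minimal sketch (not the paper's implementation): order splat indices
// back-to-front by distance from the camera so that semi-transparent splats
// blend correctly when drawn in that order.
function sortSplatsBackToFront(
  positions: Float32Array,              // 3 * splatCount, world-space splat centers
  cameraPos: [number, number, number],  // camera position in the same space
): Uint32Array {
  const splatCount = positions.length / 3;
  const order = new Uint32Array(splatCount);
  const depth = new Float32Array(splatCount);

  // Squared distance to the camera is enough to establish the ordering.
  for (let i = 0; i < splatCount; i++) {
    const dx = positions[3 * i] - cameraPos[0];
    const dy = positions[3 * i + 1] - cameraPos[1];
    const dz = positions[3 * i + 2] - cameraPos[2];
    depth[i] = dx * dx + dy * dy + dz * dz;
    order[i] = i;
  }

  // Farthest splats first, so drawing in this order is back-to-front.
  order.sort((a, b) => depth[b] - depth[a]);
  return order;
}
```

Real Gaussian-splat renderers often sort by view-space depth (distance along the camera's forward axis) or use approximate, incremental GPU sorts; the distance-based version above just shows the idea.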
Here’s the simplified workflow they use:
- Scan the person: They use a phone app (Scaniverse) to capture the person in an “A-pose” (standing straight with arms slightly out) and create a 3D model made of splats.
- Clean the scan: They remove splats that aren’t part of the person (like background), then put everything into a consistent size and position.
- Match a mannequin: The system estimates where the person is facing and how they’re posed, then places a standard 3D mannequin (a VRM avatar mesh) to line up with the scanned person.
- Attach splats: Each splat is “bound” to the nearest point (vertex) on the mannequin, and the system records how the splat is positioned relative to that point (so it knows how to move with it later; see the binding sketch after this list).
- Save movement rules: It stores the pose, scale, and bindings so the avatar can be animated instantly later—no extra heavy processing needed.
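To illustrate the attach step, here is a hedged TypeScript sketch under simple assumptions: splat and mannequin-vertex positions given as flat arrays in the same aligned rest pose, a brute-force nearest-neighbor loop (the paper only says a nearest-neighbor search is used, not which data structure), and an `offset` field standing in for whatever relative-position record the system actually stores. All names are illustrative.

```typescript
// Minimal sketch (illustrative, not the paper's code): bind each splat to its
// nearest mesh vertex and remember the splat's offset from that vertex in the
// rest (A-pose) state. A production version would use a spatial structure
// such as a grid or k-d tree instead of this O(splats * vertices) loop.
interface SplatBinding {
  vertexIndex: number;               // nearest vertex on the skinned mesh
  offset: [number, number, number];  // splat center minus vertex position, rest pose
}

function bindSplatsToMesh(
  splatPositions: Float32Array,   // 3 * splatCount, aligned to the mesh's rest pose
  vertexPositions: Float32Array,  // 3 * vertexCount, rest-pose vertex positions
): SplatBinding[] {
  const splatCount = splatPositions.length / 3;
  const vertexCount = vertexPositions.length / 3;
  const bindings: SplatBinding[] = [];

  for (let s = 0; s < splatCount; s++) {
    const sx = splatPositions[3 * s];
    const sy = splatPositions[3 * s + 1];
    const sz = splatPositions[3 * s + 2];
    let best = 0;
    let bestDist = Infinity;

    // Find the closest vertex to this splat.
    for (let v = 0; v < vertexCount; v++) {
      const dx = sx - vertexPositions[3 * v];
      const dy = sy - vertexPositions[3 * v + 1];
      const dz = sz - vertexPositions[3 * v + 2];
      const d = dx * dx + dy * dy + dz * dz;
      if (d < bestDist) {
        bestDist = d;
        best = v;
      }
    }

    // Record which vertex the splat follows and where it sits relative to it.
    bindings.push({
      vertexIndex: best,
      offset: [
        sx - vertexPositions[3 * best],
        sy - vertexPositions[3 * best + 1],
        sz - vertexPositions[3 * best + 2],
      ],
    });
  }
  return bindings;
}
```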
At runtime (when you use the avatar), the system animates the mannequin and updates all the splats in parallel every frame, then re-sorts them based on the viewer’s perspective for a clean, realistic look.
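Continuing the same illustration, a per-frame update could move every splat by applying its bound vertex's animated position and an approximate per-vertex rotation to the stored offset. The sketch below reuses the SplatBinding type from the previous snippet; the paper does not describe how a per-vertex rotation is obtained, and in the actual system this per-splat work runs in parallel on the GPU, so treat this as a plausible CPU-side rendition only.

```typescript
// Minimal sketch (illustrative): per-frame update so that each splat follows
// the vertex it was bound to. `vertexPositions` holds the vertices' animated
// world-space positions for this frame, and `vertexRotations` holds a 3x3
// rotation matrix per vertex (row-major, 9 floats each) approximating the
// local deformation, e.g. a rotation blended from the skinning bones.
function updateSplats(
  bindings: SplatBinding[],
  vertexPositions: Float32Array,    // 3 * vertexCount, animated this frame
  vertexRotations: Float32Array,    // 9 * vertexCount, animated this frame
  outSplatPositions: Float32Array,  // 3 * splatCount, written by this function
): void {
  for (let s = 0; s < bindings.length; s++) {
    const { vertexIndex: v, offset } = bindings[s];
    const r = 9 * v;

    // Rotate the rest-pose offset by the vertex's current rotation...
    const ox = vertexRotations[r] * offset[0] + vertexRotations[r + 1] * offset[1] + vertexRotations[r + 2] * offset[2];
    const oy = vertexRotations[r + 3] * offset[0] + vertexRotations[r + 4] * offset[1] + vertexRotations[r + 5] * offset[2];
    const oz = vertexRotations[r + 6] * offset[0] + vertexRotations[r + 7] * offset[1] + vertexRotations[r + 8] * offset[2];

    // ...and place the splat relative to the vertex's animated position.
    outSplatPositions[3 * s] = vertexPositions[3 * v] + ox;
    outSplatPositions[3 * s + 1] = vertexPositions[3 * v + 1] + oy;
    outSplatPositions[3 * s + 2] = vertexPositions[3 * v + 2] + oz;
  }
}
```

Each splat's orientation and scale (its covariance) would be rotated by the same per-vertex rotation, and the updated centers would then be re-sorted every frame with a routine like the one sketched earlier.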
Main Findings and Why They Matter
- Speed: The full process—from scanning to a moving avatar—takes around five minutes on a smartphone, with the avatar generation itself about 30 seconds.
- Real-time performance: It runs smoothly at about 40–50 frames per second on an iPhone 13 Pro and up to 240 fps on a laptop with an RTX 3060 GPU (capped by the screen’s refresh rate).
- High visual quality: Because it keeps the splats rather than converting them to simpler shapes, the avatar looks more detailed and realistic.
- Accessibility: It works right in a web browser (using JavaScript and Three.js), so it’s easy to share and use in web, AR, and VR apps (via WebXR). It also uses the common VRM avatar format, which helps it plug into existing avatar ecosystems.
These results matter because they remove the usual barriers—like needing many cameras, long preprocessing times, or high-end PCs—to making realistic, animated avatars. This means more people can create and use photoreal avatars quickly.
Implications and Potential Impact
This research could make realistic, moving 3D avatars as easy to create as posting a photo. That’s useful for:
- Social media and the metaverse: Instant, lifelike avatars for profiles, chats, and virtual events.
- VR/AR experiences: Quick setups for games, virtual meetings, and live performances.
- Digital twins and professional settings: Photoreal avatars for training, remote work, or formal virtual presentations.
By making high-quality avatars fast, affordable, and cross-platform, Instant Skinned Gaussian Avatars could help bring more authentic, human-looking presence to virtual spaces—without the tech hassle.
Knowledge Gaps
Below is a concrete list of the paper’s knowledge gaps, limitations, and open questions that remain unresolved and are actionable for future research:
- The Animation section is empty; key mechanics for updating Gaussian parameters (mean, covariance/scale, rotation, opacity, SH features) under skeletal motion are unspecified.
- No description of the rotation/skin deformation representation (e.g., linear blend skinning vs dual-quaternion) used to transform anisotropic Gaussians without shearing artifacts; for reference, the standard LBS formulation is sketched at the end of this list.
- Binding strategy is limited to nearest-vertex assignment; lack of multi-vertex skinning weights, neighborhood smoothing, or regularization likely causes artifacts under large deformations.
- No analysis of topology and self-occlusion handling (e.g., crossed arms), interpenetration, holes, and tearing introduced by per-splat nearest-vertex binding.
- Absent quantitative evaluation: no fidelity metrics (e.g., PSNR/SSIM/LPIPS), silhouette/geometry accuracy, temporal stability, or user perception studies; no comparisons with recent SOTA (e.g., ASH, Drivable 3DGS, ExAvatar).
- No ablations on binding granularity (k-NN vs 1-NN), smoothing, splat density, or the effect of per-splat transform choices on quality/performance.
- Per-frame resorting strategy is not described (algorithm, complexity, GPU/CPU path); scalability to high splat counts on WebGL/WebGPU is unclear.
- Sorting every frame (nominally O(N log N)) may be a bottleneck on mobile; no LOD, per-tile binning, or approximate order-independent blending alternatives are proposed.
- Culling/acceleration structures (frustum/occlusion culling, occupancy grids) are not addressed; impact on performance and memory is unknown.
- No details on parallelization primitives in the browser (WebGL vs WebGPU, transform feedback/compute) and their portability across Safari/Chrome/Android.
- Memory footprint and scalability are unspecified: splat counts, per-splat data size (index + relative transform + appearance), total VRAM/RAM usage, and compression/streaming.
- Power/thermal behavior and sustained frame rate on smartphones/standalone VR headsets are unmeasured; battery drain and throttling remain unknown.
- Preprocessing step 3 relies on pose estimation, but the method, accuracy, failure modes, and bias across body types and clothing are not reported.
- Rule-based background filtering assumes centered subjects; robustness to clutter, partial scans, multi-person scenes, and non-centered subjects is untested.
- The pipeline assumes A-pose capture; behavior for arbitrary or casual poses at capture time and its effect on alignment and animation is unexamined.
- A single neutral-shape VRM mesh is used; consequences of body-shape mismatch (children, very tall/short, obese, muscular) and the need for shape estimation/morph targets are not evaluated.
- No treatment of facial expressions, hand articulation, eye gaze, and phoneme-viseme sync; compatibility with blendshapes/hand rigs is unspecified.
- Secondary motion (hair, loose clothing, accessories) is not modeled; nearest-vertex binding likely yields rigid or unnatural motion—no mitigation is proposed.
- How view-dependent appearance (SH features) behaves under large deformations and viewpoint changes is unclear; potential for shading/color artifacts is not studied.
- Re-lighting is unsupported (appearance baked into Gaussians); strategies for environment lighting consistency in AR/VR are not explored.
- Robustness to imperfect scans (holes, floaters, mis-scale) from Scaniverse is not analyzed; no error detection/correction or inpainting of missing regions.
- Nearest-neighbor assignment at preprocessing: algorithmic choice (k-d tree/GPU), time/memory complexity for large splat sets, and on-device feasibility are unspecified.
- Calibration and metric scale alignment across devices are not detailed; implications for VR embodiment (IPD/height alignment) remain open.
- Motion retargeting specifics for VRM (skeleton mapping, joint orientation conventions) and artifacts (e.g., foot sliding) are not addressed.
- AR occlusion with the real world is not covered (use of WebXR depth APIs, compositing order, depth testing for translucent splats).
- Temporal stability and flicker due to per-frame sorting and splat motion are not measured or mitigated (e.g., temporal smoothing).
- Multi-avatar scalability and synchronization in shared experiences (network bandwidth, client performance) are not studied.
- Cross-platform capture dependence on Scaniverse/iOS introduces reproducibility and accessibility limits; Android/open-source alternatives and compatibility are unclear.
- Privacy and security of on-device avatar data (storage, sharing, inference risks) are not discussed; no consent or governance model for biometric scans.
- Extreme motion and out-of-distribution poses (beyond capture posture) may cause artifacts; extrapolation behavior and guardrails are not examined.
- Collision handling and physics integration (self-collision, environment collisions, cloth/hair simulation) are not supported.
- Rendering quality controls (anti-aliasing of splats, transparency blending artifacts, depth-aware compositing) are unspecified.
- Failure cases and qualitative examples (where the method breaks) are not presented; no guidelines for practitioners to avoid them.
- Code, models, and datasets are not released; lack of reproducibility, parameter transparency, and benchmarking protocol.
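For reference on the deformation-model gap noted earlier (linear blend skinning vs dual-quaternion), the standard LBS equations that such a system would typically need are sketched below for a Gaussian with rest-pose center $\mu$ and covariance $\Sigma$, skinning weights $w_i$, and per-bone rigid transforms $(R_i, t_i)$; the paper does not state that it uses this formulation.

```latex
% Standard linear blend skinning (LBS), shown for context only.
% Weights w_i sum to 1; (R_i, t_i) are the per-bone rigid transforms.
\[
  \mu' = \sum_i w_i \left( R_i\,\mu + t_i \right),
  \qquad
  \Sigma' = J\,\Sigma\,J^{\top},
  \qquad
  J = \frac{\partial \mu'}{\partial \mu} = \sum_i w_i R_i .
\]
% Dual-quaternion skinning instead blends unit dual quaternions, which avoids
% the volume-loss ("candy-wrapper") artifacts of LBS under large twists.
```

Nearest-vertex binding corresponds to the degenerate case of a single weight per splat, which is why the missing multi-vertex weights and smoothing noted above matter under large deformations.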
Practical Applications
Overview
This paper introduces a real-time, cross-platform pipeline for generating and animating photorealistic 3D avatars using Gaussian Splatting. The method binds splats to a background skinned mesh and updates them in parallel each frame, enabling high visual fidelity without converting to mesh-based renderers. The entire pipeline—from smartphone scanning (via Scaniverse) to on-device preprocessing—is completed in about five minutes, with avatar generation in ~30 seconds. Implemented in JavaScript/Three.js with WebXR and VRM compatibility, it runs at 40–50 fps on an iPhone 13 Pro and up to 240 fps on a laptop with an RTX 3060, making it practical for web, mobile, and VR use.
Below, we outline concrete applications across industry, academia, policy, and daily life, grouped by deployment horizon. Each bullet lists the application, relevant sectors, potential tools/workflows, and key assumptions/dependencies that affect feasibility.
Immediate Applications
These can be deployed now using the described pipeline and existing WebXR/VRM ecosystems.
- Instant avatar onboarding for social and metaverse platforms; Sectors: software, media/entertainment; Tools/workflows: smartphone scan (Scaniverse) → web-based preprocessing (Three.js) → VRM-compatible motion; Assumptions/Dependencies: A-pose capture, VRM rig availability, WebXR-capable browser, device performance on mid-tier mobile.
- Photoreal avatars for VR meetings and collaboration (corporate, education); Sectors: enterprise software, education; Tools/workflows: plug-in to existing WebRTC/VR meeting apps, avatar streaming/rendering via WebXR; Assumptions/Dependencies: accurate pose/motion input (e.g., camera-based mocap or controllers), privacy controls for avatar sharing.
- VTuber/content creator pipeline for live streaming with photoreal avatars; Sectors: media/entertainment, software; Tools/workflows: web-based avatar generation, integration with OBS/RTMP and browser sources; Assumptions/Dependencies: consistent lighting during capture, motion drivers (face/body tracking) to avoid uncanny motion.
- Rapid avatar integration for VRM-compatible games and social hubs; Sectors: gaming, XR software; Tools/workflows: import pipeline to existing VRM ecosystems, per-splat parallel update for runtime animation; Assumptions/Dependencies: VRM skeleton compatibility, LOD strategies for performance on mobile/standalone headsets.
- Customer support and sales demonstrations with photoreal agents (web and AR); Sectors: retail/e-commerce, customer service; Tools/workflows: web widget that spawns the agent in AR (WebXR), guided motion scripts; Assumptions/Dependencies: network bandwidth for asset delivery, latency constraints for real-time animation.
- Education: classroom telepresence and interactive labs (students appear as their avatars in virtual environments); Sectors: education; Tools/workflows: LMS plug-in for WebXR sessions, school-issued devices; Assumptions/Dependencies: simple, automated capture workflow for non-expert users, accessibility accommodations.
- Mental health and social support groups in VR (privacy-preserving presence via avatars); Sectors: healthcare (behavioral health), social services; Tools/workflows: anonymous participation using personal avatars, session management in WebXR; Assumptions/Dependencies: informed consent and privacy policies, moderation tools, minimal clinical claims (focus on presence/social support).
- Cultural and event experiences (personalized museum tours, conferences with photoreal attendees); Sectors: arts/culture, events; Tools/workflows: venue WebXR sites, QR-based onboarding to generate avatars on-site; Assumptions/Dependencies: reliable mobile capture in busy environments, crowd rendering performance.
- HCI/graphics research and teaching demos (hands-on labs using Gaussian Splatting avatars); Sectors: academia; Tools/workflows: reproducible JS/Three.js pipeline for student projects and user studies; Assumptions/Dependencies: availability of benchmark tasks and consented datasets.
- Privacy-friendly, on-device avatar creation (no cloud preprocessing); Sectors: software, cybersecurity/privacy; Tools/workflows: fully local pipeline on smartphones; Assumptions/Dependencies: iOS/Android support parity, device thermal/battery constraints.
- Developer SDK/JS library for “Instant Skinned Gaussian Avatars”; Sectors: software; Tools/workflows: NPM package with WebXR components, VRM rig binding, per-frame splat resorting; Assumptions/Dependencies: documentation, stable APIs, browser GPU feature availability.
Long-Term Applications
These require further research, scaling, validation, or ecosystem development (e.g., motion fidelity, standards, policy frameworks, or multi-user performance).
- Telemedicine and rehabilitation with motion-sensitive photoreal avatars; Sectors: healthcare; Tools/workflows: clinically validated motion tracking mapped to the avatar, remote assessment dashboards; Assumptions/Dependencies: medical-grade accuracy, regulatory approvals, integration with medical devices and EHR systems.
- Virtual try-on and personalized fashion fitting; Sectors: retail/e-commerce; Tools/workflows: garment simulation on GS-driven avatars, body shape estimation and cloth physics; Assumptions/Dependencies: accurate anthropometrics from single-video scans, robust cloth sim at consumer scale, returns policy alignment.
- Digital ID and identity verification via avatars; Sectors: finance, public sector; Tools/workflows: standards for avatar-linked identity, anti-spoofing checks, secure storage; Assumptions/Dependencies: policy/regulatory consensus, biometric and deepfake safeguards, user consent frameworks.
- Large-scale multi-user metaverse sessions with thousands of photoreal avatars; Sectors: software/XR infrastructure; Tools/workflows: LOD, splat density culling, networked state sync, edge rendering; Assumptions/Dependencies: scalable networking, standardized compressed formats for GS assets, server-side orchestration.
- Robot telepresence and operator digital twins; Sectors: robotics, industrial; Tools/workflows: mapping operator motion to avatar and robot, situational awareness overlays; Assumptions/Dependencies: reliable motion capture, low-latency bi-directional control, safety and human factors validation.
- Enterprise training with photoreal digital twins of workforce; Sectors: manufacturing, energy, logistics; Tools/workflows: avatar-based role-play in simulated environments, performance analytics; Assumptions/Dependencies: integration with digital twin platforms, content authoring pipelines, union and privacy agreements.
- Standardization of GS avatar formats and streaming (interoperability across engines and devices); Sectors: software standards, policy; Tools/workflows: GS-on-WebXR streaming codec, VRM + GS extension specs; Assumptions/Dependencies: cross-vendor collaboration, open standards bodies engagement, reference implementations.
- Security, watermarking, and provenance for avatars to deter misuse; Sectors: cybersecurity, policy; Tools/workflows: cryptographic watermarking of splat data, provenance metadata pipelines; Assumptions/Dependencies: adoption by platforms, legal frameworks for enforcement, user-transparent UX.
- Accessibility-first avatars (assistive inputs, adaptive motion for users with mobility constraints); Sectors: healthcare, education; Tools/workflows: multimodal inputs (voice, eye-tracking) driving avatars, adaptive motion smoothing; Assumptions/Dependencies: device compatibility, accessibility guidelines compliance, user testing.
- Asset marketplaces and “Avatar-as-a-Service” platforms; Sectors: software, media; Tools/workflows: hosted pipelines for capture → processing → deployment, monetization APIs; Assumptions/Dependencies: content moderation, IP ownership management, platform fees and revenue-sharing.
- Urban planning and civic engagement via XR town halls with citizen avatars; Sectors: public sector, policy; Tools/workflows: municipal WebXR portals, participatory simulations; Assumptions/Dependencies: equitable device access, privacy-preserving participation, archival policies.
- Environmental impact reduction through virtual attendance (travel substitution); Sectors: energy/sustainability, enterprise; Tools/workflows: corporate policy incentives to use XR for events, tracking carbon savings; Assumptions/Dependencies: cultural adoption, reliable XR infrastructure, acceptable experience quality compared to in-person.
Notes on Assumptions and Dependencies
Across applications, feasibility depends on:
- Capture conditions: A-pose compliance, subject centeredness, lighting stability, minimal occlusions.
- Technical stack: WebXR support, VRM rig availability, GPU performance on mobile/standalone devices, per-frame splat sorting cost scaling with splat count.
- Motion inputs: Quality of pose/face tracking (camera-based, controller-based, or sensors); absence of robust motion may reduce realism.
- Interoperability: VRM ecosystem compatibility; future need for GS-specific streaming/serialization standards.
- Privacy and policy: Consent, data retention, identity protection, anti-deepfake safeguards, regulatory compliance (especially in healthcare/finance).
- Operational constraints: Battery/thermal limits on mobile devices, bandwidth for asset delivery, LOD and culling strategies for multi-user scenarios.
Glossary
- A-pose: A neutral, standardized stance for character rigging with arms angled downwards, used to simplify alignment and skinning. "we ask the subject to assume an A-pose."
- Digital twin: A high-fidelity digital replica of a physical person or object used for simulation or visualization. "as well as in digital twin settings that demand photorealism."
- Gaussian splats: The individual 3D Gaussian primitives that collectively represent a scene in splatting-based rendering. "Our system moves the Gaussian splats by binding them to the vertices of a background mesh."
- Gaussian Splatting: A rendering/reconstruction technique that models scenes as 3D Gaussians to achieve high-fidelity, real-time results. "Gaussian splatting has emerged as a powerful technique that enables high-fidelity scene reconstruction and real-time rendering"
- Metaverse: A network of persistent, shared virtual spaces and applications. "The resulting avatar can be easily integrated into metaverse applications."
- Motion capture: The process of recording human movements to drive the animation of digital characters. "such as motion capture-driven avatar experiences"
- Multi-View Stereo: A computer vision method that reconstructs detailed 3D geometry from multiple images with known viewpoints. "Pixelwise View Selection for Unstructured Multi-View Stereo"
- Nearest-neighbor search: An algorithmic procedure to find the closest element to a query in a metric space. "through a nearest-neighbor search"
- Photogrammetry: Techniques that reconstruct 3D geometry from 2D photographs or videos. "photogrammetry pipelines that automatically reconstruct 3D geometry from 2D data."
- Pose estimation: Inferring a subject’s orientation and joint angles from visual data. "Using pose estimation, we infer the subject’s front-facing direction and limb angles"
- Radiance field: A representation of light emission and transport in 3D space over directions, used in neural rendering. "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
- SMPL: A parametric human body model (Skinned Multi-Person Linear) with pose and shape parameters used for mesh animation. "relies on SMPL meshes that reduce expressive 3D detail."
- Skinned mesh: A mesh whose vertices are bound to a skeleton and deform according to bone transformations and skinning weights. "follow the underlying skinned mesh"
- Splat-wise processing: Per-splat, independent computation that enables parallel updates of Gaussian primitives. "parallel splat-wise processing"
- Structure-from-Motion: Estimating camera motion and 3D structure from image sequences. "Structure-from-Motion Revisited"
- Three.js: A JavaScript library for creating and rendering 3D graphics in the browser using WebGL. "using JavaScript and Three.js."
- VRM: A file format/specification for humanoid avatar models designed for interoperability across applications. "we use a single VRM-format 3D avatar mesh"
- WebXR: A web API standard that enables VR and AR experiences directly in browsers. "thanks to our WebXR-based approach"