RWKV Seed Generator for Scene Completion
- RWKV-SG is a specialized module that transforms partial 3D point clouds into coarse, feature-rich outputs using a linear RWKV-based mechanism.
- It employs a modular architecture—with PointNet encoding, PRWKV stacks, and RWKV-ATTN—to effectively fuse local and global features while enhancing computational efficiency.
- Empirical evaluations demonstrate that RWKV-SG improves completion accuracy by about 25% in Chamfer Distance and significantly reduces model size compared to conventional baselines.
The RWKV Seed Generator (RWKV-SG) constitutes a specialized module for generating coarse, feature-rich point clouds from partial input data within the context of point cloud semantic scene completion. It is central to the architecture of RWKV-PCSSC, leveraging the Receptance Weighted Key Value (RWKV) mechanism to improve parameter and memory efficiency while delivering competitive or superior accuracy. RWKV-SG operates exclusively on geometry, eschewing auxiliary modalities such as color or normal vectors.
1. Architectural Overview
RWKV-SG transforms an input partial point cloud into a coarse, completed point set with associated semantic logits and features. The module is highly modular, with its sub-blocks applied in the following order:
1. PointNet encoding
2. PRWKV stack (4 layers)
3. Global feature pooling (Set Abstraction)
4. Query generation
5. RWKV-ATTN
6. Deconvolution
7. Rebuild head
8. Coarse point sampling
9. Segment head
Following preliminary feature extraction via a PointNet-style encoder, a four-layer PointRWKV (PRWKV) stack abstracts context from local and global neighborhoods to produce per-point contextual features. A global context vector is pooled from these features and broadcast back to every point, and an MLP converts the concatenation into per-point queries. RWKV-ATTN fuses the queries, keys, and spatial neighborhoods to estimate missing-region features, which are upsampled by deconvolution and reparameterized as position offsets. The rebuilt points are sampled down to a coarse point set, and their features are fed to a semantic segmentation head that produces coarse per-point class logits.
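To make the data flow concrete, the following minimal PyTorch sketch traces the same pipeline end to end. The class name, dimensions, and the simple linear stand-ins for the PRWKV, RWKV-ATTN, deconvolution, and head blocks are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RWKVSeedGeneratorSketch(nn.Module):
    """Structural sketch of the RWKV-SG pipeline (illustrative stand-ins only)."""

    def __init__(self, feat_dim=128, num_classes=12):
        super().__init__()
        # Stand-in for the PointNet-style per-point encoder.
        self.pointnet = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Stand-ins for the four PRWKV layers (the real blocks use P-Shift + Bi-WKV).
        self.prwkv = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.GELU()) for _ in range(4)])
        # Query-generation MLP on [per-point feature, pooled global feature].
        self.query_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        # Stand-ins for RWKV-ATTN fusion, rebuild head, and segment head.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.rebuild = nn.Linear(feat_dim, 3)            # regresses position offsets
        self.segment = nn.Linear(feat_dim, num_classes)  # coarse per-point logits

    def forward(self, xyz):                              # xyz: (B, N, 3) partial cloud
        feats = self.pointnet(xyz)                       # (B, N, C) per-point features
        for layer in self.prwkv:                         # contextual abstraction
            feats = feats + layer(feats)
        global_feat = feats.max(dim=1, keepdim=True).values            # pooled global context
        queries = self.query_mlp(
            torch.cat([feats, global_feat.expand_as(feats)], dim=-1))  # (B, N, C) queries
        fused = self.fuse(torch.cat([queries, feats], dim=-1))         # RWKV-ATTN stand-in
        offsets = self.rebuild(fused)                    # (B, N, 3) position offsets
        coarse = xyz + offsets                           # rebuilt (pre-sampling) points
        logits = self.segment(fused)                     # coarse semantic logits
        return coarse, fused, logits
```

In the actual module, the rebuilt points and features would additionally be subsampled by farthest point sampling before the segment head, as described in Section 3.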
2. Core RWKV Mechanism and Equations
RWKV modules substitute the quadratic softmax self-attention with linear-complexity “Receptance Weighted Key-Value” (RWKV) aggregation. For input point features $X \in \mathbb{R}^{N \times C}$ ($N$ points, $C$ channels):
- P-Shift (per-channel local reordering that mixes each point's feature with a shifted copy):
  $\hat{X}_s = \mu_s \odot X + (1 - \mu_s) \odot X_{\mathrm{shift}}, \quad s \in \{r, k, v\}$
  ($\mu_r$, $\mu_k$, $\mu_v$ are learnable.)
- Linear Projections: $R = \hat{X}_r W_R$, $K = \hat{X}_k W_K$, $V = \hat{X}_v W_V$.
- Bidirectional Linear Attention (Bi-WKV):
  For output index $t$,
  $$\mathrm{wkv}_t = \frac{\sum_{i=1,\, i \neq t}^{N} e^{-\frac{|t-i|-1}{N} w + k_i}\, v_i \;+\; e^{u + k_t}\, v_t}{\sum_{i=1,\, i \neq t}^{N} e^{-\frac{|t-i|-1}{N} w + k_i} \;+\; e^{u + k_t}}$$
  where $w$ and $u$ are learnable scalars.
- Receptance Gating: $G_t = \sigma(r_t) \odot \mathrm{wkv}_t$, with $\sigma$ the sigmoid function.
- Output: $Y = G\, W_O$, a final linear projection of the gated aggregate.
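To make the Bi-WKV formula concrete, the sketch below evaluates it as a naive O(N²) reference in PyTorch. The broadcasting convention and the scalar defaults for $w$ and $u$ are assumptions; practical implementations evaluate the operator with linear-complexity recurrences or fused kernels instead.

```python
import torch

def bi_wkv(k, v, w, u):
    """Naive O(N^2) reference for bidirectional WKV aggregation.

    k, v : (N, C) key and value features for N points (in a fixed serialization).
    w, u : learnable decay / bonus parameters (scalar or per-channel).
    Returns the (N, C) aggregated output wkv.
    """
    N = k.shape[0]
    t = torch.arange(N)
    dist = (t[:, None] - t[None, :]).abs().float()               # |t - i|, shape (N, N)
    logit = (-(dist - 1.0) / N)[..., None] * w + k[None, :, :]   # exponents, (N, N, C)
    diag = torch.eye(N, dtype=torch.bool)
    logit[diag] = u + k                                  # replace i == t with the bonus term
    weights = torch.softmax(logit, dim=1)                # normalize over source index i
    return (weights * v[None, :, :]).sum(dim=1)          # (N, C)

# Example: out = bi_wkv(torch.randn(64, 32), torch.randn(64, 32),
#                       torch.tensor(0.5), torch.tensor(0.5))
```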
Within RWKV-ATTN, a hybrid of the global PRWKV output and local $k$-NN attention is used: for each point, value features are gathered from its $k$ nearest spatial neighbors, gated by a receptance signal, weighted by attention scores computed over the neighborhood, and summed to form the fused output.
This structure enables global context aggregation with linear complexity and maintains spatial discrimination through local attention.
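A hypothetical sketch of the local branch is given below: each point's query attends over its $k$ nearest spatial neighbors, and the result is receptance-gated. The function name, dot-product weighting, and sigmoid gate are illustrative assumptions; the paper's exact RWKV-ATTN weighting may differ.

```python
import torch

def knn_local_attention(query, feats, xyz, k=16):
    """Hypothetical local branch of RWKV-ATTN: each point attends over its k
    nearest spatial neighbors and the result is receptance-gated.

    query, feats : (N, C) per-point queries and PRWKV features.
    xyz          : (N, 3) coordinates used for the k-NN search.
    """
    dists = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
    idx = dists.topk(k, largest=False).indices           # (N, k) neighbor indices
    neigh = feats[idx]                                   # (N, k, C) local values
    attn = torch.einsum('nc,nkc->nk', query, neigh) / query.shape[-1] ** 0.5
    attn = torch.softmax(attn, dim=-1)                   # (N, k) neighborhood weights
    local = torch.einsum('nk,nkc->nc', attn, neigh)      # (N, C) locally fused feature
    return torch.sigmoid(query) * local                  # receptance-style gating
```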
3. Feature Aggregation Workflow
The processing steps of RWKV-SG are as follows:
- Preliminary Feature Extraction: the partial input cloud is embedded per point by the PointNet-style encoder.
- Contextual Abstraction: contextual features are computed via four PRWKV layers.
- Global Context Gathering: a global feature is pooled using Set Abstraction, concatenated per point with the PRWKV features, and processed by an MLP into queries.
- Local and Global Feature Fusion: RWKV-ATTN combines the queries with keys and values derived from the PRWKV features, computing a fused feature per point within each $k$-NN neighborhood.
- Missing Feature Deconvolution: the fused features are upsampled through a Snowflake-style deconvolution.
- Coarse Completion: position offsets are regressed from the upsampled features and added to produce the rebuilt points.
- Farthest Point Sampling: the rebuilt points and their features are sampled down to the coarse point count.
- Coarse Semantic Segmentation: per-point class logits are computed from the coarse point features.
This pipeline delivers plausible coarse geometry and features filling input holes, while maintaining efficiency through linear mechanisms.
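The farthest point sampling step is the standard greedy algorithm; a naive reference sketch follows (the helper name and the random starting point are illustrative choices).

```python
import torch

def farthest_point_sampling(xyz, m):
    """Naive farthest point sampling: iteratively pick the point farthest from the
    already-selected set. Returns an index tensor of shape (m,) that can be used to
    subsample both the rebuilt points and their features.

    xyz : (N, 3) point coordinates.
    """
    N = xyz.shape[0]
    selected = torch.zeros(m, dtype=torch.long)
    dist = torch.full((N,), float('inf'))
    farthest = torch.randint(N, (1,)).item()             # random start point
    for i in range(m):
        selected[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)     # squared distance to new pick
        dist = torch.minimum(dist, d)                    # distance to the selected set
        farthest = int(dist.argmax())                    # next pick: farthest remaining point
    return selected

# Coarse points and features share the same indices:
# idx = farthest_point_sampling(rebuilt_xyz, num_coarse)
# coarse_xyz, coarse_feat = rebuilt_xyz[idx], rebuilt_feat[idx]
```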
4. Learnable Parameters and Model Efficiency
RWKV-SG is parameterized for compactness and speed. At the default feature dimensionality, the approximate per-block parameter counts are:
- PointNet encoder: 41K parameters.
- PRWKV stack (4 layers): ≈1.0M.
- Query-generation MLP: 0.26M.
- RWKV-ATTN internals: 0.40M.
- Deconvolution: 0.15M.
- Rebuild head: 0.03M.
- Segment head: 0.6M.
Total: 2.5M parameters, accounting for 50–60% of the full RWKV-PCSSC network. The linear-complexity RWKV structure enables the entire model (RWKV-SG + RWKV-PD) to remain at approximately 4M parameters, a 76.1% reduction relative to the PointSSC baseline (17M).
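For any concrete implementation, per-block counts like those above can be reproduced by summing parameter tensors per sub-module. The helper below is a generic sketch, shown here applied to the illustrative model from Section 1; its numbers will not match the paper's.

```python
import torch.nn as nn

def count_parameters(module: nn.Module) -> int:
    """Number of trainable parameters in a (sub-)module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Applied to the illustrative sketch from Section 1 (counts differ from the paper):
# model = RWKVSeedGeneratorSketch()
# for name, child in model.named_children():
#     print(f"{name}: {count_parameters(child) / 1e6:.2f}M")
# print(f"total: {count_parameters(model) / 1e6:.2f}M")
```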
5. Empirical Performance and Impact
RWKV-SG and its accompanying network modules offer significant improvements in parameter and memory efficiency over softmax-attention-based dense architectures:
- Parameter reduction: Full model size is 76.1% smaller than PointSSC.
- Memory efficiency: Peak GPU memory reduced by 27% (training, batch size 8, RTX 3090).
- Ablation study: Removal of RWKV-SG in SSC-PC increases Chamfer Distance from 0.265 to 0.353 (≈33% worse) and lowers mean accuracy from 97.99% to 97.49%.
- Qualitative output: Coarse point clouds generated by RWKV-SG already reconstruct large missing areas plausibly.
- Downstream effect: RWKV-PD refinements act primarily on edges; RWKV-SG provides the structural estimate.
- Overall effect: RWKV-SG improves completion by 25% in Chamfer Distance over non-RWKV baselines while preserving or exceeding state-of-the-art SSC accuracy.
A plausible implication is that the majority of completion accuracy and efficiency gains in RWKV-PCSSC can be attributed directly to the design of RWKV-SG.
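For reference, the Chamfer Distance quoted throughout is the standard symmetric nearest-neighbor metric between predicted and ground-truth point sets. The sketch below assumes squared distances and per-direction averaging, which may differ from the paper's exact evaluation conventions.

```python
import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance between point sets pred (N, 3) and gt (M, 3):
    mean nearest-neighbor squared distance in both directions."""
    d = torch.cdist(pred, gt) ** 2               # (N, M) pairwise squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Example: chamfer_distance(torch.rand(2048, 3), torch.rand(4096, 3))
```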
6. Context and Significance within Point Cloud Completion
RWKV-SG exemplifies a new paradigm in 3D point cloud completion, replacing resource-intensive attention with a linear, context-aware mechanism. By forgoing auxiliary cues (color, normals) and reducing overparameterization, it delivers competitive semantic scene completion on both standard datasets (SSC-PC, NYUCAD-PC, PointSSC) and new benchmarks (NYUCAD-PC-V2, 3D-FRONT-PC), as developed in "RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion" (He et al., 13 Nov 2025). This suggests further investigation of RWKV-style mechanisms is warranted for scalable 3D scene understanding in memory- and compute-constrained environments.