RWKV Seed Generator for Scene Completion
- RWKV-SG is a specialized module that transforms partial 3D point clouds into coarse, feature-rich outputs using a linear RWKV-based mechanism.
- It employs a modular architecture—with PointNet encoding, PRWKV stacks, and RWKV-ATTN—to effectively fuse local and global features while enhancing computational efficiency.
- Empirical evaluations demonstrate that RWKV-SG improves completion accuracy by about 25% in Chamfer Distance and significantly reduces model size compared to conventional baselines.
The RWKV Seed Generator (RWKV-SG) constitutes a specialized module for generating coarse, feature-rich point clouds from partial input data within the context of point cloud semantic scene completion. It is central to the architecture of RWKV-PCSSC, leveraging the Receptance Weighted Key Value (RWKV) mechanism to improve parameter and memory efficiency while delivering competitive or superior accuracy. RWKV-SG operates exclusively on geometry, eschewing auxiliary modalities such as color or normal vectors.
1. Architectural Overview
RWKV-SG transforms an input partial point cloud into a coarse, completed point set with associated semantic logits and features. The module is highly modular, with its sub-blocks applied in the following order:
1. PointNet encoding
2. PRWKV stack (4 layers)
3. Global feature pooling (Set Abstraction)
4. Query generation
5. RWKV-ATTN
6. Deconvolution
7. Rebuild head
8. Coarse point sampling
9. Segment head
Following preliminary feature extraction via a PointNet-style encoder, a four-layer PointRWKV (PRWKV) stack abstracts context from local and global neighborhoods to produce per-point contextual features. A global context vector is pooled from these features and broadcast back to every point, and an MLP converts the concatenation into per-point queries. RWKV-ATTN fuses the queries, keys, and spatial neighborhoods to estimate missing-region features, which are upsampled by deconvolution and reparameterized as position offsets. The rebuilt points are sampled down to a coarse point set, and their features are fed to a semantic segmentation head that produces coarse per-point class logits.
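To make the data flow concrete, the following minimal PyTorch sketch traces the same pipeline end to end. The class name, dimensions, and the simple linear stand-ins for the PRWKV, RWKV-ATTN, deconvolution, and head blocks are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RWKVSeedGeneratorSketch(nn.Module):
    """Structural sketch of the RWKV-SG pipeline (illustrative stand-ins only)."""

    def __init__(self, feat_dim=128, num_classes=12):
        super().__init__()
        # Stand-in for the PointNet-style per-point encoder.
        self.pointnet = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Stand-ins for the four PRWKV layers (the real blocks use P-Shift + Bi-WKV).
        self.prwkv = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.GELU()) for _ in range(4)])
        # Query-generation MLP on [per-point feature, pooled global feature].
        self.query_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        # Stand-ins for RWKV-ATTN fusion, rebuild head, and segment head.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.rebuild = nn.Linear(feat_dim, 3)            # regresses position offsets
        self.segment = nn.Linear(feat_dim, num_classes)  # coarse per-point logits

    def forward(self, xyz):                              # xyz: (B, N, 3) partial cloud
        feats = self.pointnet(xyz)                       # (B, N, C) per-point features
        for layer in self.prwkv:                         # contextual abstraction
            feats = feats + layer(feats)
        global_feat = feats.max(dim=1, keepdim=True).values            # pooled global context
        queries = self.query_mlp(
            torch.cat([feats, global_feat.expand_as(feats)], dim=-1))  # (B, N, C) queries
        fused = self.fuse(torch.cat([queries, feats], dim=-1))         # RWKV-ATTN stand-in
        offsets = self.rebuild(fused)                    # (B, N, 3) position offsets
        coarse = xyz + offsets                           # rebuilt (pre-sampling) points
        logits = self.segment(fused)                     # coarse semantic logits
        return coarse, fused, logits
```

In the actual module, the rebuilt points and features would additionally be subsampled by farthest point sampling before the segment head, as described in Section 3.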
2. Core RWKV Mechanism and Equations
RWKV modules substitute the quadratic softmax self-attention with linear-complexity “Receptance Weighted Key-Value” (RWKV) aggregation. For input point features $X \in \mathbb{R}^{N \times C}$ ($N$ points, $C$ channels):
- P-Shift (per-channel local reordering that mixes each point's feature with a shifted copy):
  $\hat{X}_s = \mu_s \odot X + (1 - \mu_s) \odot X_{\mathrm{shift}}, \quad s \in \{r, k, v\}$
  ($\mu_r$, $\mu_k$, $\mu_v$ are learnable.)
- Linear Projections: $R = \hat{X}_r W_R$, $K = \hat{X}_k W_K$, $V = \hat{X}_v W_V$.
- Bidirectional Linear Attention (Bi-WKV):
  For output index $t$,
  $$\mathrm{wkv}_t = \frac{\sum_{i=1,\, i \neq t}^{N} e^{-\frac{|t-i|-1}{N} w + k_i}\, v_i \;+\; e^{u + k_t}\, v_t}{\sum_{i=1,\, i \neq t}^{N} e^{-\frac{|t-i|-1}{N} w + k_i} \;+\; e^{u + k_t}}$$
  where $w$ and $u$ are learnable scalars.
- Receptance Gating: $G_t = \sigma(r_t) \odot \mathrm{wkv}_t$, with $\sigma$ the sigmoid function.
- Output: $Y = G\, W_O$, a final linear projection of the gated aggregate.
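To make the Bi-WKV formula concrete, the sketch below evaluates it as a naive O(N²) reference in PyTorch. The broadcasting convention and the scalar defaults for $w$ and $u$ are assumptions; practical implementations evaluate the operator with linear-complexity recurrences or fused kernels instead.

```python
import torch

def bi_wkv(k, v, w, u):
    """Naive O(N^2) reference for bidirectional WKV aggregation.

    k, v : (N, C) key and value features for N points (in a fixed serialization).
    w, u : learnable decay / bonus parameters (scalar or per-channel).
    Returns the (N, C) aggregated output wkv.
    """
    N = k.shape[0]
    t = torch.arange(N)
    dist = (t[:, None] - t[None, :]).abs().float()               # |t - i|, shape (N, N)
    logit = (-(dist - 1.0) / N)[..., None] * w + k[None, :, :]   # exponents, (N, N, C)
    diag = torch.eye(N, dtype=torch.bool)
    logit[diag] = u + k                                  # replace i == t with the bonus term
    weights = torch.softmax(logit, dim=1)                # normalize over source index i
    return (weights * v[None, :, :]).sum(dim=1)          # (N, C)

# Example: out = bi_wkv(torch.randn(64, 32), torch.randn(64, 32),
#                       torch.tensor(0.5), torch.tensor(0.5))
```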
Within RWKV-ATTN, a hybrid of the global PRWKV output and local $k$-NN attention is used: for each point, value features are gathered from its $k$ nearest spatial neighbors, gated by a receptance signal, weighted by attention scores computed over the neighborhood, and summed to form the fused output.
This structure enables global context aggregation with linear complexity and maintains spatial discrimination through local attention.
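A hypothetical sketch of the local branch is given below: each point's query attends over its $k$ nearest spatial neighbors, and the result is receptance-gated. The function name, dot-product weighting, and sigmoid gate are illustrative assumptions; the paper's exact RWKV-ATTN weighting may differ.

```python
import torch

def knn_local_attention(query, feats, xyz, k=16):
    """Hypothetical local branch of RWKV-ATTN: each point attends over its k
    nearest spatial neighbors and the result is receptance-gated.

    query, feats : (N, C) per-point queries and PRWKV features.
    xyz          : (N, 3) coordinates used for the k-NN search.
    """
    dists = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
    idx = dists.topk(k, largest=False).indices           # (N, k) neighbor indices
    neigh = feats[idx]                                   # (N, k, C) local values
    attn = torch.einsum('nc,nkc->nk', query, neigh) / query.shape[-1] ** 0.5
    attn = torch.softmax(attn, dim=-1)                   # (N, k) neighborhood weights
    local = torch.einsum('nk,nkc->nc', attn, neigh)      # (N, C) locally fused feature
    return torch.sigmoid(query) * local                  # receptance-style gating
```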
3. Feature Aggregation Workflow
The processing steps of RWKV-SG are as follows:
- Preliminary Feature Extraction: the partial input cloud is embedded per point by the PointNet-style encoder.
- Contextual Abstraction: contextual features are computed via four PRWKV layers.
- Global Context Gathering: a global feature is pooled using Set Abstraction, concatenated per point with the PRWKV features, and processed by an MLP into queries.
- Local and Global Feature Fusion: RWKV-ATTN combines the queries with keys and values derived from the PRWKV features, computing a fused feature per point within each $k$-NN neighborhood.
- Missing Feature Deconvolution: the fused features are upsampled through a Snowflake-style deconvolution.
- Coarse Completion: position offsets are regressed from the upsampled features and added to produce the rebuilt points.
- Farthest Point Sampling: the rebuilt points and their features are sampled down to the coarse point count.
- Coarse Semantic Segmentation: per-point class logits are computed from the coarse point features.
This pipeline delivers plausible coarse geometry and features filling input holes, while maintaining efficiency through linear mechanisms.
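The farthest point sampling step is the standard greedy algorithm; a naive reference sketch follows (the helper name and the random starting point are illustrative choices).

```python
import torch

def farthest_point_sampling(xyz, m):
    """Naive farthest point sampling: iteratively pick the point farthest from the
    already-selected set. Returns an index tensor of shape (m,) that can be used to
    subsample both the rebuilt points and their features.

    xyz : (N, 3) point coordinates.
    """
    N = xyz.shape[0]
    selected = torch.zeros(m, dtype=torch.long)
    dist = torch.full((N,), float('inf'))
    farthest = torch.randint(N, (1,)).item()             # random start point
    for i in range(m):
        selected[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)     # squared distance to new pick
        dist = torch.minimum(dist, d)                    # distance to the selected set
        farthest = int(dist.argmax())                    # next pick: farthest remaining point
    return selected

# Coarse points and features share the same indices:
# idx = farthest_point_sampling(rebuilt_xyz, num_coarse)
# coarse_xyz, coarse_feat = rebuilt_xyz[idx], rebuilt_feat[idx]
```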
4. Learnable Parameters and Model Efficiency
RWKV-SG is parameterized for compactness and speed. At the default feature dimensionality, the approximate per-block parameter counts are:
- PointNet encoder: 41K parameters.
- PRWKV stack (4 layers): ≈1.0M.
- Query-generation MLP: 0.26M.
- RWKV-ATTN internals: 0.40M.
- Deconvolution: 0.15M.
- Rebuild head: 0.03M.
- Segment head: 0.6M.
Total: 2.5M parameters, accounting for 50–60% of the full RWKV-PCSSC network. The linear-complexity RWKV structure enables the entire model (RWKV-SG + RWKV-PD) to remain at approximately 4M parameters, a 76.1% reduction relative to the PointSSC baseline (17M).
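For any concrete implementation, per-block counts like those above can be reproduced by summing parameter tensors per sub-module. The helper below is a generic sketch, shown here applied to the illustrative model from Section 1; its numbers will not match the paper's.

```python
import torch.nn as nn

def count_parameters(module: nn.Module) -> int:
    """Number of trainable parameters in a (sub-)module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Applied to the illustrative sketch from Section 1 (counts differ from the paper):
# model = RWKVSeedGeneratorSketch()
# for name, child in model.named_children():
#     print(f"{name}: {count_parameters(child) / 1e6:.2f}M")
# print(f"total: {count_parameters(model) / 1e6:.2f}M")
```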
5. Empirical Performance and Impact
RWKV-SG and its accompanying network modules offer significant improvements in parameter and memory efficiency over softmax-attention-based dense architectures:
- Parameter reduction: Full model size is 76.1% smaller than PointSSC.
- Memory efficiency: Peak GPU memory reduced by 27% (training, batch size 8, RTX 3090).
- Ablation study: Removal of RWKV-SG in SSC-PC increases Chamfer Distance from 0.265 to 0.353 (≈33% worse) and lowers mean accuracy from 97.99% to 97.49%.
- Qualitative output: Coarse point clouds generated by RWKV-SG already reconstruct large missing areas plausibly.
- Downstream effect: RWKV-PD refinements act primarily on edges; RWKV-SG provides the structural estimate.
- Overall effect: RWKV-SG improves completion by 25% in Chamfer Distance over non-RWKV baselines while preserving or exceeding state-of-the-art SSC accuracy.
A plausible implication is that the majority of completion accuracy and efficiency gains in RWKV-PCSSC can be attributed directly to the design of RWKV-SG.
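For reference, the Chamfer Distance quoted throughout is the standard symmetric nearest-neighbor metric between predicted and ground-truth point sets. The sketch below assumes squared distances and per-direction averaging, which may differ from the paper's exact evaluation conventions.

```python
import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance between point sets pred (N, 3) and gt (M, 3):
    mean nearest-neighbor squared distance in both directions."""
    d = torch.cdist(pred, gt) ** 2               # (N, M) pairwise squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# Example: chamfer_distance(torch.rand(2048, 3), torch.rand(4096, 3))
```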
6. Context and Significance within Point Cloud Completion
RWKV-SG exemplifies a new paradigm in 3D point cloud completion, replacing resource-intensive attention with a linear, context-aware mechanism. By forgoing auxiliary cues (color, normals) and reducing overparameterization, it delivers competitive semantic scene completion on both standard datasets (SSC-PC, NYUCAD-PC, PointSSC) and new benchmarks (NYUCAD-PC-V2, 3D-FRONT-PC), as developed in "RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion" (He et al., 13 Nov 2025). This suggests further investigation of RWKV-style mechanisms is warranted for scalable 3D scene understanding in memory- and compute-constrained environments.