
RWKV Seed Generator for Scene Completion

Updated 15 November 2025
  • RWKV-SG is a specialized module that transforms partial 3D point clouds into coarse, feature-rich outputs using a linear RWKV-based mechanism.
  • It employs a modular architecture—with PointNet encoding, PRWKV stacks, and RWKV-ATTN—to effectively fuse local and global features while enhancing computational efficiency.
  • Empirical evaluations demonstrate that RWKV-SG improves completion accuracy by about 25% in Chamfer Distance and significantly reduces model size compared to conventional baselines.

The RWKV Seed Generator (RWKV-SG) constitutes a specialized module for generating coarse, feature-rich point clouds from partial input data within the context of point cloud semantic scene completion. It is central to the architecture of RWKV-PCSSC, leveraging the Receptance Weighted Key Value (RWKV) mechanism to improve parameter and memory efficiency while delivering competitive or superior accuracy. RWKV-SG operates exclusively on geometry, eschewing auxiliary modalities such as color or normal vectors.

1. Architectural Overview

RWKV-SG transforms an input partial point cloud $P_\text{in} \in \mathbb{R}^{N \times 3}$ into a coarse, completed point set with associated semantic logits and features. The module is highly modular, with each sub-block processing tensors of defined shape:

| Stage | Input (shape/type) | Output (shape/type) |
| --- | --- | --- |
| PointNet Encoding | $P_\text{in} \in \mathbb{R}^{N \times 3}$ | $F_0 \in \mathbb{R}^{N \times C_f}$ |
| PRWKV Stack (4 layers) | $F_0 \in \mathbb{R}^{N \times C_f}$ | $F_\text{in} \in \mathbb{R}^{N \times C_f}$ |
| Global Feature (SA) | $F_\text{in} \in \mathbb{R}^{N \times C_f}$ | $f \in \mathbb{R}^{C_g}$ |
| Query Generation | $[P_\text{in} \Vert f] \in \mathbb{R}^{N \times (3 + C_g)}$ | $q_\text{in} \in \mathbb{R}^{N \times C_q}$ |
| RWKV-ATTN | $P_\text{in},\ q_\text{in},\ k_\text{in}$ | $H_\text{miss} \in \mathbb{R}^{N \times C_h}$ |
| Deconvolution | $H_\text{miss}$ | $F_\text{miss} \in \mathbb{R}^{N \times C_f}$ |
| Rebuild Head | $F_\text{miss}$ | $\Delta P \in \mathbb{R}^{N \times 3}$ |
| Coarse Point Sampling | $[P_\text{in}; P_\text{miss}],\ [F_0; F_\text{miss}]$ | $P_\text{coarse} \in \mathbb{R}^{K \times 3},\ F_\text{coarse} \in \mathbb{R}^{K \times C_f}$ |
| Segment Head | $F_\text{coarse}$ | $L_\text{coarse} \in \mathbb{R}^{K \times C}$ |

Following preliminary feature extraction of $F_0$ via a PointNet-style encoder, a four-layer PointRWKV (PRWKV) stack abstracts context from local and global neighborhoods, producing $F_\text{in}$. Global context $f$ is pooled from $F_\text{in}$ and broadcast to each point to form queries $q_\text{in}$ via an MLP. RWKV-ATTN fuses queries, keys, and spatial neighborhoods to estimate missing-region features $H_\text{miss}$, which are deconvolved and reparameterized as position offsets $\Delta P$. Sampled coarse points $P_\text{coarse}$ and their features $F_\text{coarse}$ form the output and are fed to a semantic segmentation head that yields coarse per-point class logits $L_\text{coarse}$.

2. Core RWKV Mechanism and Equations

RWKV modules replace quadratic $\mathcal{O}(N^2)$ softmax self-attention with linear-complexity "Receptance Weighted Key Value" (RWKV) aggregation. For input point features $X \in \mathbb{R}^{T \times d}$:

  • P-Shift (per-channel local reordering):

$$X'_r = \text{P-Shift}_R(X), \quad X'_k = \text{P-Shift}_K(X), \quad X'_v = \text{P-Shift}_V(X)$$

($\mu_R$, $\mu_K$, $\mu_V \in \mathbb{R}^d$ are learnable mixing coefficients.)

  • Linear Projections:

$$R = X'_r W_R, \quad K = X'_k W_K, \quad V = X'_v W_V, \qquad W_R, W_K, W_V \in \mathbb{R}^{d \times d}$$

  • Bidirectional Linear Attention (Bi-WKV):

For output index $t \in [1, T]$,

$$a_{t,i} = \begin{cases} \exp(u + K_t) & \text{if } i = t \\ \exp\!\left(-\dfrac{|t-i|-1}{T}\,\omega + K_i\right) & \text{if } i \neq t \end{cases}$$

$$\text{wkv}_t = \frac{\sum_{i=1}^{T} a_{t,i} V_i}{\sum_{i=1}^{T} a_{t,i}}$$

where $u, \omega \in \mathbb{R}$ are learnable scalars.

  • Receptance Gating:

$$r = \sigma(R), \quad \hat{v} = r \odot \text{wkv}$$

  • Output:

$$O = \text{LayerNorm}(\hat{v}\, W_O), \quad W_O \in \mathbb{R}^{d \times d}$$
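To make these equations concrete, here is a minimal PyTorch sketch of one such block, assuming a simple roll-based P-Shift and evaluating Bi-WKV in its dense $\mathcal{O}(T^2 d)$ form for readability; the actual module computes the same sums with linear-time recurrences, and the paper's exact P-Shift ordering over points may differ.

```python
import torch
import torch.nn as nn

class BiRWKVBlock(nn.Module):
    """Sketch of the RWKV equations above. Dense O(T^2 d) Bi-WKV for clarity;
    P-Shift is approximated as a learnable lerp with a rolled copy of X."""

    def __init__(self, d: int):
        super().__init__()
        self.mu = nn.ParameterDict({k: nn.Parameter(torch.zeros(d)) for k in "rkv"})
        self.W_r, self.W_k, self.W_v, self.W_o = (nn.Linear(d, d, bias=False) for _ in range(4))
        self.u = nn.Parameter(torch.zeros(()))   # bonus logit for the i == t term
        self.w = nn.Parameter(torch.ones(()))    # learnable distance-decay rate
        self.norm = nn.LayerNorm(d)

    def p_shift(self, X: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
        # Channel-wise interpolation between X and a shifted copy of X.
        return X + mu * (torch.roll(X, 1, dims=0) - X)

    def forward(self, X: torch.Tensor) -> torch.Tensor:   # X: (T, d)
        T = X.shape[0]
        R = self.W_r(self.p_shift(X, self.mu["r"]))
        K = self.W_k(self.p_shift(X, self.mu["k"]))
        V = self.W_v(self.p_shift(X, self.mu["v"]))
        t = torch.arange(T, device=X.device, dtype=X.dtype)
        decay = -((t[:, None] - t[None, :]).abs() - 1.0) / T * self.w   # (T, T)
        logits = decay[:, :, None] + K[None, :, :]        # a_{t,i} logits for i != t
        eye = torch.eye(T, dtype=torch.bool, device=X.device)
        logits[eye] = self.u + K                          # i == t case: u + K_t
        a = torch.exp(logits)                             # (T, T, d), per-channel weights
        wkv = (a * V[None]).sum(1) / a.sum(1)             # weighted mean over i
        return self.norm(self.W_o(torch.sigmoid(R) * wkv))  # receptance gating + output

# e.g. BiRWKVBlock(64)(torch.randn(32, 64)).shape -> torch.Size([32, 64])
```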

Within RWKV-ATTN, the global PRWKV output is hybridized with local $k$-NN attention:

  • Local values: $v_{ij} = \text{MLP}([q_i; k_j])$
  • Gating: $\hat{v}_{ij} = \sigma(\text{PRWKV}(v_{ij})) \odot \text{PRWKV}(v_{ij})$
  • Weights: $A_{i,j} = \dfrac{\exp(\text{MLP}(q_i - k_j + \alpha_{i,j}))}{\sum_{j' \in L(i)} \exp(\text{MLP}(q_i - k_{j'} + \alpha_{i,j'}))}$
  • Output: $H_i = \sum_{j \in L(i)} A_{i,j} \left(\hat{v}_{ii} - \hat{v}_{ij} + \alpha_{i,j}\right) + v_{ii}$

where $L(i)$ is the $k$-NN neighborhood of point $i$ and $\alpha_{i,j}$ is a relative positional term.
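A minimal sketch of this local branch follows, with assumed tensor shapes: `knn_idx` and `alpha` stand in for precomputed neighbor indices and positional terms, and `prwkv` can be any module mapping `(..., C)` to `(..., C)`; none of these names come from the paper.

```python
import torch
import torch.nn as nn

class RWKVAttnLocal(nn.Module):
    """Sketch of the local k-NN branch of RWKV-ATTN (assumed shapes).
    q, k: (N, C) queries/keys; knn_idx: (N, K) indices; alpha: (N, K, C)."""

    def __init__(self, c: int, prwkv: nn.Module):
        super().__init__()
        self.value_mlp = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU(), nn.Linear(c, c))
        self.weight_mlp = nn.Sequential(nn.Linear(c, c), nn.ReLU(), nn.Linear(c, c))
        self.prwkv = prwkv                                # applied to local values

    def gate(self, v: torch.Tensor) -> torch.Tensor:
        g = self.prwkv(v)
        return torch.sigmoid(g) * g                       # v_hat = sigma(PRWKV(v)) . PRWKV(v)

    def forward(self, q, k, knn_idx, alpha):
        N, K = knn_idx.shape
        qi = q[:, None, :].expand(-1, K, -1)              # (N, K, C) broadcast queries
        kj = k[knn_idx]                                   # (N, K, C) neighbor keys k_j
        v = self.value_mlp(torch.cat([qi, kj], -1))       # v_ij = MLP([q_i; k_j])
        v_hat = self.gate(v)                              # gated local values
        v_ii = self.value_mlp(torch.cat([q, k], -1))      # self pair v_ii
        v_hat_ii = self.gate(v_ii)
        A = torch.softmax(self.weight_mlp(qi - kj + alpha), dim=1)      # A_{i,j} over L(i)
        return (A * (v_hat_ii[:, None] - v_hat + alpha)).sum(1) + v_ii  # H_i

# e.g. RWKVAttnLocal(64, nn.Sequential(nn.Linear(64, 64), nn.GELU()))
```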

This structure enables global context aggregation with linear complexity and maintains spatial discrimination through local attention.

3. Feature Aggregation Workflow

The processing steps of RWKV-SG are as follows:

  1. Preliminary Feature Extraction: $F_0 = \text{PointNet}(P_\text{in})$.
  2. Contextual Abstraction: $F_\text{in}$ computed via four PRWKV layers.
  3. Global Context Gathering: $f$ pooled using Set Abstraction, combined per-point with $P_\text{in}$, and processed into $q_\text{in}$.
  4. Local and Global Feature Fusion: $k_\text{in}$ set to $F_\text{in}$; RWKV-ATTN computes $H_\text{miss}$ per point within each $k$-NN neighborhood.
  5. Missing Feature Deconvolution: $H_\text{miss}$ upsampled to $F_\text{miss}$ through a Snowflake-style deconvolution.
  6. Coarse Completion: $\Delta P$ is regressed; $P_\text{miss} = P_\text{in} + \Delta P$.
  7. Farthest Point Sampling: $[P_\text{in}; P_\text{miss}]$ and $[F_0; F_\text{miss}]$ sampled down to $K$ coarse points.
  8. Coarse Semantic Segmentation: per-point logits $L_\text{coarse}$ computed from $F_\text{coarse}$.

This pipeline delivers plausible coarse geometry and features that fill holes in the input, while maintaining efficiency through linear mechanisms; a structural sketch of the stage composition follows.
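The PyTorch-style skeleton below mirrors these eight stages. It is a structural sketch only: the sub-modules are simple stand-ins (shared MLPs, max-pooling, a naive farthest point sampler), not the paper's PointNet, PRWKV, RWKV-ATTN, or Snowflake deconvolution implementations, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

def farthest_point_sample(P: torch.Tensor, k: int) -> torch.Tensor:
    """Naive O(kN) farthest point sampling; returns indices of k points."""
    N = P.shape[0]
    idx = torch.zeros(k, dtype=torch.long, device=P.device)
    dist = torch.full((N,), float("inf"), device=P.device)
    for i in range(1, k):
        dist = torch.minimum(dist, ((P - P[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = torch.argmax(dist)
    return idx

class RWKVSeedGenerator(nn.Module):
    """Structural sketch of the eight RWKV-SG stages (stand-in sub-modules)."""

    def __init__(self, c_f=256, c_g=256, num_classes=12, k_coarse=512):
        super().__init__()
        self.k = k_coarse
        self.encode = nn.Sequential(nn.Linear(3, c_f), nn.ReLU(), nn.Linear(c_f, c_f))  # 1: PointNet stand-in
        self.prwkv = nn.ModuleList(
            nn.Sequential(nn.Linear(c_f, c_f), nn.GELU()) for _ in range(4))            # 2: PRWKV stand-ins
        self.global_mlp = nn.Linear(c_f, c_g)                                           # 3: Set Abstraction stand-in
        self.query_mlp = nn.Sequential(nn.Linear(3 + c_g, c_f), nn.ReLU(),
                                       nn.Linear(c_f, c_f))
        self.attn = nn.Sequential(nn.Linear(2 * c_f, c_f), nn.GELU())                   # 4: RWKV-ATTN stand-in
        self.deconv = nn.Sequential(nn.Linear(c_f, c_f), nn.GELU())                     # 5: deconvolution stand-in
        self.rebuild = nn.Linear(c_f, 3)                                                # 6: offset regression
        self.segment = nn.Linear(c_f, num_classes)                                      # 8: coarse logits

    def forward(self, P_in: torch.Tensor):           # P_in: (N, 3)
        F0 = self.encode(P_in)                       # 1. preliminary features F_0
        F_in = F0
        for layer in self.prwkv:                     # 2. contextual abstraction F_in
            F_in = F_in + layer(F_in)
        f = self.global_mlp(F_in).max(dim=0).values  # 3. pooled global context f
        q_in = self.query_mlp(
            torch.cat([P_in, f.expand(P_in.shape[0], -1)], dim=-1))
        H_miss = self.attn(torch.cat([q_in, F_in], dim=-1))  # 4. fuse with k_in = F_in
        F_miss = self.deconv(H_miss)                 # 5. missing-region features
        P_miss = P_in + self.rebuild(F_miss)         # 6. P_miss = P_in + dP
        P_all = torch.cat([P_in, P_miss], dim=0)     # 7. FPS over merged sets
        F_all = torch.cat([F0, F_miss], dim=0)
        idx = farthest_point_sample(P_all, self.k)
        P_coarse, F_coarse = P_all[idx], F_all[idx]
        L_coarse = self.segment(F_coarse)            # 8. coarse per-point logits
        return P_coarse, F_coarse, L_coarse

# e.g. RWKVSeedGenerator()(torch.randn(1024, 3)) -> (512, 3), (512, 256), (512, 12)
```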

4. Learnable Parameters and Model Efficiency

RWKV-SG is parameterized for compactness and speed. For a typical feature dimensionality $C_f = 256$:

  • PointNet encoder: ≈41K parameters.
  • PRWKV stack (4 layers): $4(4C_f^2 + 3C_f + 2) \approx 1.05$M.
  • Query-generation MLP: ≈0.26M.
  • RWKV-ATTN internals: ≈0.40M.
  • Deconvolution: ≈0.15M.
  • Rebuild head: ≈0.03M.
  • Segment head: ≈0.6M.

Total: ≈2.5M parameters, accounting for 50–60% of the full RWKV-PCSSC network. The linear-complexity RWKV structure keeps the entire model (RWKV-SG + RWKV-PD) at ≈4M parameters, a $4.18\times$ reduction relative to the PointSSC baseline (≈17M).
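These figures can be sanity-checked with a few lines of arithmetic (the per-block counts are the rounded values quoted above):

```python
# Per PRWKV layer: four d x d projections (W_R, W_K, W_V, W_O), three d-dim
# mixing vectors (mu_R, mu_K, mu_V), and the two scalars u and omega.
C_f = 256
prwkv_stack = 4 * (4 * C_f**2 + 3 * C_f + 2)
print(f"PRWKV stack: {prwkv_stack:,}")          # 1,051,656 ~= 1.05M

blocks_m = {                                    # rounded per-block counts (millions)
    "PointNet encoder": 0.041, "PRWKV stack": prwkv_stack / 1e6,
    "Query MLP": 0.26, "RWKV-ATTN": 0.40, "Deconvolution": 0.15,
    "Rebuild head": 0.03, "Segment head": 0.60,
}
print(f"RWKV-SG total: {sum(blocks_m.values()):.2f}M")  # ~2.53M of the ~4M network
```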

5. Empirical Performance and Impact

RWKV-SG and its accompanying network modules offer significant improvements in parameter and memory efficiency over softmax-attention-based dense architectures:

  • Parameter reduction: full model ≈76.1% smaller than PointSSC.
  • Memory efficiency: peak GPU memory reduced by ≈27% during training (batch size 8, RTX 3090).
  • Ablation study: removing RWKV-SG on SSC-PC increases Chamfer Distance from 0.265 to 0.353 ($1.33\times$ worse) and lowers mean accuracy from 97.99% to 97.49%.
  • Qualitative output: $P_\text{coarse}$ clouds generated by RWKV-SG already reconstruct large missing areas plausibly.
  • Downstream effect: RWKV-PD refinements act primarily on edges; RWKV-SG provides the structural estimate.
  • Overall effect: RWKV-SG improves completion by ≈25% in Chamfer Distance over non-RWKV baselines while matching or exceeding state-of-the-art SSC accuracy.

A plausible implication is that the majority of completion accuracy and efficiency gains in RWKV-PCSSC can be attributed directly to the design of RWKV-SG.

6. Context and Significance within Point Cloud Completion

RWKV-SG exemplifies a new paradigm in 3D point cloud completion, replacing resource-intensive attention with a linear, context-aware mechanism. By forgoing auxiliary cues (color, normals) and reducing overparameterization, it delivers competitive semantic scene completion on both standard datasets (SSC-PC, NYUCAD-PC, PointSSC) and new benchmarks (NYUCAD-PC-V2, 3D-FRONT-PC), as developed in "RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion" (He et al., 13 Nov 2025). This suggests further investigation of RWKV-style mechanisms is warranted for scalable 3D scene understanding in memory- and compute-constrained environments.
