RWKV Point Deconvolution in 3D Scene Completion
- RWKV Point Deconvolution is a module that upscales and refines point clouds by integrating efficient global context with local geometric attention.
- It employs a multi-stage, coarse-to-fine deconvolution approach whose operations scale linearly in the number of points, in contrast to the quadratic cost of traditional self-attention.
- Empirical evaluations demonstrate that RWKV-PD reduces parameter count and GPU memory usage while maintaining high reconstruction fidelity and semantic accuracy.
RWKV Point Deconvolution (RWKV-PD) is a point cloud upsampling and feature refinement module introduced in the context of lightweight semantic scene completion networks, specifically within RWKV-PCSSC. RWKV-PD integrates the Receptance Weighted Key Value (RWKV) mechanism—originally devised to enable efficient global context modeling—with local geometric attention and hierarchical deconvolution, yielding a multi-stage architecture for coarse-to-fine restoration of point-wise features, semantics, and geometry. Its design enables efficient handling of large point sets by circumventing the quadratic complexity bottleneck of conventional self-attention-based approaches.
1. Role within RWKV-PCSSC and Design Rationale
In the RWKV-PCSSC architecture, after the RWKV Seed Generator (RWKV-SG) produces a coarse, partial representation of the scene, a sequence of RWKV-PD modules incrementally upsamples and refines the point cloud. At refinement stage $i$, RWKV-PD receives a point set $P_{i-1} \in \mathbb{R}^{n_{i-1} \times 3}$ with per-point features $F_{i-1} \in \mathbb{R}^{n_{i-1} \times c}$, and outputs:
- an upsampled point set $P_i \in \mathbb{R}^{n_i \times 3}$, with $n_i = \lambda_i n_{i-1}$,
- refined per-point features $F_i \in \mathbb{R}^{n_i \times c}$,
- updated semantic logits $L_i \in \mathbb{R}^{n_i \times C}$.
The RWKV-PD design is motivated by:
- the need for linear-time global modeling of point sets (addressed via the PRWKV mechanism),
- effective fusion of global contextual and local geometric information (via a hybrid RWKV-ATTN formulation),
- and parameter-efficient, learned expansion of point clouds (through SnowflakeNet-inspired deconvolution, modulated by context-enhanced features).
By replacing the operations of full self-attention with linear-time methods and neighborhood-local attention, RWKV-PD achieves substantial gains in memory and compute efficiency without sacrificing reconstruction fidelity or semantic accuracy.
2. Mathematical Formulation
The RWKV-PD stage at hierarchy level $i$ is defined by the following inputs and outputs:
- Input points $P_{i-1} \in \mathbb{R}^{n_{i-1} \times 3}$
- Input per-point features $F_{i-1} \in \mathbb{R}^{n_{i-1} \times c}$
- Output points $P_i \in \mathbb{R}^{n_i \times 3}$
- Output features $F_i \in \mathbb{R}^{n_i \times c}$
- Output semantic logits $L_i \in \mathbb{R}^{n_i \times C}$ (for $C$ classes)
- Upsampling factor per stage $\lambda_i$, with $n_i = \lambda_i n_{i-1}$.
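The shape bookkeeping across a cascade of stages follows directly from $n_i = \lambda_i n_{i-1}$. A small sketch (the seed size, feature width, class count, and per-stage factors below are illustrative assumptions, not values from the paper):

```python
def stage_shapes(n0, factors, c=128, num_classes=12):
    """Track (points, features, logits) shapes after each RWKV-PD stage."""
    shapes = []
    n = n0
    for lam in factors:
        n = lam * n  # n_i = lambda_i * n_{i-1}
        shapes.append(((n, 3), (n, c), (n, num_classes)))
    return shapes

# A hypothetical three-stage cascade: 512 -> 512 -> 2048 -> 4096 points.
shapes = stage_shapes(n0=512, factors=[1, 4, 2])
```
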
The module consists of the following sub-steps:
- Global Feature Extraction: $f = \mathrm{FeatureExtractor}(F_{i-1}) \in \mathbb{R}^{d}$, with FeatureExtractor = PRWKV + Set Abstraction (PointNet++).
- Query, Key, Value Construction: $q_j = \mathrm{MLP}_q([\,p_j;\, f\,])$, $k_j = F_{i-1,j}$, $v_j = \mathrm{MLP}_v([\,q_j;\, k_j\,])$.
- Receptance-Weighted Value Computation: $\hat{v} = \sigma(\mathrm{PRWKV}_1(v)) \odot \mathrm{PRWKV}_2(v)$, where $\sigma$ is the element-wise sigmoid and $\odot$ the Hadamard product.
- Local Attention Scoring (for $k$-NN neighbors $j \in \mathcal{N}(i)$): $a_{ij} = \operatorname{softmax}_{j \in \mathcal{N}(i)}\big(\mathrm{MLP}_{\mathrm{att}}(q_i - k_j + \delta_{ij})\big)$, with $\delta_{ij}$ a learned relative positional encoding.
- Aggregation: $h_i = \sum_{j \in \mathcal{N}(i)} a_{ij} \odot (\hat{v}_i - \hat{v}_j + \delta_{ij}) + v_i$.
- Deconvolution / Upsampling: $F_i = \mathrm{Deconv}(H, \lambda_i) \in \mathbb{R}^{n_i \times c}$; per-child positional offsets $\Delta P_i$ and semantic logits $L_i$ are predicted from $F_i$ via MLPs.
- Update Point Positions: $P_i = \mathrm{dup}(P_{i-1}, \lambda_i) + \Delta P_i$, where $\mathrm{dup}$ repeats each parent point $\lambda_i$ times.
This structure achieves both coarse-to-fine geometry refinement and context-aware semantic labeling at each stage.
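The gating and local-attention steps can be traced on toy tensors. The NumPy sketch below substitutes random projections for the learned MLPs and PRWKV sweeps, and scores every channel separately (a per-channel vector-attention stand-in); it checks shapes and normalization only, not the trained module:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, k = 32, 8, 4  # assumed toy sizes: points, feature width, neighbors

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

P = rng.normal(size=(n, 3))
v0 = rng.normal(size=(n, c))                               # stand-in for MLP_v output
r1, r2 = rng.normal(size=(n, c)), rng.normal(size=(n, c))  # stand-ins for PRWKV sweeps
v_hat = (1.0 / (1.0 + np.exp(-r1))) * r2                   # sigma(r1) * r2 gating

# Brute-force k nearest neighbours (excluding self)
d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)        # (n, n) squared distances
idx = np.argsort(d2, axis=1)[:, 1:k + 1]                   # (n, k)

q = rng.normal(size=(n, c))
key = rng.normal(size=(n, c))
alpha = rng.normal(size=(n, k, c))                         # stand-in for pos. encoding
scores = q[:, None, :] - key[idx] + alpha                  # (n, k, c) attention logits
A = softmax(scores, axis=1)                                # normalize over neighbors

msg = v_hat[:, None, :] - v_hat[idx] + alpha               # (n, k, c) local messages
H = (A * msg).sum(axis=1) + v0                             # (n, c) aggregation + residual
```
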
3. Algorithmic Workflow and Implementation Details
A typical forward pass through an RWKV-PD module can be expressed in PyTorch-style pseudocode:
```python
def RWKV_PD(P_prev, F_prev, lambda_factor):
    # P_prev: (n, 3) parent points, F_prev: (n, c) parent features
    n = P_prev.shape[0]
    f = FeatureExtractor(F_prev)                          # (d,) global scene code
    Q_in = torch.cat([P_prev, f.expand(n, -1)], dim=1)    # (n, 3 + d)
    q = MLP_q(Q_in)                                       # (n, c) queries
    k = F_prev                                            # (n, c) keys
    v0 = MLP_v(torch.cat([q, k], dim=1))                  # (n, c) values

    # Receptance gating: two PRWKV sweeps, one sigmoid-gated against the other
    r1 = PRWKV(v0)
    r2 = PRWKV(v0)
    v_hat = torch.sigmoid(r1) * r2                        # (n, c)

    # Local vector attention over the k_neigh nearest neighbours
    idx = knn_indices(P_prev, k=k_neigh)                  # (n, k_neigh)
    q_i = q.unsqueeze(1).expand(-1, k_neigh, -1)          # (n, k_neigh, c)
    k_j = k[idx]                                          # (n, k_neigh, c)
    alpha = MLP_pos(P_prev.unsqueeze(1) - P_prev[idx])    # (n, k_neigh, c) rel. pos. enc.
    scores = MLP_att(q_i - k_j + alpha)                   # (n, k_neigh, 1)
    A = torch.softmax(scores, dim=1)                      # normalize over neighbours

    local_msg = v_hat.unsqueeze(1) - v_hat[idx] + alpha   # (n, k_neigh, c)
    H = (A * local_msg).sum(dim=1) + v0                   # (n, c) with residual

    # Deconvolution: expand each parent point into lambda_factor children
    F_child = DeconvModule(H, lambda_factor)              # (n * lambda, c)
    DeltaP = RebuildHead(F_child)                         # (n * lambda, 3)
    L_logits = SegmentHead(F_child)                       # (n * lambda, C)
    P_child = P_prev.repeat_interleave(lambda_factor, dim=0) + DeltaP
    return P_child, F_child, L_logits
```
This workflow encapsulates global feature extraction, query-key-value construction, PRWKV-gated value computation, local attention, aggregation, deconvolution, and coordinate/logit prediction.
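The pseudocode assumes a `knn_indices` helper. A minimal NumPy sketch of that helper is given below (brute-force $O(n^2)$ distances; a PyTorch version would typically combine `torch.cdist` with `topk`, or use a spatial index for large $n$):

```python
import numpy as np

def knn_indices(points, k):
    """Return (n, k) indices of each point's k nearest neighbours (self included)."""
    # Pairwise squared Euclidean distances, (n, n); fine for small n,
    # but O(n^2) memory -- replace with a grid/KD-tree query at scale.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

# Two well-separated pairs: each point's neighbourhood is itself + its pair.
pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 0.0, 0.0], [5.1, 0.0, 0.0]])
idx = knn_indices(pts, k=2)  # -> [[0, 1], [1, 0], [2, 3], [3, 2]]
```
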
4. Computational and Parameter Complexity
Relative to standard point cloud deconvolution or point transformer blocks, RWKV-PD achieves greater efficiency in both runtime and memory:
- Traditional Self-Attention: $O(n^2 d)$ runtime and $O(n^2)$ memory for $n$ points of feature width $d$.
- RWKV-PD Global Context (PRWKV): $O(n d^2)$ for projections, $O(n d)$ for the bidirectional sweep.
- RWKV-PD Local Attention: $O(n k d)$ for $k$ nearest neighbors.
- Total Complexity per RWKV-PD Stage: $O\big(n(d^2 + k d)\big)$ — strictly linear in $n$.
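A back-of-envelope comparison of the two cost models (constant factors ignored; the values of $n$, $d$, and $k$ below are illustrative, not from the paper):

```python
def self_attention_cost(n, d):
    # O(n^2 d): dense pairwise score and value products
    return n * n * d

def rwkv_pd_cost(n, d, k):
    # O(n d^2) projections + O(n d) sweep + O(n k d) local attention
    return n * d * d + n * d + n * k * d

n, d, k = 16384, 128, 16
ratio = self_attention_cost(n, d) / rwkv_pd_cost(n, d, k)
# Dense attention needs n / (d + 1 + k) times more work; here roughly 113x.
```
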
Empirically, for and , RWKV-PD is 5–10× faster and uses 3–5× less GPU memory than a similarly wide full self-attention point transformer block.
Approximate parameter count per stage:
| Module | Parameter Count (approx.) |
|---|---|
| MLP_q | |
| MLP_v | |
| RWKV Projections | |
| Deconv Kernels | |
| Rebuild & Segment Heads | |
| Total |
A Snowflake Point Deconvolution block of similar width has roughly 200k parameters, but incurs the quadratic activation memory associated with full self-attention.
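Per-module counts of this kind are tallied as sums of linear-layer sizes. The sketch below assumes widths $d = c = 128$ and a four-projection RWKV block; both are illustrative assumptions (the table's exact widths are not recoverable here), showing only how such counts are computed:

```python
def linear_params(n_in, n_out, bias=True):
    """Parameters of one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

d = c = 128  # assumed feature widths
mlp_q = linear_params(3 + d, c)              # query MLP on [p; f]
mlp_v = linear_params(2 * c, c)              # value MLP on [q; k]
rwkv = 4 * linear_params(c, c, bias=False)   # assumed R/K/V/output projections
total = mlp_q + mlp_v + rwkv                 # heads and deconv kernels excluded
```
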
5. Ablation Studies and Empirical Impact
Ablation studies on the SSC-PC dataset illustrate the critical contributions of RWKV-PD and its internal components:
- RWKV-PD vs. Standard Deconv: Replacing RWKV-PD with a vanilla Snowflake Deconv + identical heads increases Chamfer Distance (CD) from 0.265 to 0.274 (a 1.03× degradation) and reduces mean IoU (mIoU) from 95.27 to 94.61.
- Depth of RWKV-PD: Collapsing the stack to a single PRWKV-ATTN layer further degrades CD (to 0.287) and mIoU (to 94.48).
Across full benchmarks, RWKV-PCSSC with three RWKV-PD stages achieves a parameter count reduction of 4.18× and a memory improvement of 1.37× compared to state-of-the-art PointSSC, while maintaining or exceeding accuracy on SSC-PC, NYUCAD-PC, PointSSC, NYUCAD-PC-V2, and 3D-FRONT-PC.
6. Advantages and Practical Considerations
RWKV-PD offers several advantages for semantic scene completion and related 3D point cloud tasks:
- Scalability: Linear-time operations enable handling of substantially larger point sets on commodity hardware.
- Memory Efficiency: Reduced activation-memory requirements (owing to locality and PRWKV) minimize GPU memory footprint.
- Global-local Context Integration: The hybrid of global (PRWKV) and local (RWKV-ATTN) aggregation permits fine-grained scene understanding.
- Parameter Efficiency: Stages are compact, enabling deep cascades without prohibitive resource usage.
- Implementation: RWKV-PD can be composed as a stack of modular blocks, each following the outlined pseudocode and mathematical workflow.
A plausible implication is that the modularity and scalability of RWKV-PD make it adaptable to other structured prediction tasks in large-scale 3D point cloud data processing.
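The modular composition described above can be sketched with stub stages that only track point counts (the stage factors and seed size are assumed; real stages would also predict offsets, features, and logits):

```python
def make_stage(lam):
    """A stub RWKV-PD stage: duplicates each point lam times (offsets omitted)."""
    def stage(P, F):
        P_child = [p for p in P for _ in range(lam)]
        F_child = [f for f in F for _ in range(lam)]
        return P_child, F_child
    return stage

# A hypothetical (1, 4, 2) cascade: 256 seed points -> 2048 output points.
P, F = [(0.0, 0.0, 0.0)] * 256, [None] * 256
for stage in [make_stage(lam) for lam in (1, 4, 2)]:
    P, F = stage(P, F)
```
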
7. Context within the Literature and Perspectives
RWKV-PD combines concepts from global context fusion (as in transformer and RWKV architectures), local geometric attention, and learned hierarchical deconvolution (cf. SnowflakeNet) in a manner tailored for point cloud semantic scene completion. The integration of receptance-weighted key/value networks eliminates the computational inefficiencies of standard transformer modules in 3D contexts, addressing widely recognized bottlenecks in parameter count and runtime associated with dense attention. Recent empirical results demonstrate that these techniques are sufficient to both outperform and scale beyond prior state-of-the-art baselines such as PointSSC, affirming the centrality of RWKV-PD to modern, resource-efficient 3D scene completion architectures (He et al., 13 Nov 2025).