RWKV Point Deconvolution in 3D Scene Completion

Updated 15 November 2025
  • RWKV Point Deconvolution is a module that upscales and refines point clouds by integrating efficient global context with local geometric attention.
  • It employs a multi-stage coarse-to-fine deconvolution approach, achieving linear-time operations compared to traditional quadratic self-attention methods.
  • Empirical evaluations demonstrate that RWKV-PD reduces parameter count and GPU memory usage while maintaining high reconstruction fidelity and semantic accuracy.

RWKV Point Deconvolution (RWKV-PD) is a point cloud upsampling and feature refinement module introduced in the context of lightweight semantic scene completion networks, specifically within RWKV-PCSSC. RWKV-PD integrates the Receptance Weighted Key Value (RWKV) mechanism—originally devised to enable efficient global context modeling—with local geometric attention and hierarchical deconvolution, yielding a multi-stage architecture for coarse-to-fine restoration of point-wise features, semantics, and geometry. Its design enables efficient handling of large point sets by circumventing the quadratic complexity bottleneck of conventional self-attention-based approaches.

1. Role within RWKV-PCSSC and Design Rationale

In the RWKV-PCSSC architecture, after the RWKV Seed Generator (RWKV-SG) produces a coarse, partial representation of the scene, a sequence of RWKV-PD modules incrementally upsamples and refines the point cloud. At refinement stage $i$, RWKV-PD receives a point set $P_{i-1}$ with per-point features $F_{i-1}$, and outputs:

  • an upsampled point set $P_i$
  • refined per-point features $F_i$
  • updated semantic logits $L_i$.

The RWKV-PD design is motivated by:

  • the need for linear-time global modeling of point sets (addressed via the PRWKV mechanism),
  • effective fusion of global contextual and local geometric information (via a hybrid RWKV-ATTN formulation),
  • and parameter-efficient, learned expansion of point clouds (through SnowflakeNet-inspired deconvolution, modulated by context-enhanced features).

By replacing the $O(n^2)$ operations of full self-attention with linear-time methods and neighborhood-local attention, RWKV-PD achieves substantial gains in memory and compute efficiency without sacrificing reconstruction fidelity or semantic accuracy.
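To build intuition for the linear-time global pathway, the following is a minimal, illustrative sketch of a bidirectional linear-recurrence sweep in PyTorch. It is not the paper's PRWKV (whose exact parameterization is not reproduced here); the exponential-decay state and the decay constant are assumptions chosen for clarity.

import torch

def linear_global_context(v: torch.Tensor, decay: float = 0.9) -> torch.Tensor:
    # v: (n, c) per-point features. One forward and one backward recurrence
    # give every point a summary of all other points in O(n c) time and
    # memory, versus O(n^2 c) for dense self-attention.
    n, c = v.shape
    fwd = torch.empty_like(v)
    state = torch.zeros(c)
    for t in range(n):                  # forward sweep
        state = decay * state + v[t]
        fwd[t] = state
    bwd = torch.empty_like(v)
    state = torch.zeros(c)
    for t in reversed(range(n)):        # backward sweep
        state = decay * state + v[t]
        bwd[t] = state
    return 0.5 * (fwd + bwd)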

2. Mathematical Formulation

The RWKV-PD stage at hierarchy level $i$ is defined by the following inputs and outputs:

  • Input points $P_{i-1} \in \mathbb{R}^{n\times 3}$
  • Input per-point features $F_{i-1} \in \mathbb{R}^{n\times c}$
  • Output points $P_i \in \mathbb{R}^{(n\lambda)\times 3}$
  • Output features $F_i \in \mathbb{R}^{(n\lambda)\times c}$
  • Output semantic logits $L_i \in \mathbb{R}^{(n\lambda)\times C}$ (for $C$ classes)
  • Upsampling factor per stage $\lambda$.
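As a concrete instance of this contract (all sizes below are illustrative assumptions, not values prescribed by the paper; RWKV_PD refers to the pseudocode in Section 3):

n, c, C, lam = 1024, 128, 12, 2
P_prev = torch.randn(n, 3)                    # input points
F_prev = torch.randn(n, c)                    # input per-point features
P_i, F_i, L_i = RWKV_PD(P_prev, F_prev, lam)  # one refinement stage
assert P_i.shape == (n * lam, 3)              # 2048 upsampled points
assert F_i.shape == (n * lam, c)              # refined per-point features
assert L_i.shape == (n * lam, C)              # per-point semantic logits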

The module consists of the following sub-steps:

  1. Global Feature Extraction:

$f = \mathrm{FeatureExtractor}(F_{i-1}) \in \mathbb{R}^{d}$

with FeatureExtractor = PRWKV + Set Abstraction (PointNet++).
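A hedged sketch of this composition (the max-pooling stand-in for Set Abstraction and the reuse of the sweep from Section 1 are assumptions; here $d = c$ for simplicity):

def feature_extractor(F_prev: torch.Tensor) -> torch.Tensor:
    # F_prev: (n, c). Linear-time global mixing, then a permutation-
    # invariant pooling as a simple stand-in for PointNet++ Set Abstraction.
    h = linear_global_context(F_prev)   # (n, c)
    return h.max(dim=0).values          # (c,) global scene descriptor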

  2. Query, Key, Value Construction:

$q_i = \mathrm{MLP}_q(\mathrm{concat}(P_{i-1}, f)) \in \mathbb{R}^{n\times d_q}$

$k_i = F_{i-1}$

$v = \mathrm{MLP}_v(\mathrm{concat}(q_i, k_i)) \in \mathbb{R}^{n\times c}$

  3. Receptance-Weighted Value Computation:

$\widehat{v} = \sigma(\mathrm{PRWKV}(v)) \odot \mathrm{PRWKV}(v)$

where $\sigma$ is the element-wise sigmoid.
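In code, the gate is a one-liner; since both factors use the same PRWKV output, a single application suffices (prwkv here is an assumed module):

r = prwkv(v)                    # (n, c) PRWKV-mixed values
v_hat = torch.sigmoid(r) * r    # receptance gate: sigma(r) ⊙ r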

  4. Local Attention Scoring (for $k$-NN neighbors $L(i)$):

$a_{ij} = \exp(\mathrm{MLP}(q_i - k_j + \alpha)), \quad j \in L(i)$

$\mathcal{A}_{ij} = \dfrac{a_{ij}}{\sum_{j'\in L(i)} a_{ij'}}$
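The neighborhood $L(i)$ can be computed with a brute-force $k$-NN for illustration (this defines the knn_indices helper used in Section 3; torch.cdist materializes an $n \times n$ matrix, so large scenes would use a spatial index instead):

def knn_indices(P: torch.Tensor, k: int) -> torch.Tensor:
    # P: (n, 3) coordinates. Returns indices of the k nearest neighbors
    # of each point, shape (n, k). Each point's own index appears first;
    # exclude it if self-loops are undesired.
    dist = torch.cdist(P, P)                          # (n, n) distances
    return dist.topk(k, dim=1, largest=False).indices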

  5. Aggregation:

$H_i = \sum_{j \in L(i)} \mathcal{A}_{ij}\,(\widehat{v}_i - \widehat{v}_j + \alpha) + v_i$

  6. Deconvolution / Upsampling:

$F_i = \mathrm{Deconv}(H_i) \in \mathbb{R}^{(n\lambda)\times c}$

Predict per-child positional offsets $\Delta P$ and semantic logits $L_i$ via MLPs.
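A hedged sketch of the deconvolution (the duplicate-then-refine structure is SnowflakeNet-inspired per the text; the shared child_mlp is an assumed component, and the paper's exact kernel may differ):

def deconv(H: torch.Tensor, lam: int, child_mlp) -> torch.Tensor:
    # H: (n, c) aggregated parent features -> (n*lam, c) child features.
    # Each parent is duplicated lam times; a shared MLP (c -> c) then
    # differentiates the children via a residual refinement.
    child = H.repeat_interleave(lam, dim=0)
    return child + child_mlp(child)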

  7. Update Point Positions:

$P_i = \mathrm{repeat}(P_{i-1}, \lambda) + \Delta P$

This structure achieves both coarse-to-fine geometry refinement and context-aware semantic labeling at each stage.

3. Algorithmic Workflow and Implementation Details

A typical forward pass through an RWKV-PD module can be expressed in PyTorch-style pseudocode:

def RWKV_PD(P_prev, F_prev, lambda_factor):
    # P_prev: (n, 3) points; F_prev: (n, c) per-point features.
    # MLP_q, MLP_v, MLP_att, PRWKV, FeatureExtractor, DeconvModule,
    # RebuildHead, SegmentHead, knn_indices, k_neigh, and alpha are
    # modules / hyperparameters defined elsewhere.
    n = P_prev.shape[0]
    # 1. Global feature extraction (PRWKV + Set Abstraction)
    f = FeatureExtractor(F_prev)                            # (d,)
    # 2. Query / key / value construction
    Q_in = torch.cat([P_prev, f.unsqueeze(0).expand(n, -1)], dim=1)  # (n, 3+d)
    q = MLP_q(Q_in)                                         # (n, d_q)
    k = F_prev                                              # (n, c)
    v0 = MLP_v(torch.cat([q, k], dim=1))                    # (n, c)
    # 3. Receptance-weighted value: sigma(PRWKV(v)) ⊙ PRWKV(v)
    r = PRWKV(v0)                                           # (n, c)
    v_hat = torch.sigmoid(r) * r                            # (n, c)
    # 4. Local attention over k-NN neighborhoods
    #    (the q - k subtraction requires d_q == c)
    idx = knn_indices(P_prev, k=k_neigh)                    # (n, k_neigh)
    q_i = q.unsqueeze(1).expand(-1, k_neigh, -1)            # (n, k_neigh, d_q)
    k_j = k[idx]                                            # (n, k_neigh, c)
    scores = MLP_att(q_i - k_j + alpha)                     # (n, k_neigh, 1)
    A = torch.softmax(scores, dim=1)                        # (n, k_neigh, 1)
    # 5. Aggregation with a residual connection to v0
    vhat_i = v_hat.unsqueeze(1).expand(-1, k_neigh, -1)     # (n, k_neigh, c)
    vhat_j = v_hat[idx]                                     # (n, k_neigh, c)
    local_msg = vhat_i - vhat_j + alpha                     # (n, k_neigh, c)
    H = (A * local_msg).sum(dim=1) + v0                     # (n, c)
    # 6. Deconvolution to n*lambda children, then offset / logit heads
    F_child = DeconvModule(H, lambda_factor)                # (n*lambda, c)
    DeltaP = RebuildHead(F_child)                           # (n*lambda, 3)
    L_logits = SegmentHead(F_child)                         # (n*lambda, C)
    # 7. Update point positions: lambda children per parent
    P_rep = P_prev.repeat_interleave(lambda_factor, dim=0)  # (n*lambda, 3)
    P_child = P_rep + DeltaP                                # (n*lambda, 3)
    return P_child, F_child, L_logits

This workflow encapsulates global feature extraction, query-key-value construction, PRWKV-gated value computation, local attention, aggregation, deconvolution, and coordinate/logit prediction.
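A full cascade then simply chains stages (a hypothetical driver loop; rwkv_sg, partial_input, and a per-stage $\lambda = 2$ are assumed names and settings, while the count of three stages follows Section 5):

P, F = rwkv_sg(partial_input)                # coarse seeds from the RWKV-SG
for _ in range(3):                           # three RWKV-PD stages
    P, F, L = RWKV_PD(P, F, lambda_factor=2)
# P: final dense points; L: per-point semantic logits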

4. Computational and Parameter Complexity

Relative to standard point cloud deconvolution or point transformer blocks, RWKV-PD achieves greater efficiency in both runtime and memory:

  • Traditional Self-Attention: $O(n^2 c)$ runtime and $O(n^2)$ memory for $n$ points.
  • RWKV-PD Global Context (PRWKV): $O(n c^2)$ for projections, $O(n c)$ for the bidirectional sweep.
  • RWKV-PD Local Attention: $O(n k c)$ for $k \ll n$ nearest neighbors.
  • Total Complexity per RWKV-PD Stage: $O(n (c^2 + k c))$ — strictly linear in $n$ (see the worked arithmetic below).
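Plugging in the setting quoted below ($n = 4096$, $k = 16$) with an assumed width $c = 128$ matching the parameter example in this section:

n, c, k = 4096, 128, 16
full_attn = n * n * c          # ≈ 2.1e9 MACs for dense self-attention
rwkv_pd = n * (c * c + k * c)  # ≈ 7.5e7 MACs — roughly 28x fewer, linear in n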

Empirically, for $n = 4096$ and $k = 16$, RWKV-PD is 5–10× faster and uses 3–5× less GPU memory than a similarly wide full self-attention point transformer block.

Parameter count per stage (for $c = 128$, $d_q = 64$, $\lambda = 2$):

Module                      Parameter Count (approx.)
MLP_q                       5k
MLP_v                       20k
RWKV Projections            64k
Deconv Kernels              30k
Rebuild & Segment Heads     30k
Total                       ~170k

A Snowflake Point Deconvolution block of similar width has roughly 200k parameters, but incurs the quadratic scratchpad associated with full self-attention.

5. Ablation Studies and Empirical Impact

Ablation studies on the SSC-PC dataset illustrate the critical contributions of RWKV-PD and its internal components:

  • RWKV-PD vs. Standard Deconv: Replacing RWKV-PD with a vanilla Snowflake Deconv + identical heads increases Chamfer Distance (CD) from 0.265 to 0.274 (about 3.4% worse) and reduces mean IoU (mIoU) from 95.27 to 94.61.
  • Depth of RWKV-PD: Collapsing the stack to a single PRWKV-ATTN layer further degrades CD (to 0.287) and mIoU (to 94.48).

Across full benchmarks, RWKV-PCSSC with three RWKV-PD stages achieves a parameter count reduction of 4.18× and a memory improvement of 1.37× compared to state-of-the-art PointSSC, while maintaining or exceeding accuracy on SSC-PC, NYUCAD-PC, PointSSC, NYUCAD-PC-V2, and 3D-FRONT-PC.

6. Advantages and Practical Considerations

RWKV-PD offers several advantages for semantic scene completion and related 3D point cloud tasks:

  • Scalability: Linear-time operations enable handling of substantially larger point sets on commodity hardware.
  • Memory Efficiency: Reduced scratchpad requirements (owing to locality and PRWKV) minimize GPU memory footprint.
  • Global-local Context Integration: The hybrid of global (PRWKV) and local (RWKV-ATTN) aggregation permits fine-grained scene understanding.
  • Parameter Efficiency: Stages are compact, enabling deep cascades without prohibitive resource usage.
  • Implementation: RWKV-PD can be composed as a stack of modular blocks, each following the outlined pseudocode and mathematical workflow.

A plausible implication is that the modularity and scalability of RWKV-PD make it adaptable to other structured prediction tasks in large-scale 3D point cloud data processing.

7. Context within the Literature and Perspectives

RWKV-PD combines concepts from global context fusion (as in transformer and RWKV architectures), local geometric attention, and learned hierarchical deconvolution (cf. SnowflakeNet) in a manner tailored for point cloud semantic scene completion. The integration of receptance-weighted key/value networks eliminates the computational inefficiencies of standard transformer modules in 3D contexts, addressing widely recognized bottlenecks in parameter count and runtime associated with dense attention. Recent empirical results demonstrate that these techniques are sufficient to both outperform and scale beyond prior state-of-the-art baselines such as PointSSC, affirming the centrality of RWKV-PD to modern, resource-efficient 3D scene completion architectures (He et al., 13 Nov 2025).
