GRC-Net: Geometry-Reflectance Collaboration

Updated 22 May 2026

GRC-Net is a deep learning architecture that explicitly separates geometry and reflectance using dual-branch encoders to extract invariant modality-specific features.
It employs multi-level feature collaboration, integrating local uncertainty gating and global cross-attention to robustly fuse complementary features under domain shifts.
The framework uses tailored loss functions and training strategies to disentangle representations, achieving state-of-the-art performance in LiDAR, NLOS imaging, and remote sensing.

The Geometry-Reflectance Collaboration Network (GRC-Net) is a class of deep learning architectures that explicitly disentangle and exploit the synergy between geometric structure and reflectance (appearance) information within spatial datasets—most notably LiDAR point clouds—by dual-branch separation, dedicated feature extraction, and adaptive multi-level fusion. This approach addresses the heterogeneous degradations of geometry and reflectance under domain shifts, such as those encountered in adverse weather, non-line-of-sight (NLOS) imaging, sparse multi-view reflectometry, and remote sensing. Multiple recent works develop variants of GRC-Net, each aligned with the constraints and modalities of their respective applications, but sharing the unifying paradigm of explicit geometry-reflectance collaboration (Yang et al., 3 Jun 2025, Grethen et al., 15 Jan 2026, Bi et al., 2020, Su et al., 27 Feb 2025).

1. Dual-Branch Architectures: Explicit Disentanglement

Explicit dual-branch architectures are a cornerstone of geometry-reflectance collaboration. The canonical strategy begins with modality separation at the input level, allocating distinct but coordinated encoding streams for geometry and reflectance:

LiDAR semantic segmentation: Raw point clouds $P = \{(x, y, z, r)\}$ are separated into a geometry stream $P_{geo} = [x, y, z]$ (processed as sparse 3D voxels via a 3D CNN such as MinkowskiNet) and a reflectance stream $P_{ref}$ (the 2D range projection of reflectance intensity $r$, encoded with depth-wise 2D convolutions) (Yang et al., 3 Jun 2025).
NLOS imaging: Time-resolved transient measurements are processed into a dense 3D grid, then encoded via two distinct branches—a voxelized albedo branch for reflectance recovery, and a depth branch for geometry extraction, with each branch leveraging specialized graph neural network (GNN) modules and channel fusion (Su et al., 27 Feb 2025).
SVBRDF capture from images/DEM: In both Lunar-G2R and Deep 3D Capture, geometric input (DEM tiles for planetary surfaces or MVS-predicted depth for objects) is processed separately from multi-view or spatial reflectance features, with cross-branch feature fusion occurring later (Grethen et al., 15 Jan 2026, Bi et al., 2020).

Explicit separation prevents early contamination between geometry and reflectance under environmental perturbations, enabling each encoder to specialize in the invariant cues relevant for its modality.
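The LiDAR input split described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited authors' code: the function names, the image size, and the vertical field of view are assumptions, and the spherical projection shown is the common range-image convention rather than a detail confirmed by the source.

```python
import numpy as np

def split_point_cloud(points):
    """Split an (N, 4) array of (x, y, z, r) rows into geometry and reflectance."""
    points = np.asarray(points, dtype=np.float64)
    return points[:, :3], points[:, 3]

def reflectance_range_image(xyz, r, height=4, width=8,
                            fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project reflectance onto a 2D range image (spherical projection).

    Image size and field of view are illustrative values (assumptions).
    """
    depth = np.linalg.norm(xyz, axis=1)
    valid = depth > 0                      # drop points at the sensor origin
    xyz, r, depth = xyz[valid], r[valid], depth[valid]

    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])         # azimuth in [-pi, pi]
    pitch = np.arcsin(xyz[:, 2] / depth)           # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    cols = 0.5 * (1.0 - yaw / np.pi) * width
    rows = (fov_up - pitch) / (fov_up - fov_down) * height
    cols = np.clip(np.floor(cols), 0, width - 1).astype(int)
    rows = np.clip(np.floor(rows), 0, height - 1).astype(int)

    image = np.zeros((height, width))
    # Visit points from far to near so the nearest point wins each pixel.
    for i in np.argsort(-depth):
        image[rows[i], cols[i]] = r[i]
    return image

points = [[1.0, 0.0, 0.0, 0.5],   # near point, straight ahead
          [2.0, 0.0, 0.0, 0.9],   # far point on the same ray
          [0.0, 1.0, 0.0, 0.3]]   # point to the left
p_geo, refl = split_point_cloud(points)
p_ref = reflectance_range_image(p_geo, refl)
```

In a full pipeline, `p_geo` would be voxelized for the 3D branch and `p_ref` passed to the 2D branch; both steps are outside the scope of this sketch.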

2. Multi-Level Feature Collaboration and Fusion Mechanisms

After independent encoding, GRC-Nets fuse features in a regulated way so that the branches complement each other without passing on modality-specific noise. A representative approach is as follows (notation follows Yang et al., 3 Jun 2025):

Complementarity-Aware Information Constraint (CIC): Each branch predicts a parametric distribution over its encoded features. From each feature vector $f$, the branch estimates a mean $\mu$ and a standard deviation $\sigma$, and draws a stochastic bottleneck representation via reparameterization:

$m = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$

The CIC loss function

$\mathcal{L}_{cic} = \mathrm{KL}[p_{geo} \| \mathcal{N}(0,I)] + \mathrm{KL}[p_{ref} \| \mathcal{N}(0,I)] - \mathrm{KL}[p_{geo} \| p_{ref}] - \mathrm{KL}[p_{ref} \| p_{geo}]$

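discourages both noise memorization and redundancy between branches: the two KL terms against the standard normal prior compress each branch's representation, limiting how much noise it can retain, while the two negated cross-branch KL terms push the geometry and reflectance distributions apart so that the branches do not encode the same information.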
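For diagonal Gaussian branch distributions, every term of the CIC loss has a closed form. The sketch below evaluates the loss as written above, treating $\sigma$ as a per-dimension standard deviation. The function names are illustrative, and the diagonal-Gaussian assumption is ours rather than a detail stated in the text.

```python
import numpy as np

def kl_diag_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL[N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)], summed over the last axis."""
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5,
        axis=-1,
    )

def reparameterize(mu, sigma, rng):
    """Draw m = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + sigma * rng.standard_normal(mu.shape)

def cic_loss(mu_geo, sigma_geo, mu_ref, sigma_ref):
    """Prior-compression terms minus the symmetric cross-branch KL terms."""
    zeros, ones = np.zeros_like(mu_geo), np.ones_like(sigma_geo)
    compress = (kl_diag_gauss(mu_geo, sigma_geo, zeros, ones)
                + kl_diag_gauss(mu_ref, sigma_ref, zeros, ones))
    separate = (kl_diag_gauss(mu_geo, sigma_geo, mu_ref, sigma_ref)
                + kl_diag_gauss(mu_ref, sigma_ref, mu_geo, sigma_geo))
    return compress - separate

mu_geo, sigma_geo = np.array([1.0, 0.0]), np.array([1.0, 1.0])
mu_ref, sigma_ref = np.array([0.0, 0.0]), np.array([1.0, 1.0])
loss = cic_loss(mu_geo, sigma_geo, mu_ref, sigma_ref)   # 0.5 - 1.0 = -0.5
sample = reparameterize(mu_geo, sigma_geo, np.random.default_rng(0))
```

In this toy case the geometry branch pays 0.5 for deviating from the prior, and the two cross-branch terms contribute 0.5 each, so the loss is -0.5. Because the separation terms are negated, the unweighted expression is unbounded below; practical implementations would likely weight or bound these terms, which is not specified here.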

Local Reliability Fusion: At each spatial location, a reliability gate $\alpha$ determined by branch uncertainty fuses the branch means

$\mu_{fused} = \alpha \odot \mu_{geo} + (1 - \alpha) \odot \mu_{ref}$

favoring the more reliable modality.
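One simple way to realize such a gate is inverse-variance weighting, in which the branch with the lower predicted uncertainty receives the larger weight. The form of the gate below is an assumption made for illustration; the source states only that the gate depends on branch uncertainty.

```python
import numpy as np

def reliability_gate(sigma_geo, sigma_ref, eps=1e-8):
    """Weight for the geometry branch; inverse-variance form is an assumption."""
    var_geo, var_ref = sigma_geo ** 2, sigma_ref ** 2
    return var_ref / (var_geo + var_ref + eps)

def local_fusion(mu_geo, sigma_geo, mu_ref, sigma_ref):
    """Convex combination of the branch means at each spatial location."""
    alpha = reliability_gate(sigma_geo, sigma_ref)
    return alpha * mu_geo + (1.0 - alpha) * mu_ref

# Two locations: reflectance is unreliable at the first, both equal at the second.
mu_geo = np.array([1.0, 1.0])
mu_ref = np.array([0.0, 0.0])
sigma_geo = np.array([0.1, 1.0])
sigma_ref = np.array([1.0, 1.0])
fused = local_fusion(mu_geo, sigma_geo, mu_ref, sigma_ref)
```

At the first location the fused value stays close to the geometry mean (about 0.99), while at the second the two branches are averaged.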

Global Cross-Attention Fusion: Compact learned queries cross-attend first to global reflectance features and then participate in secondary attention with voxel features, creating a robust global context embedding.
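The two-stage global fusion can be sketched with plain scaled dot-product attention. This is a simplified illustration: the learned query, key, and value projections are omitted, random arrays stand in for learned queries and encoder features, and the direction of the second stage (the reflectance-informed queries attending to voxel features) is our reading of the description.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens):
    """Single-head attention; tokens serve as both keys and values."""
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ tokens.T / scale, axis=-1)   # (Q, T)
    return weights @ tokens                                   # (Q, d)

def global_context(queries, ref_features, voxel_features):
    # Stage 1: compact queries summarize the global reflectance features.
    ref_context = cross_attention(queries, ref_features)
    # Stage 2: the reflectance-informed queries attend to voxel features.
    return cross_attention(ref_context, voxel_features)

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 16))            # stand-in for learned queries
ref_features = rng.standard_normal((64, 16))      # flattened 2D branch features
voxel_features = rng.standard_normal((200, 16))   # sparse 3D branch features
context = global_context(queries, ref_features, voxel_features)
print(context.shape)  # (4, 16)
```

Keeping the number of queries small keeps the cost linear in the number of reflectance pixels and voxels, which is the usual motivation for query-based attention over large token sets.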

Other GRC-Net instantiations utilize similar mechanisms: multi-scale graph selection in NLOS GRC-Nets (Su et al., 27 Feb 2025), skip-connected U-Nets for geometry-reflectance fusion in DEM-based BRDF estimation (Grethen et al., 15 Jan 2026), or latent-space aggregation and per-feature max-pooling in sparse multi-view capture (Bi et al., 2020). The central objective in all designs is to cross-validate and reconcile complementary cues while suppressing modality-specific artifacts.

3. Training Objectives and Supervision Strategies

GRC-Net variants optimize for robust, disentangled representations with direct task supervision plus auxiliary consistency constraints:

Segmentation and reconstruction losses: Cross-entropy for voxel-wise semantic labels (LiDAR), L1 or L2 for albedo and depth regression (NLOS), MSE for photometric appearance (BRDF learning) (Yang et al., 3 Jun 2025, Su et al., 27 Feb 2025, Grethen et al., 15 Jan 2026).
Information-theoretic constraints: The CIC loss for branch complementarity and redundancy suppression (Yang et al., 3 Jun 2025).
Multi-stage or decoupled training: Albedo (reflectance) branches are trained first to convergence before freezing and training the depth (geometry) branch, empirically simplifying hyperparameter selection and improving convergence (Su et al., 27 Feb 2025).
Multi-scale loss application: Losses applied at each decoder scale (NLOS) or over multiple resolutions (DEM/BRDF), improving convergence and reconstruction fidelity.
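As an illustration of multi-scale loss application, the sketch below sums an L1 loss over decoder outputs at successively halved resolutions. The 2x average pooling of the target, the equal default weights, and the function names are assumptions, not details taken from the cited works.

```python
import numpy as np

def avg_pool2x(img):
    """Downsample a 2D array by 2x average pooling (odd edges are cropped)."""
    h, w = img.shape
    cropped = img[:h - h % 2, :w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_l1(preds, target, weights=None):
    """Weighted sum of L1 losses; preds are ordered from fine to coarse."""
    if weights is None:
        weights = [1.0] * len(preds)
    total = 0.0
    for i, (pred, weight) in enumerate(zip(preds, weights)):
        if i > 0:
            target = avg_pool2x(target)   # match the next decoder scale
        total += weight * np.abs(pred - target).mean()
    return total

target = np.arange(64.0).reshape(8, 8)
perfect = [target, avg_pool2x(target), avg_pool2x(avg_pool2x(target))]
biased = [p + 1.0 for p in perfect]       # constant error of 1 at every scale
loss_perfect = multiscale_l1(perfect, target)
loss_biased = multiscale_l1(biased, target)
```

With three scales and unit weights, a constant error of 1 at every scale gives a total loss of 3, while predictions that match the pooled targets give 0.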

None of the cited variants relies on adversarial or simulation-based losses; explicit branch disentanglement combined with robust feature fusion is reported to be sufficient.

4. Robustness under Domain Shifts and Adverse Sensing Conditions

A defining motivation for GRC-Net architectures is mitigating heterogeneous domain shifts, especially in outdoor or passive sensing:

Adverse weather in LiDAR: Under fog, rain, or snow, spatial structure ($x, y, z$) shifts only modestly, whereas reflectance intensity $r$ undergoes dramatic distributional changes. GRC-Net's dual-branch design prevents noise propagation from unstable reflectance cues into geometry encoding, while allowing task-relevant reflectance features (e.g., persistent contours) to supplement segmentation as appropriate. Local and global fusion modules adaptively weight branches based on uncertainty, increasing robustness (Yang et al., 3 Jun 2025).
NLOS imaging: Direct-path illumination suppression and occlusion artifacts predominantly affect reflectance cues; separating albedo from geometry in DG-NLOS enables accurate reconstruction of both scene attributes, reducing ghosting and texture-induced depth errors (Su et al., 27 Feb 2025).
Remote/planetary sensing: Geometry-to-reflectance pipelines such as Lunar-G2R establish that fine terrain detail (slopes, curvature) is critical to reconstructing reflectance variations, particularly in the absence of direct multi-view or ground-truth material capture (Grethen et al., 15 Jan 2026).

The net effect is robust generalization to previously unseen weather, illumination, or imaging configurations.

5. Experimental Benchmarks and Performance Summary

GRC-Net models achieve state-of-the-art results across a wide variety of semantic and physical reconstruction tasks, often with simpler pipelines than competing approaches:

Method / Task	Main Metric(s)	GRC-Net/Variant Performance	Prior SOTA
LiDAR Semantic Segmentation (Yang et al., 3 Jun 2025)	mIoU (SemanticSTF)	42.5 (+3.0 over RDA)	RDA: 39.5
NLOS Imaging (Su et al., 27 Feb 2025)	PSNR / SSIM / RMSE	29.93 / 0.92 / 0.04	Next-best: 28.77 (PSNR)
Lunar BRDF Estimation (Grethen et al., 15 Jan 2026)	MSE / PSNR / SSIM	4.81 / 23.14 / 0.520	Hapke: 7.71 / 21.02 / 0.411
Multi-View 3D Capture (Bi et al., 2020)	Normal/spec error (%)	~9–31% lower than U-Net	Best baselines

GRC-Net variants are reported to outperform strong baselines (e.g., MinkNet, RDA, MIMU, Hapke, COLMAP-based pipelines) in generalization, sample efficiency, and output fidelity, while requiring only a moderate increase in computational cost (the LiDAR segmentation model runs at 19 FPS on an RTX 4090). Ablation studies indicate that local fusion, global fusion, and the information bottleneck are each needed for peak performance.
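For readers less familiar with the segmentation metric in the table, mean intersection-over-union (mIoU) can be computed from a class confusion matrix as sketched below. The row and column convention (rows as ground truth, columns as predictions) and the toy matrix are illustrative assumptions; the benchmark's own evaluation code may handle ignored classes differently.

```python
import numpy as np

def mean_iou(confusion):
    """Mean IoU over classes that occur in the labels or the predictions."""
    confusion = np.asarray(confusion, dtype=np.float64)
    true_pos = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - true_pos
    present = union > 0
    return (true_pos[present] / union[present]).mean()

# Toy two-class example: rows are ground truth, columns are predictions.
confusion = [[2, 0],
             [1, 1]]
miou = mean_iou(confusion)        # (2/3 + 1/2) / 2

# The reported gain on SemanticSTF is a simple difference in mIoU points.
gain = 42.5 - 39.5
```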

6. Conceptual Insights and Practical Implications

Several general principles emerge from the collective body of GRC-Net work:

Modality-specific degradation necessitates explicit separation: Early fusion or joint encoding of geometry and reflectance is suboptimal under domain shifts, as it prevents isolation and suppression of modality-specific noise.
Stochastic, complementarity-aware bottlenecks enforce robustness: Penalizing branch redundancy while preserving complementary features leads to improved generalization and more disentangled representations.
Dual-level feature fusion (local reliability + global context) is essential: Local uncertainty gating enables spatially adaptive weighting of modalities, while global cross-attention recovers high-level scene semantics even when one modality is locally unreliable.
Practical retrofitting: GRC-inspired dual-stream and fusion modules can be incorporated into existing voxel-based backbones or encoder-decoder pipelines to yield immediate improvements in robustness and fidelity under adverse conditions.

A plausible implication is that these architectural motifs may generalize beyond the present applications, benefiting any setting facing modality-heterogeneous shift or requiring robust, disentangled spatial reasoning.

Several related research tracks support or extend the GRC-Net paradigm:

Graph neural representations in high-dimensional imaging: DG-NLOS demonstrates the integration of persistent graph structures for efficient 3D relation modeling, offering memory and computational advantages (Su et al., 27 Feb 2025).
Latent-space and differentiable rendering-based collaboration: Lunar-G2R and Deep 3D Capture exploit U-Net skip connections and photometrically consistent joint optimization to bring about tight geometry-reflectance alignment, without explicit attention mechanisms or adversarial discriminators (Grethen et al., 15 Jan 2026, Bi et al., 2020).
Automated branch specialization in encoder-decoder frameworks: Multi-scale, staged training with adaptive loss scheduling improves convergence and quality, supporting further research into optimization routines for dual-branch architectures.

These results reinforce the generality and utility of geometry-reflectance collaboration for robust, high-fidelity spatial and appearance reasoning in complex sensing environments.