CALM-Net: Curvature-Aware Vehicle Re-ID

Updated 18 October 2025

CALM-Net is a curvature-aware multi-branch neural network that processes LiDAR point clouds for robust vehicle re-identification.
It integrates edge convolution, point attention, and curvature embedding to extract complementary geometric, contextual, and surface variation features.
Empirical evaluation on nuScenes data shows an improvement of roughly 1.97 percentage points in mean re-identification accuracy over strong baselines, with inference times that support real-time autonomous applications.

CALM-Net refers to a curvature-aware LiDAR point cloud-based multi-branch neural network designed for vehicle re-identification in three-dimensional point cloud data. It integrates complementary geometric, contextual, and surface variation features through specialized architectural components—edge convolution, point attention, and curvature embedding—to enhance the discriminative power of deep representations for distinguishing vehicles in large-scale datasets such as nuScenes. Empirical studies demonstrate that CALM-Net achieves a roughly 1.97 percentage point improvement in mean re-identification accuracy over strong baseline architectures. The design highlights the value of explicitly encoding local surface curvature information in point cloud models for robust vehicle identity matching across varying views and sparsity regimes (Lee et al., 16 Oct 2025).

1. Multi-Branch Architecture for Point Cloud Representation

CALM-Net adopts a multi-branch architecture explicitly constructed to extract and aggregate distinct but complementary features from raw LiDAR point clouds:

Edge Convolution (EC) Branch: Models local geometric context. For each point $x_i$, its $k$-nearest neighbors $\mathcal{N}(i)$ are identified. The edge feature is computed as:

$h_{\theta}(x_i, x_j) = \mathrm{ReLU}(\theta \cdot (x_j - x_i) + \phi \cdot x_i)$

Aggregation is performed via max pooling:

$\text{EC}(x_i) = \max_{x_j \in \mathcal{N}(i)} h_{\theta}(x_i, x_j)$

where $\theta, \phi$ are learned weights. This stream is sensitive to local topology and micro-structural differences.
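The EC equations above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the brute-force neighbor search, the function names `knn_indices` and `edge_conv`, the output width of 16 channels, and the random `theta` and `phi` matrices are all assumptions made for the example.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point (the point itself excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)   # squared pairwise distances, (n, n)
    np.fill_diagonal(dist2, np.inf)                # a point is not its own neighbor here
    return np.argsort(dist2, axis=1)[:, :k]        # (n, k)

def edge_conv(points, theta, phi, k):
    """EC(x_i) = max over neighbors j of ReLU(theta (x_j - x_i) + phi x_i)."""
    idx = knn_indices(points, k)                       # (n, k)
    rel = points[idx] - points[:, None, :]             # x_j - x_i, shape (n, k, 3)
    h = rel @ theta.T + (points @ phi.T)[:, None, :]   # edge features, (n, k, c)
    h = np.maximum(h, 0.0)                             # ReLU
    return h.max(axis=1)                               # max pooling over neighbors, (n, c)

# Hypothetical inputs: 256 random points and random (untrained) weights.
rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
theta = rng.normal(size=(16, 3))
phi = rng.normal(size=(16, 3))
ec_features = edge_conv(pts, theta, phi, k=8)
```

The brute-force distance matrix costs O(n²) memory, which is acceptable at a few hundred points per object; a spatial index would be the usual choice for larger clouds.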

Point Attention (PA) Branch: Implements global contextual reasoning in the spirit of attention mechanisms found in Vision Transformers. Input features $X$ are linearly projected into queries ($Q$), keys ($K$), and values ($V$):

$Q = XW_Q, \quad K = XW_K, \quad V = XW_V$

Attention weights are computed with a scaled dot-product softmax, where $d$ is the dimension of the query and key vectors:

$\alpha_{ij} = \frac{\exp(Q_i K_j^T/\sqrt{d})}{\sum_l \exp(Q_i K_l^T/\sqrt{d})}$

and the contextualized output:

$\text{PA}(x_i) = \sum_j \alpha_{ij} V_j$

This branch enables modeling of long-range dependencies within the point cloud.
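A minimal single-head sketch of the PA equations in NumPy is given below. The feature width (32), projection width (16), and random projection matrices are assumptions for illustration only; the source does not state the dimensions or the number of heads used in CALM-Net.

```python
import numpy as np

def point_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over all points of one cloud."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d = Q.shape[-1]                                # query/key dimension
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) similarity logits
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # each row sums to 1
    return alpha @ V, alpha                        # PA(x_i) = sum_j alpha_ij V_j

# Hypothetical per-point input features and untrained projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
W_q, W_k, W_v = (rng.normal(size=(32, 16)) * 0.1 for _ in range(3))
pa_features, alpha = point_attention(X, W_q, W_k, W_v)
```

Because every point attends to every other point, the attention matrix grows quadratically with the number of points, which is one reason small fixed point budgets such as 256 keep this branch inexpensive.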

Curvature Embedding Branch: Quantifies and encodes local surface variation. For each point $x_i$, the centroid $x_i^c$ and the covariance matrix $M_i$ of its $k$-nearest neighborhood $\mathcal{X}(i)$ are computed:

$x_i^c = \frac{1}{k} \sum_{x_j \in \mathcal{X}(i)} x_j$

$M_i = \frac{1}{k} \sum_{x_j \in \mathcal{X}(i)} (x_j - x_i^c)(x_j - x_i^c)^T$

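Eigendecomposition of $M_i$ yields $\Lambda_i = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$, whose eigenvalues encapsulate the local patch geometry. The embedding module passes them through two learned mappings $\phi_1$ and $\phi_2$ (distinct from the EC weight $\phi$):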

$\mathrm{CurvEmbed}(\Lambda) = \phi_2(\mathrm{ReLU}(\phi_1([\lambda_1, \lambda_2, \lambda_3])))$
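The curvature branch can be sketched in NumPy as below. Several details are assumptions rather than facts from the source: the neighborhood is taken to include the point itself, eigenvalues are sorted in descending order, and $\phi_1$, $\phi_2$ are modeled as plain affine layers with hypothetical widths (3 to 32 to 16) and random weights.

```python
import numpy as np

def neighborhood_eigenvalues(points, k):
    """Eigenvalues of the k-NN covariance matrix M_i for every point, sorted descending."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    idx = np.argsort(dist2, axis=1)[:, :k]               # assumption: neighborhood includes x_i
    nbrs = points[idx]                                    # (n, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)    # subtract the centroid x_i^c
    cov = np.einsum("nki,nkj->nij", centered, centered) / k  # M_i, shape (n, 3, 3)
    eigvals = np.linalg.eigvalsh(cov)                     # ascending, for symmetric matrices
    return eigvals[:, ::-1]                               # lambda_1 >= lambda_2 >= lambda_3

def curv_embed(eigvals, W1, b1, W2, b2):
    """CurvEmbed = phi_2(ReLU(phi_1([lambda_1, lambda_2, lambda_3])))."""
    hidden = np.maximum(eigvals @ W1 + b1, 0.0)
    return hidden @ W2 + b2

# Hypothetical inputs and untrained weights.
rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
lam = neighborhood_eigenvalues(pts, k=16)
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
curv_features = curv_embed(lam, W1, b1, W2, b2)
```

Only the eigenvalues are passed on; the eigenvectors, which depend on the orientation of the patch, are discarded.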

After computing features from each stream, the respective representations are concatenated and passed through subsequent convolutional and batch normalization layers:

$B_1(X) = \mathrm{MLP_{conv}}(\mathrm{PA}(X) \oplus \mathrm{EC}(X))$

$B_2(X) = \mathrm{BN}(\mathrm{Conv}(B_1(X) \oplus \mathrm{CurvEmbed}(\Lambda)))$

$\mathrm{CALM\mbox{-}Net}(X) = \mathrm{ReLU}(B_2(X))$

where $\oplus$ denotes concatenation.
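The fusion step can be traced shape by shape with the sketch below. It is an illustration under stated assumptions: $\mathrm{MLP_{conv}}$ is reduced to one pointwise layer with ReLU, $\mathrm{Conv}$ is treated as a 1x1 (per-point) convolution, batch normalization is computed over the points of a single cloud without learned scale and shift, and all channel widths and weights are hypothetical.

```python
import numpy as np

def pointwise_linear(features, weight, bias):
    """Apply the same linear map to every point (equivalent to a 1x1 convolution)."""
    return features @ weight + bias

def batch_norm(features, eps=1e-5):
    """Normalize each channel over the points; learned scale and shift are omitted."""
    mean = features.mean(axis=0, keepdims=True)
    var = features.var(axis=0, keepdims=True)
    return (features - mean) / np.sqrt(var + eps)

def fuse(pa, ec, curv, mlp_params, conv_params):
    """B1 = MLP(PA ++ EC); B2 = BN(Conv(B1 ++ CurvEmbed)); output = ReLU(B2)."""
    b1_in = np.concatenate([pa, ec], axis=1)
    b1 = np.maximum(pointwise_linear(b1_in, *mlp_params), 0.0)  # one-layer stand-in for MLP_conv
    b2_in = np.concatenate([b1, curv], axis=1)
    b2 = batch_norm(pointwise_linear(b2_in, *conv_params))
    return np.maximum(b2, 0.0)

# Placeholder branch outputs (random) with hypothetical widths 16, 16 and 8.
rng = np.random.default_rng(0)
n = 256
pa = rng.normal(size=(n, 16))
ec = rng.normal(size=(n, 16))
curv = rng.normal(size=(n, 8))
mlp_params = (rng.normal(size=(32, 32)), np.zeros(32))    # 16 + 16 -> 32
conv_params = (rng.normal(size=(40, 64)), np.zeros(64))   # 32 + 8 -> 64
fused = fuse(pa, ec, curv, mlp_params, conv_params)
```

Note the order of fusion in the equations: attention and edge features are merged first, and the curvature embedding is injected only at the second stage.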

2. Role and Implementation of Curvature Embedding

Curvature embedding is central to CALM-Net’s discriminative capacity. By moving beyond raw (x, y, z) coordinate processing, CALM-Net leverages the principal eigenvalues of neighborhood covariances to encode deviations from local planarity:

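Flat surfaces yield two large eigenvalues and one near-zero eigenvalue, since there is almost no variance along the surface normal.
Edges or ridges manifest as two significant eigenvalues and one small but non-negligible eigenvalue.
Highly curved regions have three strong eigenvalues.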

This spectral encoding via eigenvalues is invariant to rotations of the neighborhood (and hence to viewpoint changes) and is reported to be robust to sparsity, allowing the network to distinguish vehicles by subtle geometric cues. The encoded curvature vector is mapped non-linearly to a learned feature space, yielding gains in re-identification accuracy, especially among classes with similar gross shape but varying micro-structure.
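The rotation invariance of the covariance eigenvalues can be checked numerically. The sketch below builds a synthetic, nearly planar patch (an assumption for illustration, not data from the source), applies a random rotation and a translation, and compares the eigenvalues before and after.

```python
import numpy as np

def patch_eigenvalues(patch):
    """Covariance eigenvalues of one neighborhood, sorted descending."""
    centered = patch - patch.mean(axis=0)
    cov = centered.T @ centered / len(patch)
    return np.linalg.eigvalsh(cov)[::-1]

rng = np.random.default_rng(0)

# Synthetic planar patch: wide spread in x and y, tiny noise along z.
flat = np.column_stack([
    rng.uniform(-1.0, 1.0, 200),
    rng.uniform(-1.0, 1.0, 200),
    rng.normal(0.0, 1e-3, 200),
])

# Random rotation from a QR factorization, with the sign fixed so that det(R) = +1.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1.0

moved = flat @ R.T + np.array([5.0, -2.0, 1.0])   # rotate, then translate

lam_flat = patch_eigenvalues(flat)
lam_moved = patch_eigenvalues(moved)
```

Here `lam_flat` and `lam_moved` agree up to floating-point error, and the planar patch shows two eigenvalues of comparable size with a third that is orders of magnitude smaller.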

3. Experimental Evaluation and Quantitative Results

CALM-Net was benchmarked on a nuScenes-derived vehicle re-identification dataset:

Only annotated frames with at least 127 points each were considered.
Both rigid (e.g., car, truck, bus, trailer) and deformable (e.g., motorcycle, pedestrian) object classes were evaluated using a pairwise matching protocol and metrics such as mean accuracy (mAcc), F1 positive, and F1 negative scores.
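For orientation, the pairwise metrics named above can be computed as in the following sketch. It assumes each pair carries a binary label (1 for the same object, 0 for different objects) and uses the standard definitions of accuracy and per-class F1; how the source averages accuracy into mAcc is not specified here, so the function reports plain accuracy over the supplied pairs. The function name and the toy labels are hypothetical.

```python
def pairwise_metrics(y_true, y_pred):
    """Accuracy, F1 for the positive class and F1 for the negative class.

    Labels: 1 = the pair shows the same object, 0 = different objects.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def f1(hits, false_alarms, misses):
        denom = 2 * hits + false_alarms + misses
        return 2 * hits / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1_pos": f1(tp, fp, fn),   # positive class as the class of interest
        "f1_neg": f1(tn, fn, fp),   # negative class as the class of interest
    }

# Toy example: 3 matching pairs and 5 non-matching pairs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = pairwise_metrics(y_true, y_pred)
```

Reporting F1 for both classes matters when matching and non-matching pairs are imbalanced, because accuracy alone can look high while one class is poorly recognized.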

Key findings include:

Method	mAcc (%)	F1 Pos. (%)	F1 Neg. (%)	Inference Time at 256 pts (ms)
PointNet	91.54	90.64	97.62	20–21
PointNeXt	94.91	94.12	98.00	27–29
DGCNN	92.41	91.18	97.41	58–59
DeepGCN	93.67	93.02	97.81	52–55
Point Transformer	94.16	93.49	98.65	29–32
CALM-Net	95.74	95.28	98.89	23–24

Hybrid point subsampling (random during training, FPS at inference) was used for best accuracy.
Rigid objects benefitted most from curvature embedding; performance on deformable classes remained lower.
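The hybrid subsampling scheme mentioned above (random during training, farthest point sampling at inference) can be sketched as follows. This is a generic NumPy version of both samplers, assuming the input cloud has at least as many points as the target budget; the function names, the fixed start index, and the synthetic cloud are illustrative and not taken from the source.

```python
import numpy as np

def random_subsample(points, m, rng):
    """Training-time sampling: m points drawn uniformly without replacement."""
    idx = rng.choice(len(points), size=m, replace=False)
    return points[idx]

def farthest_point_sampling(points, m, start=0):
    """Inference-time sampling: greedily pick the point farthest from those already chosen."""
    chosen = np.empty(m, dtype=int)
    chosen[0] = start
    min_dist2 = np.full(len(points), np.inf)   # squared distance to the nearest chosen point
    for i in range(1, m):
        last = points[chosen[i - 1]]
        d2 = np.sum((points - last) ** 2, axis=1)
        min_dist2 = np.minimum(min_dist2, d2)
        chosen[i] = int(np.argmax(min_dist2))
    return points[chosen], chosen

# Hypothetical object cloud with 500 points, reduced to a 256-point budget.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3))
train_view = random_subsample(cloud, 256, rng)
test_view, test_idx = farthest_point_sampling(cloud, 256)
```

Random sampling is cheap and acts as augmentation because each epoch sees a different subset, while FPS is deterministic for a given start index and covers the object surface more evenly.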

Ablation studies confirmed that each architectural branch—EC, PA, and curvature embedding—contributed distinctly, with their combination yielding the highest accuracy.

4. Mathematical Formulation Details

The key mathematical operations of CALM-Net include:

Covariance Eigenanalysis for Curvature:

$M_i = \frac{1}{k} \sum_{x_j \in \mathcal{X}(i)} (x_j - x_i^c) (x_j - x_i^c)^\top$

$M_i = V_i \Lambda_i V_i^\top,\quad \Lambda_i = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)$
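As a worked step that is not taken from the source, the eigenvalues can be tied back to the spread of the neighborhood. Because the trace is unchanged by the orthogonal change of basis $V_i$:

$\lambda_1 + \lambda_2 + \lambda_3 = \operatorname{tr}(M_i) = \frac{1}{k} \sum_{x_j \in \mathcal{X}(i)} \lVert x_j - x_i^c \rVert^2$

so the three eigenvalues split the mean squared distance to the centroid among the three principal directions. A common scale-free summary in the point cloud literature, shown here only as an illustration and not as a component of CALM-Net, is the surface variation:

$\sigma_i = \frac{\lambda_3}{\lambda_1 + \lambda_2 + \lambda_3}, \quad \lambda_1 \ge \lambda_2 \ge \lambda_3$

which is close to zero on planar patches and grows as the neighborhood departs from a plane.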

Edge Convolution:

$\text{EC}(x_i) = \max_{x_j \in \mathcal{N}(i)} \mathrm{ReLU}(\theta \cdot (x_j-x_i) + \phi \cdot x_i)$

Point Attention:

$\alpha_{ij} = \frac{\exp(Q_i K_j^\top / \sqrt{d})}{\sum_l \exp(Q_i K_l^\top / \sqrt{d})},\quad \text{PA}(x_i) = \sum_j \alpha_{ij} V_j$

Aggregation:

$B_2(X) = \mathrm{BN}(\mathrm{Conv}(B_1(X) \oplus \mathrm{CurvEmbed}(\Lambda)))$

$\mathrm{CALM\mbox{-}Net}(X) = \mathrm{ReLU}(B_2(X))$

5. Application Prospects and Implications

The design and empirical efficacy of CALM-Net indicate several directions for application and further research:

Real-time Automotive Systems: CALM-Net runs at roughly 23–24 ms per frame (256 points), enabling deployment in latency-sensitive autonomous driving and intelligent surveillance.
Robust Multi-object Tracking: The integrated features support reliable association of vehicles under changing viewpoints, partial occlusions, and variable LiDAR returns, thus enhancing multi-camera/sensor tracking frameworks.
3D Geometric Reasoning: The explicit curvature branch provides a template for future 3D models requiring local surface analysis, with potential extensions for non-rigid/deformable object reasoning or fusion with camera/radar modalities.
Improving Re-identification for Deformable Classes: Results suggest the need for specialized adaptations to achieve similar gains for motorcycles, bicycles, and pedestrians.

6. Comparison with Baseline Methods

Model	Curvature Embedding	mAcc (%)	Reported Gain (percentage points)
PointNet	✗	91.54	–
PointNeXt	✗	94.91	–
DGCNN	✗	92.41	–
DeepGCN	✗	93.67	–
Point Transformer	✗	94.16	–
CALM-Net	✓	95.74	+1.97

These quantitative comparisons underscore that CALM-Net’s combination of multi-branch feature learning and explicit curvature encoding extracts discriminative and robust features not captured by prior architectures.

7. Future Directions

Extensions and open research avenues include:

Refining the curvature embedding for better expressivity, possibly leveraging higher-order local statistics.
Addressing disparities in performance for deformable versus rigid classes by integrating multi-scale encoding or adaptive modules.
Exploring multimodal fusion (e.g., with RGB or radar) using the CALM-Net framework for unified scene understanding.
Systematic exploration of architectural trade-offs between computational complexity and representational power for larger-scale deployment.

CALM-Net exemplifies the trend toward explicit geometric encoding merged with attention-based contextual processing in 3D vision, offering a robust foundation for next-generation vehicle re-identification and tracking in autonomous systems (Lee et al., 16 Oct 2025).

References

Lee et al. (2025). CALM-Net: Curvature-Aware LiDAR Point Cloud-based Multi-Branch Neural Network for Vehicle Re-Identification.
