Relative Energy Learning in 3D LiDAR OOD
- REL is a framework for 3D LiDAR OOD detection that uses energy-based modeling with a shift-invariant scoring mechanism to differentiate between inliers and anomalous points.
- It integrates closed-set segmentation with a binary logistic loss on the relative energy gap, enhancing robustness in safety-critical autonomous driving systems.
- The approach incorporates a geometry-aware synthetic OOD generation method (Point Raise) to augment training data and improve detection metrics on benchmarks.
Relative Energy Learning (REL) is a framework for out-of-distribution (OOD) detection in 3D LiDAR point clouds, designed for safety-critical autonomous driving environments where reliable identification of rare or anomalous objects is essential. Unlike prior approaches that adapt 2D image OOD methods to 3D data with limited success, REL introduces a shift-invariant energy scoring mechanism and a tailored synthetic OOD data strategy, yielding robust discrimination between inlier and anomalous points.
1. Energy-Based Modeling for LiDAR OOD Detection
REL builds on energy-based models, which assess sample “confidence” by computing energy scores from neural logits. Formally, given a segmentation network with logits $f_k(x)$ for the $K$ in-distribution classes, the energy score is

$$E(x) = -T \log \sum_{k=1}^{K} \exp\!\big(f_k(x)/T\big),$$

where $T$ is a temperature hyperparameter. OOD samples typically yield higher energies than inliers. Prior methods employ hinge losses with preset margins and temperature scaling, but calibration becomes tenuous under severe class imbalance and varying logit scales common in LiDAR tasks.
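For reference, a minimal PyTorch sketch of this energy score (tensor shapes are illustrative assumptions):

```python
import torch

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Free energy E(x) = -T * logsumexp(f(x) / T) over the K inlier logits.

    logits: (N, K) per-point logits for the in-distribution classes.
    Returns an (N,) tensor; OOD points tend to receive higher energies.
    """
    return -T * torch.logsumexp(logits / T, dim=-1)
```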
2. Relative Energy: Motivation, Definition, and Invariance
Relative Energy ameliorates the shift- and scale-sensitivity of raw energy scores by comparing the summed exponentials of positive (“inlier”) logits and negative (learned “OOD”) logits. The network is reparameterized to output $2K$ logits per point, with the first $K$ for the in-distribution classes and the second $K$ for the negative/OOD branch:
- Positive free energy: $E^{+}(x) = -\log \sum_{k=1}^{K} \exp\big(f_k^{+}(x)\big)$
- Negative free energy: $E^{-}(x) = -\log \sum_{k=1}^{K} \exp\big(f_k^{-}(x)\big)$
- Relative energy gap: $\Delta E(x) = E^{+}(x) - E^{-}(x) = -\log \dfrac{\sum_{k}\exp\big(f_k^{+}(x)\big)}{\sum_{k}\exp\big(f_k^{-}(x)\big)}$
Because the gap is a log-ratio of summed exponentials, it is invariant to a common shift of the logits, reducing the need for per-scene or per-backbone calibration, and it reliably separates the energy distributions of inliers and OOD points even when classes are imbalanced ($\Delta E(x) < 0$ for inliers; $\Delta E(x) > 0$ for OOD).
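A minimal sketch of the gap computation (layout of the $2K$ logits assumed as described above):

```python
import torch

def relative_energy_gap(logits: torch.Tensor, K: int) -> torch.Tensor:
    """Delta E(x) = E+(x) - E-(x) from a 2K-logit head.

    logits: (N, 2K) per-point logits; columns [0, K) are positive (inlier)
    logits, columns [K, 2K) are negative (learned OOD) logits.
    Returns an (N,) gap: negative for inliers, positive for OOD points.
    """
    e_pos = -torch.logsumexp(logits[:, :K], dim=-1)  # positive free energy
    e_neg = -torch.logsumexp(logits[:, K:], dim=-1)  # negative free energy
    return e_pos - e_neg  # a constant shift of all logits cancels here
```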
3. Integrated Training Objective
REL’s training combines closed-set segmentation, via the Mask4Former objective, and a binary logistic loss on the relative energy gap. The main terms are:
- Mask/class loss ($\mathcal{L}_{\text{mask}}$): cross-entropy and Hungarian-matched mask overlap, as in Mask4Former.
- REL OOD loss ($\mathcal{L}_{\text{REL}}$): a binary logistic loss on the relative energy gap,

$$\mathcal{L}_{\text{REL}} = -\frac{1}{|\mathcal{I}|}\sum_{x \in \mathcal{I}} \log \sigma\!\big(-\Delta E(x)\big) \;-\; \frac{\alpha}{|\mathcal{O}|}\sum_{x \in \mathcal{O}} \log \sigma\!\big(\Delta E(x)\big),$$

where $\mathcal{I}$ is the set of inlier points, $\mathcal{O}$ is the set of synthetic OOD points from Point Raise, $\alpha$ is an imbalance weight (typically 100), and $\sigma$ is the sigmoid function. The combined objective is

$$\mathcal{L} = \mathcal{L}_{\text{mask}} + \lambda\,\mathcal{L}_{\text{REL}},$$

with $\lambda$ a fixed weight on the OOD term. This encourages the relative energy gap to separate the inlier and OOD populations.
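A compact sketch of this loss term (the exact normalization and weight placement are assumptions consistent with the description above):

```python
import torch
import torch.nn.functional as F

def rel_ood_loss(gap: torch.Tensor, is_ood: torch.Tensor, alpha: float = 100.0) -> torch.Tensor:
    """Binary logistic loss on the relative energy gap Delta E(x).

    gap:    (N,) relative energy gaps.
    is_ood: (N,) boolean mask, True for synthetic OOD points from Point Raise.
    alpha:  imbalance weight on the (much rarer) OOD term.
    """
    inlier_gap, ood_gap = gap[~is_ood], gap[is_ood]
    # -log sigmoid(-Delta E) = softplus(Delta E): push inlier gaps below zero.
    loss_in = F.softplus(inlier_gap).mean() if inlier_gap.numel() else gap.new_zeros(())
    # -log sigmoid(Delta E) = softplus(-Delta E): push OOD gaps above zero.
    loss_out = F.softplus(-ood_gap).mean() if ood_gap.numel() else gap.new_zeros(())
    return loss_in + alpha * loss_out
```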
4. Point Raise: Geometry-Aware Synthetic OOD Generation
The absence of annotated OOD points in training data is addressed by Point Raise, a lightweight geometry-aware synthesis algorithm. Its steps are:
- Select a random seed point on the road, using the semantic labels of the point cloud to identify road points.
- Gather the cluster of points within radius $r$ of the seed via a KD-tree query, and compute each cluster point’s spatial distance to the seed along with the cluster’s minimum/maximum distances.
- Apply an inward pull: compute an adaptive decay from each point’s normalized distance and scale its offset from the seed by that factor, contracting the cluster toward the seed.
- Assign random heights to the cluster points within a preset range, updating their $z$ coordinates.
- Relabel the affected points with an auxiliary OOD class (“RAISED_CLASS”).
The main hyperparameters are the cluster radius and the raise-height range (both in meters) together with the pull strength; with the recommended settings, Point Raise produces compact, object-like OOD clusters that do not overlap inlier semantics.
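A simplified sketch of such a procedure (the default radius, height range, and pull strength below are illustrative placeholders, not the paper’s recommended values):

```python
import numpy as np
from scipy.spatial import cKDTree

def point_raise(points: np.ndarray, labels: np.ndarray, road_id: int, raised_id: int,
                radius: float = 2.0, pull: float = 0.5, h_range=(0.5, 2.0), rng=None):
    """Geometry-aware synthetic OOD generation (sketch). Modifies arrays in place.

    points: (N, 3+) float array of LiDAR points, labels: (N,) semantic labels.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pick a random road point as the cluster seed.
    road_idx = np.flatnonzero(labels == road_id)
    seed = points[rng.choice(road_idx), :3]

    # Cluster of points within `radius` of the seed (KD-tree query).
    idx = np.asarray(cKDTree(points[:, :3]).query_ball_point(seed, r=radius))

    # Inward pull: points farther from the seed are contracted more strongly.
    offsets = points[idx, :3] - seed
    dist = np.linalg.norm(offsets, axis=1, keepdims=True)
    decay = 1.0 - pull * dist / max(dist.max(), 1e-6)
    points[idx, :3] = seed + offsets * decay

    # Raise the cluster by a random height and relabel as the auxiliary OOD class.
    points[idx, 2] += rng.uniform(*h_range)
    labels[idx] = raised_id
    return points, labels
```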
5. Network Architecture and REL Integration
The backbone is Mask4Former-3D: a Minkowski Sparse-UNet encoder and a transformer decoder over FPS-sampled queries, producing panoptic masks for the $K$ inlier classes. REL’s OOD scoring is realized as:
- Auxiliary projector branch appended to every point’s encoder features
- Three Linear + ReLU layers yielding $2K$ logits per point
- First $K$: positive logits for the in-distribution classes
- Next $K$: negative logits for OOD
- Relative energy gap computed per point and used for OOD decision
During inference, segmentation proceeds through Mask4Former as usual; REL assigns each point an OOD score via its relative energy gap $\Delta E(x)$.
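A rough PyTorch sketch of this auxiliary branch (the hidden width and exact layer arrangement are assumptions):

```python
import torch
import torch.nn as nn

class RELHead(nn.Module):
    """Auxiliary projector mapping per-point encoder features to 2K logits (sketch)."""

    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.num_classes = num_classes
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2 * num_classes),  # K positive + K negative logits
        )

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        """point_feats: (N, feat_dim) -> per-point OOD score Delta E(x), shape (N,)."""
        logits = self.proj(point_feats)
        e_pos = -torch.logsumexp(logits[:, : self.num_classes], dim=-1)
        e_neg = -torch.logsumexp(logits[:, self.num_classes :], dim=-1)
        return e_pos - e_neg
```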
6. Empirical Performance and Ablation Results
REL has been evaluated on the STU (Spotting the Unexpected) and SemanticKITTI benchmarks using both point- and object-level OOD metrics, against baselines including Deep Ensemble, MC Dropout, Max Logit (MSP), Void Classifier, RbA, and UEM. Results include:
| Dataset (method) | AUROC (↑) | FPR@95% (↓) | AP (↑) |
|---|---|---|---|
| STU val (REL) | 97.85 | 9.60 | 10.68 |
| STU val (UEM) | 95.80 | 26.37 | 6.78 |
| STU test (REL) | 96.26 | 21.69 | 10.17 |
| STU test (Void) | 85.99 | 78.60 | 3.92 |
| KITTI outlier (REL) | 96.76 | 18.66 | 67.32 |
| KITTI outlier (UEM) | 93.15 | 37.07 | 61.73 |
Ablations demonstrate that REL’s relative energy yields the highest AUROC and lowest FPR@95 compared to hinge, VOS, and dual energy losses. Backbone finetuning offers further gains (AUROC: frozen $94.43$, finetuned $97.85$), while Point Raise is vital: without synthetic OOD generation, AUROC drops to $90.20$ and FPR@95 rises to $38.31$.
7. Thresholding and Deployment in Safety-Critical Systems
For operational use, REL applies a simple decision rule: classify point $x$ as OOD if $\Delta E(x) > \tau$. A zero-centered threshold ($\tau = 0$) naturally splits inliers from OOD. For controlled false-positive rates (e.g., FPR@95%), $\tau$ can be empirically selected on validation data; this calibration generalizes robustly across scenes and backbones without temperature or margin adjustment. In deployment, maintaining a running histogram of inlier $\Delta E(x)$ enables dynamic threshold adaptation to keep the FPR within safety limits.
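A minimal sketch of such a calibration step (a simple quantile rule on validation inlier gaps, assumed here as one way to realize the described procedure):

```python
import numpy as np

def calibrate_threshold(inlier_gaps: np.ndarray, max_fpr: float = 0.05) -> float:
    """Pick tau so that at most `max_fpr` of validation inliers exceed it.

    inlier_gaps: (N,) relative energy gaps Delta E(x) of validation inlier points.
    Points with Delta E(x) > tau are flagged as OOD at deployment, so the
    inlier false-positive rate on the validation set is at most `max_fpr`.
    """
    return float(np.quantile(inlier_gaps, 1.0 - max_fpr))

# Example: tau = calibrate_threshold(val_inlier_gaps)  # near 0 when gaps are well separated
```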
The shift invariance and universal thresholding of REL streamline the integration of OOD detection into safety-critical autonomous driving stacks, mitigating overconfident errors and enabling robust behavior under open-world uncertainty.