SRA-CP: Risk-Aware Cooperative Perception
- SRA-CP is a decentralized cooperative perception framework that uses risk assessment and selective data exchange to enhance autonomous vehicle safety under bandwidth constraints.
- It employs spontaneous peer discovery and blind-zone analysis to identify occluded, high-risk objects while minimizing communication overhead.
- The framework integrates dual-attention feature fusion and adaptive bandwidth allocation, achieving near-optimal detection accuracy in safety-critical scenarios.
Spontaneous Risk-Aware Selective Cooperative Perception (SRA-CP) is a decentralized framework for connected autonomous vehicles (CVs) that achieves high perception performance for safety-critical driving tasks under strict communication constraints. SRA-CP addresses the limitations of traditional cooperative perception (CP) systems, which typically rely on fixed partner selection and indiscriminate data sharing in Vehicle-to-Vehicle (V2V) networks, by combining risk-aware activation, spontaneous peer discovery, and bandwidth-adaptive selective data exchange. The design explicitly prioritizes the perception of occluded and collision-relevant objects, using only a fraction of the bandwidth of generic CP schemes, while maintaining near-optimal detection accuracy for safety-critical situations (Liu et al., 21 Nov 2025).
1. Architectural Principles and Protocol Design
SRA-CP is structured around a decentralized broadcast protocol in which each vehicle periodically transmits a compact perceptual coverage summary—specifically, spatial mask representations of its field of view and visibility status. The typical broadcast message contains metadata (identity, pose, kinematics) and a coverage mask discretized in bird’s-eye view (BEV), and is restricted to low hundreds of bytes, ensuring minimal overhead during "routine" (non-cooperative) phases. Vehicles within communication range $R_c$ exchange these summaries to enable spontaneous, low-latency peer discovery (Liu et al., 21 Nov 2025).
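As a concrete illustration, the routine broadcast could be packed as follows (a minimal sketch; the field layout, mask resolution, and byte packing are assumptions for illustration, not the paper's wire format):

```python
import struct

import numpy as np

def pack_summary(vehicle_id, pose, velocity, coverage_mask):
    """Pack a routine broadcast: metadata plus a bit-packed BEV coverage mask.

    pose: (x, y, yaw); velocity: (vx, vy, yaw_rate);
    coverage_mask: boolean array over a coarse BEV grid (e.g. 32x32 cells).
    """
    header = struct.pack("<I6f", vehicle_id, *pose, *velocity)  # 4 + 24 = 28 bytes
    mask_bytes = np.packbits(coverage_mask.astype(np.uint8)).tobytes()
    return header + mask_bytes

msg = pack_summary(
    vehicle_id=7,
    pose=(12.0, -3.5, 0.1),
    velocity=(8.2, 0.0, 0.0),
    coverage_mask=np.ones((32, 32), dtype=bool),
)
# A 32x32 mask packs into 128 bytes; with the 28-byte header the message
# stays in the low hundreds of bytes, as the protocol requires.
```

A finer mask grid trades broadcast size against blind-zone resolution; the low-hundreds-of-bytes bound caps how fine the routine-phase grid can be.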
The spontaneous handshake protocol proceeds as follows:
1. Periodic Broadcast: All CVs routinely share their position, velocity, and BEV visibility mask.
2. Local Risk Assessment: Each CV independently constructs a "blind-zone" mask—the spatial region it cannot perceive due to occlusions—and scores the potential risk posed by unseen or ambiguously classified objects.
3. Triggering and Peer Selection: Cooperation is initiated only if the risk in any blind zone exceeds a threshold $\tau_r$, and only peers whose coverage can reduce this risk are considered. The target peer that most effectively resolves the blind spot and mitigates risk is selected.
4. Selective Feature Exchange: Instead of complete sensor data, only sparse, prioritized BEV features relevant to the blind zone and risk are transmitted, bounded by a strict byte budget $B$.
This dynamic protocol permits vehicles to engage in targeted cooperation immediately upon detecting local perceptual risk, rather than relying on static communication partners (Liu et al., 21 Nov 2025).
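The per-frame decision loop implied by these steps can be sketched as follows (illustrative only; the gain and trigger rules here are simplified stand-ins for the risk and selection modules detailed in the following sections):

```python
import numpy as np

def cooperation_step(blind_mask, risk_scores, peer_masks, tau_r):
    """One frame of the spontaneous handshake (illustrative sketch).

    blind_mask:  boolean BEV mask of cells the ego cannot see.
    risk_scores: per-cell risk estimate over the BEV grid.
    peer_masks:  {peer_id: boolean BEV coverage mask} from routine broadcasts.
    Returns (peer_id, requested_cells) or None if cooperation is not triggered.
    """
    blind_risk = float((risk_scores * blind_mask).max(initial=0.0))
    if blind_risk <= tau_r:        # routine phase: keep broadcasting summaries only
        return None
    # Score each peer by how much high-risk blind area its coverage resolves.
    def gain(mask):
        return float((risk_scores * blind_mask * mask).sum())
    peer_id = max(peer_masks, key=lambda p: gain(peer_masks[p]), default=None)
    if peer_id is None or gain(peer_masks[peer_id]) == 0.0:
        return None                # no peer can reduce the risk
    requested = blind_mask & peer_masks[peer_id]   # cells to ask the peer for
    return peer_id, requested
```

Note that the expensive path (feature exchange) is reached only when both conditions hold: risk above $\tau_r$ and a peer with useful coverage.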
2. Perceptual Risk Identification and Occlusion Analysis
The perceptual risk identification module is central to SRA-CP's selectivity. Occlusion is modeled in the BEV plane. The per-cell occupancy confidence $\alpha(c)$ at cell $c$ is computed using ray-based integration over recent LiDAR sweeps. The transmittance along the ray $r(c)$ from the ego to cell $c$ is approximated as

$$T(c) = \exp\Big(-\sum_{c' \in r(c)} \alpha(c')\Big),$$

and the occlusion probability at $c$ is given by

$$P_{\text{occ}}(c) = 1 - T(c).$$

The blind-zone mask $M_{\text{blind}}$ is obtained by thresholding $P_{\text{occ}}$ and temporally smoothing across frames.
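The ray-based occlusion computation described above can be sketched as follows (the ray sampling and the 0.5 blind-zone threshold are illustrative assumptions):

```python
import numpy as np

def ray_cells(ego, target, n=64):
    """Cells sampled along the segment ego -> target, excluding the target itself."""
    ts = np.linspace(0.0, 1.0, n, endpoint=False)
    cells = {(int(round(ego[0] + t * (target[0] - ego[0]))),
              int(round(ego[1] + t * (target[1] - ego[1])))) for t in ts}
    cells.discard((target[0], target[1]))
    return cells

def occlusion_probability(alpha, ego_cell):
    """P_occ(c) = 1 - T(c), where the transmittance T(c) = exp(-sum of the
    occupancy alpha over the cells the ray crosses before reaching c)."""
    h, w = alpha.shape
    p_occ = np.zeros_like(alpha)
    for i in range(h):
        for j in range(w):
            blocked = sum(alpha[a, b] for a, b in ray_cells(ego_cell, (i, j)))
            p_occ[i, j] = 1.0 - np.exp(-blocked)
    return p_occ

# Thresholding p_occ (e.g. > 0.5) yields the per-frame blind-zone mask,
# which is then smoothed temporally across frames.
```

Cells behind a high-occupancy cell accumulate its $\alpha$ along their rays and become occluded, while the occluding cell itself stays visible.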
Each detected object $o$ in the scene is assigned a combined risk score

$$R(o) = w_d\, r_d(o) + w_v\, r_v(o) + w_s\, r_s(o),$$

where $r_d(o)$ is an exponential function of the spatial distance to the ego car, $r_v(o)$ quantifies the velocity difference, and $r_s(o)$ is the proximity to traffic structure (e.g., maps or lanes).

For each neighbor $j$, the pairwise collision-risk score is calculated by determining whether the neighbor can provide coverage for any highly risky object in the ego's blind zone, i.e., whether $R(o) > \tau_r$ and $o \in M_{\text{blind}} \cap \mathcal{V}_j$ for some object $o$, where $\mathcal{V}_j$ denotes neighbor $j$'s visible region. Only if this score exceeds the threshold $\tau_r$ does the protocol enter its cooperative mode (Liu et al., 21 Nov 2025).
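A minimal sketch of the per-object risk score and the cooperative-mode trigger (the weights, scales, and functional forms below are illustrative assumptions consistent with the description, not the paper's values):

```python
import numpy as np

def object_risk(rel_pos, rel_vel, lane_dist, w=(0.5, 0.3, 0.2), d0=20.0, l0=2.0):
    """Combined per-object risk R(o): an exponential distance term, a velocity-
    difference term, and a proximity-to-traffic-structure term.
    The weights w and scales d0, l0 are illustrative, not from the paper."""
    r_dist = np.exp(-np.linalg.norm(rel_pos) / d0)    # closer to ego -> riskier
    r_vel = np.tanh(np.linalg.norm(rel_vel) / 10.0)   # larger relative speed -> riskier
    r_lane = np.exp(-lane_dist / l0)                  # nearer lane/map structure -> riskier
    return w[0] * r_dist + w[1] * r_vel + w[2] * r_lane

def enters_cooperative_mode(blind_objects, tau_r=0.4):
    """The trigger: cooperate iff any blind-zone object's risk exceeds tau_r."""
    return any(object_risk(*o) > tau_r for o in blind_objects)
```

A nearby, fast-closing object on a lane scores near the top of the range, while a distant, slow object off the road contributes almost nothing, so routine traffic does not trip the trigger.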
3. Selective Cooperative Perception and Feature Exchange
Selective perception is realized through a bandwidth-budgeted, targeted transmission of BEV features. The key stages are:
- Peer Filtering: Candidate partners are filtered based on both spatial coverage of the blind zone and presence of high-risk objects.
- Prioritized Masking: Each partner ranks BEV cells by spatial saliency $s_{\text{sp}}(c)$, risk saliency $s_{\text{risk}}(c)$, and blind-zone overlap, forming a gain function

$$G(c) = s_{\text{sp}}(c) + s_{\text{risk}}(c) + \lambda\, M_{\text{blind}}(c).$$

The top-$K$ cells under the budget $B$ are sampled, and only the corresponding feature vectors are transmitted.
- Transmission Format: Sparse cell indices, feature vectors, and binary masks for spatial/risk saliency are included, such that

$$K \cdot b_{\text{cell}} \le B,$$

where $b_{\text{cell}}$ is the byte-size per cell.
This mechanism ensures that every transmitted byte maximally contributes to reducing critical risk (particularly imminent collisions or occluded vehicles) (Liu et al., 21 Nov 2025).
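The prioritized, budget-bounded cell selection can be sketched as follows (assuming an additive per-cell gain and a fixed per-cell payload size; both are modeling assumptions):

```python
import numpy as np

def select_cells(spatial_saliency, risk_saliency, blind_overlap,
                 budget_bytes, bytes_per_cell=64, lam=1.0):
    """Rank BEV cells by gain G(c) and keep the top-K under the byte budget
    K * bytes_per_cell <= B. Returns flat indices of the selected cells."""
    gain = (spatial_saliency + risk_saliency + lam * blind_overlap).ravel()
    k = min(budget_bytes // bytes_per_cell, gain.size)
    order = np.argsort(-gain)          # cells in descending gain
    return order[:k]
```

Because the budget bounds $K$ directly, halving the per-link byte budget halves the number of transmitted cells, and the gain ranking ensures the dropped cells are the least safety-relevant ones.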
4. Feature Fusion, Detection, and Bandwidth Adaptation
Upon receipt, the ego CV applies a dual-attention fusion module. For each BEV cell $c$, the ego feature $f_{\text{ego}}(c)$ and all incoming sparse features $f_j(c)$ are projected into query, key, and value spaces. Softmax attention weights are computed across partners, resulting in a fused feature

$$\hat{f}(c) = \sum_{j} \mathrm{softmax}_j\!\left(\frac{q(c)^\top k_j(c)}{\sqrt{d}}\right) v_j(c),$$

where $j$ ranges over the ego and its contributing partners. Optionally, the attention operation may extend to local neighborhoods around $c$ to mitigate alignment errors across vehicles.
The fused BEV representation feeds a detection head (producing object class scores and boxes) and a risk heatmap, with bandwidth usage regularized via a penalty in the overall loss:

$$\mathcal{L} = \mathcal{L}_{\text{det}} + \lambda_{\text{risk}}\,\mathcal{L}_{\text{risk}} + \lambda_{\text{bw}}\,\mathcal{L}_{\text{bw}},$$

where $\mathcal{L}_{\text{bw}}$ penalizes exceeding per-link byte budgets (Liu et al., 21 Nov 2025).
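The per-cell attention fusion can be sketched as follows (single-head form with identity-free projections; the dimensions and projection setup are illustrative):

```python
import numpy as np

def fuse_cell(f_ego, f_partners, Wq, Wk, Wv):
    """Attention fusion at one BEV cell: the ego feature forms the query;
    ego + partner features form the keys/values; a softmax over sources
    weights their contributions."""
    sources = np.stack([f_ego] + list(f_partners))   # (1 + P, d)
    q = Wq @ f_ego                                   # query from the ego feature
    K = sources @ Wk.T                               # keys, one per source
    V = sources @ Wv.T                               # values, one per source
    logits = K @ q / np.sqrt(q.size)                 # scaled dot-product scores
    w = np.exp(logits - logits.max())
    w /= w.sum()                                     # softmax across sources
    return w @ V                                     # fused feature for this cell
```

Because the output is a convex combination of the value vectors, a partner whose feature disagrees with the ego's query is down-weighted rather than overwriting the local evidence.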
5. Evaluation, Metrics, and Empirical Findings
SRA-CP was evaluated using the OPV2V synthetic benchmark, comprising 2–7 agent scenarios and high-density LiDAR input, split into training, validation, and test sets. The main evaluation metrics used include:
- 3D Average Precision (3DAP) at multiple IoU thresholds (IoU = 0.5 and 0.7)
- Risk-AP: AP on objects deemed high risk (risk score above a threshold $\tau$, e.g., $\tau = 0.4$)
- Bandwidth usage (KB/frame) and incremental Risk-AP improvement per KB
SRA-CP demonstrated the following performance compared to notable baselines:
| Method | AP@0.5 | AP@0.7 | Bandwidth | Risk-AP (τ=0.4) | Bytes for AP-target |
|---|---|---|---|---|---|
| Upper Bound | 0.9057 | 0.8955 | 100% | Highest | -- |
| SRA-CP | 0.892 | 0.873 | 20% | +4–8% over spatial-only | 30–60% less than others |
| Where2Comm | 0.8902 | 0.8791 | 20% | -- | -- |
SRA-CP’s AP loss compared to the Upper Bound remained under 1.5% (AP@0.5) and 2.5% (AP@0.7); for high-risk objects, it closes ~99% of the performance gap while using only a fifth of the bandwidth. Compared to selective CP baselines that do not include risk assessment (e.g., Where2Comm), SRA-CP improved safety-critical AP by ~15% (Liu et al., 21 Nov 2025).
Ablation studies verified that joint spatial-risk union gating and blind-zone weighting further improved critical-object detection, particularly for Risk-AP at high risk thresholds and low bandwidth.
6. Theoretical Context and Extension from CPoD
SRA-CP operationalizes principles previously introduced in the CPoD framework, which formalized cooperative perception triggering as a POMDP with risk- and time-to-collision (TTC)-based activation, solved online via DESPOT. CPoD’s reward traded off communication cost against the safety-relevant use of V2V sensing, and its policy invoked CP only under risk- or TTC-defined urgency (Liu, 2024). SRA-CP advances this paradigm by:
- Generalizing "risk" beyond vehicle intentions to include direct per-object and per-zone risk modeling via local LiDAR.
- Removing the need for predefined communication partners in favor of fully spontaneous, on-the-fly peer selection.
- Introducing strict, adaptive bandwidth constraints and selective, cell-level feature sharing.
- Employing dual-attention fusion to integrate multi-agent sparse features, thereby extending prior Bayesian/cost-driven filtering with deep spatial feature selection (Liu, 2024, Liu et al., 21 Nov 2025).
The extension from CPoD to SRA-CP signifies a move from model-based, episodic policy triggering toward fully decentralized, event-driven, and bandwidth-adaptive cooperative perception with explicit risk-centric prioritization.
7. Limitations, Trade-offs, and Future Prospects
Empirical validation to date relies solely on synthetic datasets (e.g., OPV2V), and only LiDAR input is considered; this constrains generalization to real-world, multi-modal sensor conditions. The scheme requires accurate vehicle pose for warping BEV masks, which may pose challenges under GNSS-denied or noisy environments. The current approach assumes reliable communication for low-overhead broadcasts and may require adaptation for high packet loss or adversarial scenarios (Liu et al., 21 Nov 2025).
Recognized limitations include the need for real-world evaluation and integration with additional sensing modalities. The current design offers a Pareto-optimal frontier in the trade-off between safety-relevant perception and bandwidth efficiency, but future research may pursue:
- End-to-end adaptive partner selection in non-uniform and evolving V2X environments
- Multi-agent tracking extensions under partial observability with more complex risk models
- Real-time deployment in mixed-autonomy and infrastructure-assisted settings
These directions are anticipated to further enhance the robustness, safety, and communication tractability of V2V perception for autonomous systems (Liu et al., 21 Nov 2025).