MDCNeXt: Advanced Power Battery Detection

Updated 3 July 2026

MDCNeXt is a multi-dimensional network that integrates point, line, and count clues to precisely localize and count battery endpoints in industrial X-ray images.
It employs state-space modules (PFSSM and DRSSM) to enhance feature extraction and intra-class consistency, mitigating issues like low contrast and visual interference.
The architecture achieves state-of-the-art performance on the PBD5K benchmark, significantly outperforming existing models in key performance metrics for battery inspection.

MDCNeXt is a multi-dimensional collaborative network designed to address the power battery detection (PBD) problem, which concerns the precise localization and counting of cathode and anode plate endpoints in industrial X-ray images of electric vehicle power batteries. The key challenge of PBD lies in the dense spatial arrangement of the plates, the low contrast, and the visual interference present in X-ray imagery, all of which have hindered reliable inspection with classical and conventional deep learning techniques. MDCNeXt integrates point, line, and count clues into a unified, state-space-enhanced encoder–decoder network, achieving state-of-the-art results on the PBD5K benchmark (Zhao et al., 11 Aug 2025).

1. Network Architecture and Structural Components

MDCNeXt employs an encoder–decoder structure with a ResNet-50 backbone. The network processes both task-specific “prompt” images (pure-plate exemplars) and standard input images through shared-weight feature extraction at five scales. The feature representations are refined and integrated as follows:

Prompt-Filtered State Space Module (PFSSM): At each encoder scale, dynamic filters derived from prompt images are used to suppress distractors in the input image. The filtered features are processed by a 2D state-space block (SS2D) to infuse global context.
Multi-dimensional Decoder: The decoder comprises three parallel heads:
- Point Predictor: A U-shaped sub-network generating coarse endpoint masks.
- Line Predictor: Enforces spatial continuity by learning to segment connecting line structures.
- Counting Predictor: Directly regresses total counts for anode and cathode plates.
Density-Aware Reordering State Space Module (DRSSM): Utilizing the coarse point mask, features are semantically grouped (anode, cathode, background), reordered, and processed by SS2D per class to enhance intra-class consistency, then mapped back to produce a refined segmentation mask.
Final Point Mask: The output is a highly localized single-pixel segmentation for all endpoints, supporting accurate coordinate extraction and enumeration.
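
How coordinates and counts are read off the final point mask is not spelled out here. As a minimal sketch, assuming the mask is binary and each endpoint appears as a small 4-connected blob, one centroid per blob gives the coordinates and the blob count gives the enumeration. The function name `extract_endpoints` is hypothetical and used only for illustration.

```python
from collections import deque


def extract_endpoints(mask):
    """Return (centroids, count) for 4-connected foreground blobs.

    mask is a list of rows holding 0/1 values; each centroid is (row, col).
    Assumption: one blob corresponds to one plate endpoint.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects the pixels of one blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cy, cx))
    return centroids, len(centroids)
```

Running this once on the anode channel and once on the cathode channel would give per-polarity coordinates and counts under the same assumption.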

2. State Space Modules and Prompt-Guided Filtering

PFSSM implements prompt-guided feature enhancement without explicit contrastive loss. A randomly chosen prompt image from the training set supplies the “pure-plate” features each epoch, encouraging invariance. The principal operations are:

Global average pooling over the deepest prompt features yields a vector $z$.
Lightweight convolutions and SiLU activation produce softmax-normalized channel-wise attention weights $w$.
The current image’s features at the corresponding depth are filtered via depthwise convolution and channel-wise multiplication with $w$: $\widetilde F_C^5 = \mathrm{DWConv}_{3\times3}(F_C^5)\odot w$.
Filtered features are processed with SS2D, with subsequent layer norm and linear mapping.
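
The first three operations can be sketched in plain Python on nested lists. This is an illustration under stated assumptions, not the published implementation: the learned lightweight convolutions are replaced by a bare SiLU, the depthwise kernels are supplied by the caller, and the SS2D block, layer norm, and linear mapping are omitted. All function names are hypothetical.

```python
import math


def global_avg_pool(feat):
    """feat: list of C channels, each an H x W list of lists -> C channel means (z)."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def silu(x):
    return x / (1.0 + math.exp(-x))


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def depthwise_conv3x3(feat, kernels):
    """Apply one 3x3 kernel per channel with zero padding (output keeps H x W)."""
    out = []
    for ch, k in zip(feat, kernels):
        h, w = len(ch), len(ch[0])
        res = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += k[dy + 1][dx + 1] * ch[yy][xx]
                res[y][x] = acc
        out.append(res)
    return out


def prompt_filter(image_feat, prompt_feat, kernels):
    """Return (filtered features, channel weights w) for one encoder scale."""
    z = global_avg_pool(prompt_feat)
    # Assumption: SiLU alone stands in for the learned lightweight convolutions.
    w = softmax([silu(v) for v in z])
    conv = depthwise_conv3x3(image_feat, kernels)
    filtered = [[[v * wc for v in row] for row in ch] for ch, wc in zip(conv, w)]
    return filtered, w
```

Because the weights are softmax-normalized, they sum to one across channels, so channels that respond strongly in the prompt features are emphasized relative to the others.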

The DRSSM uses the coarse segmentation to reorder pixel features by semantic class, processes each block with SS2D for intra-class enhancement, and then performs an inverse mapping to restore spatial arrangement before final decoding.
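
The reorder, process, and restore pattern can be sketched as follows, assuming flattened pixel features and integer class labels taken from the coarse mask. The per-class SS2D pass is represented by a caller-supplied `process_block` function, and the density-aware aspect of the module is not modeled. All names are hypothetical.

```python
def reorder_by_class(features, labels):
    """Stable-sort flattened pixel features by label; return (sorted features, permutation)."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    return [features[i] for i in order], order


def restore_order(sorted_features, order):
    """Inverse mapping: put every feature back at its original pixel index."""
    restored = [None] * len(order)
    for pos, src in enumerate(order):
        restored[src] = sorted_features[pos]
    return restored


def drssm_sketch(features, labels, process_block):
    """Group pixels by class, process each contiguous class block, restore layout.

    process_block is a placeholder for the per-class SS2D pass and must
    return a list with the same length as its input.
    """
    sorted_feats, order = reorder_by_class(features, labels)
    sorted_labels = [labels[i] for i in order]
    out = []
    start = 0
    while start < len(sorted_labels):
        end = start
        while end < len(sorted_labels) and sorted_labels[end] == sorted_labels[start]:
            end += 1
        out.extend(process_block(sorted_feats[start:end]))
        start = end
    return restore_order(out, order)


# Usage: replace each class block by its mean as a toy "intra-class" operation.
def block_mean(block):
    mean = sum(block) / len(block)
    return [mean] * len(block)


smoothed = drssm_sketch([1.0, 2.0, 3.0, 4.0], [2, 0, 2, 1], block_mean)
```

The stable sort keeps pixels of the same class in their original scan order, and the stored permutation makes the inverse mapping exact.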

3. Distance-Adaptive Mask Supervision

MDCNeXt introduces a distance-adaptive mask generation mechanism for label construction:

For each endpoint $p_i$, the minimum Euclidean distance $d_i = \min(\|p_i - p_{i-1}\|, \|p_i - p_{i+1}\|)$ to its same-polarity neighbors is computed.
A ground-truth mask is formed as a disk of radius $r_i = \alpha d_i$, with $\alpha$ selected empirically ($\alpha = 0.3$ gave the best performance).
Formally, the mask is $M_i = \{x \in \mathbb{R}^2 : \|x - p_i\| \leq r_i\}$.
This adaptive strategy enables robust supervision under varying plate densities.
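
The label construction can be illustrated with a short sketch. It assumes the endpoints of one polarity are given in plate order as (x, y) pixel coordinates, and that the first and last endpoints use their single available neighbor; the handling of boundary endpoints is an assumption, as are the function names.

```python
import math


def adaptive_radii(points, alpha=0.3):
    """Compute r_i = alpha * d_i, with d_i the distance to the nearest adjacent endpoint."""
    if len(points) < 2:
        raise ValueError("need at least two endpoints of the same polarity")
    radii = []
    for i, p in enumerate(points):
        dists = []
        if i > 0:
            dists.append(math.dist(p, points[i - 1]))
        if i < len(points) - 1:
            dists.append(math.dist(p, points[i + 1]))
        radii.append(alpha * min(dists))
    return radii


def render_mask(points, radii, height, width):
    """Rasterize the union of disks M_i onto a height x width grid of 0/1 values."""
    mask = [[0] * width for _ in range(height)]
    for (px, py), r in zip(points, radii):
        y0, y1 = max(0, math.floor(py - r)), min(height - 1, math.ceil(py + r))
        x0, x1 = max(0, math.floor(px - r)), min(width - 1, math.ceil(px + r))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - px) ** 2 + (y - py) ** 2 <= r * r:
                    mask[y][x] = 1
    return mask


# Endpoints spaced 10 px and 4 px apart: the isolated one gets the larger disk.
radii = adaptive_radii([(0, 0), (10, 0), (14, 0)])
```

In this example the radii are approximately 3.0, 1.2, and 1.2 pixels, so disks shrink where plates are packed closely and neighboring labels do not merge.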

4. Training Protocol and Optimization

The network is trained using a composite loss:

$\mathcal{L} = \lambda_1 L_{\text{point}}^{\text{refine}} + \lambda_2 L_{\text{point}}^{\text{coarse}} + \lambda_3 L_{\text{count}} + \lambda_4 L_{\text{line}}$

with weighted IoU and BCE for segmentation heads and L1 loss for counts. Recommended weightings are $\lambda_1=\lambda_2=1.0$, $\lambda_3=0.05$, $\lambda_4=0.5$.
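
As a worked example of the weighting, the sketch below combines four already-computed scalar loss terms and includes a simple L1 count loss. The segmentation terms (weighted IoU plus BCE) are treated as given numbers rather than implemented, averaging the L1 loss over the two polarities is an assumption, and the function names are hypothetical.

```python
def l1_count_loss(pred_counts, true_counts):
    """Mean absolute error between predicted and true (anode, cathode) counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def composite_loss(l_point_refine, l_point_coarse, l_count, l_line,
                   lambdas=(1.0, 1.0, 0.05, 0.5)):
    """Weighted sum in the order (refined point, coarse point, count, line)."""
    lam1, lam2, lam3, lam4 = lambdas
    return (lam1 * l_point_refine + lam2 * l_point_coarse
            + lam3 * l_count + lam4 * l_line)


# Counts off by 3 and 1 give an L1 term of 2.0, which the small weight scales to 0.1.
count_term = l1_count_loss([33.0, 30.0], [30.0, 31.0])
total = composite_loss(0.4, 0.6, count_term, 0.8)
```

With these inputs the total is 0.4 + 0.6 + 0.05 × 2.0 + 0.5 × 0.8 = 1.5. The small $\lambda_3$ keeps the count term, which is measured in plates rather than per-pixel probabilities, from dominating the segmentation terms.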

Optimization employs Adam with an initial learning rate of 1e-4 and step decay (factor 0.9 at epoch 120), weight decay 1e-3, and gradient clipping at 0.5. Training runs for 150 epochs with a batch size of 4 across four Tesla V100 GPUs. Data augmentation consists of random horizontal flips, multi-scale resizing, and random brightness changes. The prompt image is re-sampled every epoch to mitigate overfitting to prompt exemplars.
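
The stated schedule can be written down as a small configuration plus a learning-rate function. This is a sketch: it assumes epochs are counted from 0 and that the single decay step applies from epoch 120 onward, and the dictionary keys and function name are hypothetical.

```python
# Values taken from the training description; key names are illustrative only.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "base_lr": 1e-4,
    "decay_epoch": 120,
    "decay_factor": 0.9,
    "weight_decay": 1e-3,
    "grad_clip": 0.5,
    "batch_size": 4,
    "epochs": 150,
}


def learning_rate(epoch, cfg=TRAIN_CONFIG):
    """Step decay: base rate before decay_epoch, base rate times factor afterwards."""
    if epoch >= cfg["decay_epoch"]:
        return cfg["base_lr"] * cfg["decay_factor"]
    return cfg["base_lr"]


schedule = [learning_rate(e) for e in range(TRAIN_CONFIG["epochs"])]
```

Under these assumptions the rate stays at 1e-4 for the first 120 epochs and drops to 9e-5 for the final 30.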

5. Performance and Comparative Evaluation

Evaluation on the PBD5K benchmark demonstrates substantial improvements over classical and recent baselines:

Model	$w$ 1	$w$ 2
DeepLabV3+ (ECCV’18)	0.5012	0.3707
SegFormer (NeurIPS’21)	0.5161	0.3672
ZoomNeXt (PAMI’24)	0.6290	0.5328
MDCNet (CVPR’24)	0.6662	0.5811
MDCNeXt	0.7458	0.6489

On specialized PBD metrics (test split averages):

AN-ACC = 0.8705, CN-ACC = 0.8375, PN-ACC = 0.7705
AN-MAE = 0.4645, CN-MAE = 0.3005
Pixel Accuracy = 0.9912, Structure measure = 0.7699
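
The exact definitions of the count metrics are not given here. As a hedged illustration, the sketch below assumes that AN-MAE and CN-MAE are the mean absolute error of the predicted anode and cathode counts per image, and that AN-ACC and CN-ACC are the fraction of images whose predicted count matches the ground truth exactly. PN-ACC is not modeled, and the function names are hypothetical.

```python
def count_mae(pred_counts, true_counts):
    """Mean absolute count error over a set of images (assumed meaning of *-MAE)."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def count_accuracy(pred_counts, true_counts):
    """Fraction of images with an exactly correct count (assumed meaning of *-ACC)."""
    return sum(p == t for p, t in zip(pred_counts, true_counts)) / len(true_counts)


# Four images: two exact counts, one off by 1, one off by 2.
pred = [10, 12, 9, 11]
true = [10, 11, 9, 13]
mae = count_mae(pred, true)
acc = count_accuracy(pred, true)
```

Here the MAE is 0.75 and the accuracy is 0.5, which shows why both are reported: accuracy counts any miss equally, while MAE reflects how large the miss is.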

Ablation studies indicate:

The counting head improves anode/cathode accuracy by 9–11% over point-only models.
The line head reduces boundary MAE by approximately 20–30%.
PFSSM enhances point node accuracy by 14% and reduces over-segmentation MAE by 57%.
DRSSM further improves node accuracy (8–10%) and dramatically reduces line MAE under high-density conditions (50–70%).

6. Context and Significance in Battery Inspection

MDCNeXt enables fine-grained PBD at plate densities and interference levels previously inaccessible to manual inspection and classical machine vision workflows. By fusing endpoint localization, line structure, and global count information, and by integrating the state-space modules (PFSSM, DRSSM), MDCNeXt discriminates robustly between closely spaced plates and remains resilient to common X-ray image artifacts. The distance-adaptive mask construction captures the spatial context of each endpoint, which is critical for real-world quality assurance.

Its introduction, in tandem with the PBD5K benchmark, establishes a new reference for scalable, automated inspection of battery internals, with direct implications for electric vehicle safety and battery manufacturing process control (Zhao et al., 11 Aug 2025).

7. Benchmarking and Reproducibility

MDCNeXt is evaluated with eight PBD-specific and six standard segmentation metrics, providing comprehensive benchmarking. The source code and data are made publicly available, supporting reproducibility and comparative research development. By establishing both the MDCNeXt architecture and the PBD5K dataset, this work provides foundational tools and baselines for future advancements in power battery defect detection and related structured object localization tasks in industrial X-ray imaging.

References (1)

Power Battery Detection (2025)
