
PBD5K: Annotated Dataset for Battery Inspection

Updated 3 July 2026
  • PBD5K is a large-scale annotated dataset featuring 5,000 high-resolution X-ray images with expert-verified pinpoint labels for battery endpoint detection.
  • The dataset uses a four-stage annotation pipeline with active learning that cuts average annotation time from 3.4 to 1.2 min/image and mean spatial deviation from 3.5 to 0.9 px.
  • MDCNeXt, the benchmark model, achieves sub-1 pixel localization accuracy by integrating multi-dimensional clue extraction and adaptive mask generation.

PBD5K is a large-scale annotated dataset tailored for power battery detection (PBD), focusing on the localization of densely packed anode and cathode plate endpoints in X-ray imagery of battery cells for industrial quality inspection. PBD5K establishes the primary benchmark for this problem: it offers high-resolution, expert-verified annotations and comprehensive coverage of real-world visual interference, and it supports both algorithmic evaluation and annotation research (Zhao et al., 11 Aug 2025).

1. Dataset Composition and Annotation Pipeline

PBD5K comprises 5,000 X-ray images of assembled power battery cells, including devices from nine battery types and spanning three imaging magnification regimes (Close, Medium, Long Shot). The dataset is partitioned into 3,000 training and 2,000 testing images, with the test set further subdivided into Regular (515 images), Difficult (677 images), and Tough (808 images) according to interference characteristics and plate density.

Each plate endpoint is labeled as a single-pixel point at integer image coordinates. For model training, these points are dilated into small binary masks using a distance-adaptive scheme. Annotation follows a four-stage pipeline; the table below compares this pipeline ("Intelligent") with fully manual annotation:

| Metric | Manual | Intelligent | Relative improvement |
| --- | --- | --- | --- |
| Avg. annotation time (min/image) | 3.4 | 1.2 | +64.7% |
| Spatial deviation (mean px) | 3.5 | 0.9 | +74.3% |
| Proportion of disputed samples | 22% | 13% | +40.9% |
| Rework rate | 17% | 6% | +64.7% |
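The relative-improvement column follows from the manual and pipeline-assisted values as a percentage reduction. A minimal sketch of that arithmetic, using only the numbers in the table (the function and variable names are illustrative):

```python
def relative_reduction(manual: float, assisted: float) -> float:
    """Percentage by which `assisted` is lower than `manual`."""
    return (manual - assisted) / manual * 100.0


# (manual, pipeline-assisted) values copied from the table above.
rows = {
    "annotation time (min/image)": (3.4, 1.2),
    "spatial deviation (px)": (3.5, 0.9),
    "disputed samples (%)": (22.0, 13.0),
    "rework rate (%)": (17.0, 6.0),
}

improvements = {
    name: round(relative_reduction(manual, assisted), 1)
    for name, (manual, assisted) in rows.items()
}

for name, value in improvements.items():
    print(f"{name}: {value}% lower")
```

All four values are reductions, so the "+" sign in the table denotes improvement rather than an increase in the underlying quantity.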

Pipeline stages include:

  • Automated screening (battery integrity, near-duplicate removal via 512-D VGG+FAISS),
  • Multi-expert pre-labeling and cross-verification (three experts, post-hoc voting as needed),
  • Model-assisted active learning loop (uncertainty-driven expert review),
  • Layered quality evaluation (consistency and spatial accuracy checks).
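The near-duplicate screening in the first stage can be illustrated with a small sketch. It is not the authors' implementation: it swaps FAISS for a brute-force NumPy cosine-similarity search, uses random 512-D vectors in place of VGG features, and the 0.95 threshold is a hypothetical value chosen only for the example.

```python
import numpy as np


def drop_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95) -> list:
    """Greedy filter: keep an image only if its cosine similarity to every
    previously kept image is below `threshold` (hypothetical value)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # L2-normalise each row
    kept = []
    for i in range(unit.shape[0]):
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return kept


rng = np.random.default_rng(0)
base = rng.normal(size=(4, 512))  # stand-ins for 512-D image descriptors
# Index 4 is a lightly perturbed copy of index 1, i.e. a near-duplicate.
features = np.vstack([base, base[1] + 0.01 * rng.normal(size=512)])

kept_indices = drop_near_duplicates(features)
print(kept_indices)
```

A brute-force search is quadratic in the number of images; an approximate nearest-neighbour index such as FAISS serves the same purpose at dataset scale.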

Eight types of real-world visual disturbance relevant to industrial settings are included alongside a pure-plate baseline: tilted plates, aberrant plate structures, illumination interference, plate interference from neighboring cells, bifurcation, tray interference, tab interference, and separator visibility.

2. Task Formulation and Supervision Strategies

The PBD task is formulated as dense, point-level binary segmentation. Input images $I \in \mathbb{R}^{H\times W\times 3}$ are mapped to two binary endpoint masks $\hat{Y}^{an}, \hat{Y}^{ca} \in [0,1]^{H\times W}$, marking anode and cathode plate endpoints respectively.

Supervision employs:

  • Weighted BCE loss and IoU loss for both coarse and fine endpoint predictions,
  • Additional losses for line characteristics ($\mathcal{L}_{\text{line}}$) and total plate counting ($\mathcal{L}_{\text{count}}=\|N-\hat N\|_1$),
  • Total loss:

$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}^{\mathrm{refine}}_{\text{point}} + \lambda_2 \mathcal{L}^{\mathrm{coarse}}_{\text{point}} + \lambda_3 \mathcal{L}_{\text{count}} + \lambda_4 \mathcal{L}_{\text{line}}$$

with weights $\lambda_1=\lambda_2=1$, $\lambda_3=0.05$, $\lambda_4=0.5$.
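The weighted sum can be sketched in NumPy as follows. Only the four weights and the L1 form of the counting loss come from the text; the positive-class weighting in the BCE term, the soft-IoU formulation, and all function names are assumptions made for illustration. The line loss is passed in as a precomputed scalar because its exact form is not given here.

```python
import numpy as np

LAMBDAS = (1.0, 1.0, 0.05, 0.5)  # lambda_1 .. lambda_4 as stated in the text


def weighted_bce(pred, target, pos_weight=1.0, eps=1e-7):
    """BCE with an (assumed) extra weight on positive pixels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(pos_weight * target * np.log(pred)
             + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU between a probability map and a binary mask."""
    inter = float((pred * target).sum())
    union = float((pred + target - pred * target).sum())
    return 1.0 - (inter + eps) / (union + eps)


def point_loss(pred, target, pos_weight=1.0):
    """Point-level loss for one prediction head (coarse or refined)."""
    return weighted_bce(pred, target, pos_weight) + soft_iou_loss(pred, target)


def count_loss(n_true, n_pred):
    """L1 distance between true and predicted plate counts."""
    diff = np.asarray(n_true, dtype=float) - np.asarray(n_pred, dtype=float)
    return float(np.abs(diff).sum())


def total_loss(l_refine, l_coarse, l_count, l_line, lambdas=LAMBDAS):
    """Weighted sum of the four loss terms."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_refine + l2 * l_coarse + l3 * l_count + l4 * l_line


target = np.zeros((8, 8))
target[4, 4] = 1.0
perfect_iou = soft_iou_loss(target, target)
example_total = total_loss(1.0, 2.0, count_loss(72, 70), 4.0)
```

With $\lambda_3=0.05$, a miscount of two plates contributes 0.1 to the total, so the counting term acts as a mild regularizer next to the point losses.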

Positive supervision masks are generated adaptively: for each endpoint $j$, the mask radius is $r_j = \alpha d_j$, where $d_j$ is the inter-endpoint distance and the scaling factor $\alpha$ is set to the value that gave the best precision-recall trade-off.
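A minimal sketch of distance-adaptive mask generation is given below. It assumes that $d_j$ is the distance from endpoint $j$ to its nearest neighbouring endpoint, which the text does not state explicitly. The value `alpha=0.25` and the `min_radius` floor are placeholders for the example, not the values used for PBD5K.

```python
import numpy as np


def adaptive_point_masks(points, shape, alpha=0.25, min_radius=1.0):
    """Dilate (row, col) endpoints into discs of radius r_j = alpha * d_j.

    d_j is taken to be the nearest-neighbour distance (an assumption);
    alpha and min_radius are illustrative placeholder values.
    """
    points = np.asarray(points, dtype=float)
    height, width = shape
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)      # ignore self-distance
    d = dists.min(axis=1)
    d[~np.isfinite(d)] = 0.0             # single endpoint: fall back to min_radius
    radii = np.maximum(alpha * d, min_radius)

    rows, cols = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.uint8)
    for (r, c), radius in zip(points, radii):
        mask[(rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2] = 1
    return mask, radii


# Two closely spaced endpoints and one isolated endpoint on the same row.
endpoints = [(10, 10), (10, 18), (10, 40)]
mask, radii = adaptive_point_masks(endpoints, shape=(32, 64))
print(radii)
```

Tying the radius to local spacing keeps the masks of tightly packed endpoints from merging, while isolated endpoints receive a larger positive region.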

3. Real-World Visual Interference and Subset Partitioning

PBD5K documents the major categories of visual degradation encountered in industrial X-ray inspection. The category labels (P, T, A, II, PI, BI, TRI, TAI, SI) cover the pure-plate baseline (P) and eight interference types, which capture deformations, illumination variance, occlusion effects, electronic component overlap, and structural artifacts such as tray and separator shadows.

Robust evaluation is facilitated by partitioning the test data according to both the presence and severity of such interference, with the Tough subset representing high plate counts and compound disturbances that severely challenge both human annotators and automated detectors.

4. MDCNeXt Model and Multi-Dimensional Clue Extraction

To serve as a benchmark reference, MDCNeXt is introduced for structure-aware endpoint segmentation. The model uses a shared ResNet-50 encoder and multiple specialized modules:

  • Prompt-Filtered State-Space Module (PFSSM): At each encoding stage, a “pure-plate” prompt generates dynamic depth-wise filters. Applied as softmax-weighted convolutions, these filters steer feature extraction toward task-relevant contrastive features, and the result is passed to state-space modeling (VMamba block).
  • Density-Aware Reordering State-Space Module (DRSSM): This module semantically groups dense regions (anode/cathode/background) and processes them separately via SS2D modeling to improve intra-class coherence and spatial discrimination, especially in densely packed plate arrays.
  • Multi-Dimensional Clue Modules: Coarse point prediction (FPN-style upsampling), a line details predictor relying on low-level fused features and element-wise gating, and a global plate counter applied on deep feature map heads.

Auxiliary loss branches enforce explicit plate count and line structure constraints. The overall approach enables integration of point-wise, linear, and count-level information within a unified architecture.
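The reorder, process, and restore pattern behind DRSSM can be illustrated with a toy NumPy sketch. It is not the module itself: the SS2D state-space scan is replaced by a simple running mean within each group, and the group labels are given by hand rather than predicted. All names are hypothetical.

```python
import numpy as np


def reorder_by_group(tokens, group_ids):
    """Make tokens of the same group contiguous; also return the
    permutation and its inverse so the original layout can be restored."""
    order = np.argsort(group_ids, kind="stable")
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return tokens[order], order, inverse


def groupwise_running_mean(sorted_tokens, sorted_groups):
    """Stand-in for a sequence model: running mean within each group."""
    out = np.empty_like(sorted_tokens, dtype=float)
    for g in np.unique(sorted_groups):
        idx = np.flatnonzero(sorted_groups == g)
        segment = sorted_tokens[idx].astype(float)
        steps = np.arange(1, len(idx) + 1)[:, None]
        out[idx] = np.cumsum(segment, axis=0) / steps
    return out


# Six flattened pixels with 2-D features.
# Group ids: 0 = background, 1 = anode, 2 = cathode (assigned by hand here).
tokens = np.arange(12, dtype=float).reshape(6, 2)
groups = np.array([1, 0, 2, 1, 0, 2])

sorted_tokens, order, inverse = reorder_by_group(tokens, groups)
processed = groupwise_running_mean(sorted_tokens, groups[order])
restored = processed[inverse]  # back in the original pixel order
print(restored)
```

Because each group is processed as its own contiguous sequence, a pixel only aggregates context from pixels of the same class, which is the intuition behind the intra-class coherence claimed for the module.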

5. Experimental Outcomes, Metrics, and Ablation

Evaluation employs eight metrics: MAE and accuracy for anode and cathode endpoints (AN-MAE, CN-MAE, AN-ACC, CN-ACC), paired plate accuracy, and MAE for anode position, cathode position, and overhang (AL-MAE, CL-MAE, OH-MAE).

MDCNeXt outperforms the prior MDCNet baseline on every metric reported in the following table, which averages results over all test splits:

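| Method | AN-MAE↓ | CN-MAE↓ | AN-ACC↑ | CN-ACC↑ | AL-MAE↓ | CL-MAE↓ | OH-MAE↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MDCNet | 3.275 | 2.883 | 0.739 | 0.714 | 4.843 | 4.531 | 3.773 |
| MDCNeXt | 0.465 | 0.301 | 0.871 | 0.838 | 1.262 | 1.171 | 1.067 |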

Ablation studies indicate cumulative benefit from each architectural component:

| Added component | AN-MAE↓ | AN-ACC↑ | AL-MAE↓ | OH-MAE↓ |
| --- | --- | --- | --- | --- |
| Baseline (point only) | 4.523 | 0.489 | 7.953 | 4.993 |
| + Counting predictor (CP) | 3.468 | 0.565 | 7.527 | 4.969 |
| + Line predictor (LP) | 3.135 | 0.604 | 5.881 | 3.772 |
| + Prompt stream | 2.294 | 0.694 | 5.291 | 3.746 |
| + PFSSM | 0.944 | 0.791 | 2.399 | 1.626 |
| + DRSSM (full MDCNeXt) | 0.465 | 0.871 | 1.262 | 1.067 |
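To read the ablation as per-component contributions, the AN-MAE column can be differenced step by step. The sketch below uses only the values from the table; the variable names are illustrative.

```python
# AN-MAE after each cumulative addition, copied from the ablation table.
ablation = [
    ("Baseline (point only)", 4.523),
    ("+ Counting predictor (CP)", 3.468),
    ("+ Line predictor (LP)", 3.135),
    ("+ Prompt stream", 2.294),
    ("+ PFSSM", 0.944),
    ("+ DRSSM (full MDCNeXt)", 0.465),
]

steps = []
for (_, previous), (name, current) in zip(ablation, ablation[1:]):
    absolute = round(previous - current, 3)                     # drop in AN-MAE
    relative = round((previous - current) / previous * 100, 1)  # % of previous row
    steps.append((name, absolute, relative))

baseline, final = ablation[0][1], ablation[-1][1]
total_reduction = round((baseline - final) / baseline * 100, 1)

for name, absolute, relative in steps:
    print(f"{name}: -{absolute} AN-MAE ({relative}% of previous row)")
print(f"overall reduction: {total_reduction}%")
```

The largest single drop comes from adding PFSSM (2.294 to 0.944). Because components are added cumulatively, these differences depend on the order of addition and should not be read as independent effects.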

Model efficiency is documented with parameter count and GFLOPs, benchmarked at a fixed input image size:

| Method | Params (MB)↓ | GFLOPs↓ |
| --- | --- | --- |
| SAM 2 | 216.4 | 128.4 |
| SegGPT | 370.3 | 655.7 |
| DeepLabV3+ | 58.9 | 121.5 |
| SegFormer | 84.8 | 372.5 |
| Spider | 394.2 | 171.4 |
| ZoomNeXt | 28.5 | 150.0 |
| MDCNet | 49.5 | 27.6 |
| MDCNeXt | 41.5 | 73.2 |

Qualitative assessments demonstrate that, even on the “Tough” subset with >70 plates, MDCNeXt yields sub-1 px endpoint localization and reliable separation despite heavy interference, while alternatives typically miss detections, produce duplications, or display substantial drift.

6. Technical and Practical Significance

PBD5K establishes a rigorous standard for supervised power battery inspection, addressing the unique challenges of high-density, fine-grained endpoint localization in noisy, artifact-heavy X-ray images. The annotation protocol and explicit documentation of real-world disturbances provide a robust resource for vision model benchmarking and data-centric annotation research.

MDCNeXt demonstrates the impact of incorporating contrastive prompt-filtering and spatial group-aware state-space modeling in industrial X-ray image analysis. Its performance suggests that explicitly integrating multi-dimensional clues with adaptive mask generation benefits dense-object inspection under adverse industrial conditions.

PBD5K and MDCNeXt thus underpin both future methodological advances in power battery visual inspection and broader applications requiring dense, interference-robust point-level localization (Zhao et al., 11 Aug 2025).
