MF-PCBA Dataset for PCB Defect Detection

Updated 26 August 2025

MF-PCBA is a hierarchical dataset that organizes PCB data at pin, component, and board levels for detailed defect analysis.
It leverages advanced feature extraction from SPI systems to capture precise solder paste metrics and geometric attributes.
Data-centric pre-processing and XGBoost modeling enhance detection accuracy and quality control in PCB manufacturing.

The MF-PCBA dataset is a specialized collection designed for research and industrial applications in printed circuit board (PCB) defect detection and quality control. Its construction, feature extraction protocols, modeling approaches, and data-centric methodology reflect recent advances in computer vision and machine learning for manufacturing diagnostics. MF-PCBA is particularly notable for its hierarchical organization, which supports analysis at the pin, component, and board levels.

1. Dataset Architecture and Granularity

MF-PCBA features a hierarchical structure that supports data aggregation and modeling at multiple levels of PCB assembly:

Pin Level: Individual measurements collected for each pin provide fine-grained inspection detail. Each row records distinct physical and geometric attributes, such as solder paste volume, area, height, and positional offsets.
Component Level: Features from all pins associated with a single component are concatenated, producing composite feature vectors that enable analysis of inter-pin dependencies.
PCB Level: Complete feature sets for every component (and thus every pin) across a board are merged, supporting system-level defect prediction. Boards often comprise 128 components, arranged across panels (frequently configured as 8 PCBs/panel).

This hierarchical granularity permits investigation into defect propagation across different assembly scales, from localized faults at individual pins to systemic errors manifesting at the board level (Prasad-Rao et al., 2023).

2. Feature Extraction and Engineering

MF-PCBA features are largely derived from Solder Paste Inspection (SPI) systems, which measure each solder paste deposit before component placement:

Key Features:
- Solder paste volume (percentage)
- Area (µm²)
- Height (µm)
- Pad size (µm²)
- Offsets (OffsetX, OffsetY)
- Positional coordinates (PosX, PosY)
Feature Engineering: Component- and PCB-level representations are constructed by merging the corresponding pin-level features through concatenation operations, enabling the capture of inter-pin and inter-component effects not observable through isolated measurements.

The resulting dataset contains a rich array of physical and geometric parameters relevant for both single-instance and aggregated analyses of defects and deviation in manufacturing (Prasad-Rao et al., 2023).
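The concatenation-based aggregation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the dataset's actual tooling: the field names (`volume_pct`, `offset_x`, and so on), the helper `make_pin`, and all measurement values are assumptions made for the example.

```python
from collections import defaultdict

# Assumed feature names, loosely following the SPI features listed above.
FEATURES = ["volume_pct", "area", "height", "pad_size",
            "offset_x", "offset_y", "pos_x", "pos_y"]


def make_pin(pcb, component, pin, values):
    """Build one illustrative pin-level row from a list of feature values."""
    row = {"pcb": pcb, "component": component, "pin": pin}
    row.update(zip(FEATURES, values))
    return row


def component_vectors(pin_rows):
    """Concatenate pin feature vectors, in pin order, per (pcb, component)."""
    grouped = defaultdict(list)
    for row in pin_rows:
        grouped[(row["pcb"], row["component"])].append(row)
    vectors = {}
    for key, pins in grouped.items():
        ordered = sorted(pins, key=lambda r: r["pin"])
        vectors[key] = [r[name] for r in ordered for name in FEATURES]
    return vectors


def pcb_vectors(comp_vectors):
    """Concatenate component vectors, in sorted component order, per PCB."""
    boards = defaultdict(list)
    for pcb, component in sorted(comp_vectors):
        boards[pcb].extend(comp_vectors[(pcb, component)])
    return dict(boards)


# Made-up measurements: component C1 has two pins, C2 has one.
pins = [
    make_pin("P1", "C1", 2, [101.0, 9.8e4, 118.0, 1.0e5, -1.0, 0.5, 10.4, 5.0]),
    make_pin("P1", "C1", 1, [98.0, 9.5e4, 120.0, 1.0e5, 3.0, -2.0, 10.0, 5.0]),
    make_pin("P1", "C2", 1, [71.5, 6.0e4, 90.0, 8.0e4, 12.0, 4.0, 22.0, 7.5]),
]

by_component = component_vectors(pins)
by_pcb = pcb_vectors(by_component)
print(len(by_component[("P1", "C1")]), len(by_pcb["P1"]))  # 16 24
```

Note that plain concatenation yields vectors whose length depends on the pin count, so components with different numbers of pins are not directly comparable without further handling; how the dataset authors address this is not covered here.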

3. Data-Centric Processing and Pre-Processing Protocols

The efficacy of models trained on MF-PCBA is strongly dependent on rigorous data pre-processing:

Data Cleaning: Correction of formats and removal of rows with NaN values are performed to ensure dataset integrity.
Instance Merging:
- Pin-Aggregation: Iterative concatenation for multi-pin components.
- Component to PCB Aggregation: Features are merged per component and subsequently per PCB, preserving assembly hierarchies.
Label Association: Defect annotations derive from Automated Optical Inspection (AOI) and operator interventions. Labels are attached with a left merge when defect instances link unambiguously to SPI-measured pins, and with an inner merge for the broader error-correction and repair labels.

These pre-processing strategies, highlighted in data-centric machine learning approaches (Prasad-Rao et al., 2023), are crucial for enabling high-quality, interpretable model outputs.
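As a rough illustration of the cleaning and label-association steps, the sketch below uses pandas on toy tables. The column names (`pin_id`, `aoi_defect`, `repair_label`), the label values, and the choice to treat pins without an AOI record as non-defective are assumptions for the example, not details confirmed by the source.

```python
import pandas as pd

# Toy SPI measurements; pin 2 has a missing value and should be dropped.
spi = pd.DataFrame({
    "pin_id": [1, 2, 3, 4],
    "volume_pct": [98.0, None, 71.5, 102.3],
    "height": [120.0, 118.0, 90.0, 125.0],
})
# Hypothetical AOI defect records and repair labels keyed by pin.
aoi = pd.DataFrame({"pin_id": [3], "aoi_defect": [1]})
repair = pd.DataFrame({"pin_id": [3, 4],
                       "repair_label": ["repaired", "false_call"]})

# Data cleaning: remove rows containing NaN values.
clean = spi.dropna().reset_index(drop=True)

# Left merge keeps every SPI pin. Pins without an AOI record get NaN,
# which this sketch (by assumption) maps to 0, i.e. non-defective.
pin_labels = clean.merge(aoi, on="pin_id", how="left")
pin_labels["aoi_defect"] = pin_labels["aoi_defect"].fillna(0).astype(int)

# Inner merge keeps only pins that actually carry a repair label.
repair_set = clean.merge(repair, on="pin_id", how="inner")

print(pin_labels)
print(repair_set)
```

The left merge preserves the full population of inspected pins for AOI-label prediction, while the inner merge restricts the repair-label task to the pins that reached an operator.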

4. Modeling Strategies and Evaluation Metrics

MF-PCBA data is amenable to analytical approaches at each aggregation level, with modeling strategies that prioritize interpretability and robust performance:

Primary Algorithm: Extreme Gradient Boosting (XGBoost) is the preferred model, selected for its ability to handle tabular data and deliver strong results with minimal hyperparameter tuning.
Model Hierarchies:
- Pin-level: Predicts defects using solely SPI-extracted features and AOI classifications.
- Component-level: Utilizes aggregated pin features per component.
- PCB-level: Analyses complete board feature arrays, with per-component iterative predictions.
Model Configuration: Shallow tree depths (typically ≤12) are sufficient for strong results, aligning with findings that only a subset of solder paste features are critical for defect classification.
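Evaluation Metrics: The F1 score and the area under the ROC curve (AUC) are the principal metrics. In terms of true positives (TP), false positives (FP), and false negatives (FN), F1 is defined as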

$F_1 = \frac{2TP}{2TP + FP + FN}$

Performance Benchmarks: Pin-level models achieve an F1 of 0.49 for AOI detection, and component-level models reach 0.55. Operator-label prediction achieves an F1 of 0.80, and repair-label identification reaches 0.95 (pin level).
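To make the F1 definition concrete, the following sketch computes it from raw confusion counts and checks that it matches the harmonic mean of precision and recall. The label vectors are invented for illustration and are unrelated to the benchmark figures above.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN) for binary labels where 1 marks a defect."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented example: 4 true defects, 3 predicted defects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
f1 = f1_from_counts(tp, fp, fn)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
harmonic = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, round(f1, 4))  # 2 1 2 0.5714
```

Because true negatives do not appear in the formula, F1 is a sensible choice when defects are rare and non-defective pins dominate the data.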

Model results demonstrate that multilevel aggregation and strategic use of feature sets provide superior detection capabilities, with combined model-level outputs further mitigating false negatives (Prasad-Rao et al., 2023).
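One simple way to combine outputs from different levels is a logical OR, which can only remove false negatives relative to either model alone, at the cost of possibly adding false positives. The source does not specify the combination rule, so the sketch below is an assumption for illustration, with invented predictions.

```python
def combine_or(component_flag, pin_flags):
    """Flag a component if the component-level model or any pin-level
    prediction marks it as defective (assumed combination rule)."""
    return int(bool(component_flag) or any(pin_flags))


def false_negatives(y_true, y_pred):
    """Count defective items (label 1) that were predicted as 0."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


# Invented ground truth and predictions for four components.
truth = [1, 1, 0, 1]
component_pred = [1, 0, 0, 0]
pin_pred = [[0, 0], [1, 0], [0], [0, 0, 0]]  # per-component pin predictions

combined = [combine_or(c, p) for c, p in zip(component_pred, pin_pred)]

print(false_negatives(truth, component_pred))  # 2
print(false_negatives(truth, combined))        # 1
```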

5. Multi-View Data and Inference Frameworks

Integration of multi-view imaging, as demonstrated in contemporary object detection research, presents a prospective extension for MF-PCBA:

Data Acquisition: Multi-view datasets yield nine simultaneous images (center plus eight surrounding views) per PCB location, with disparity maps enabling spatial alignment between views.
Semi-Automatic Labeling: Only 40% of center-view images require manual annotation, with labels propagated to other views via warping functions based on view disparity.
Detection Framework: Multi-view training (e.g., YOLOv5 nano) augments model robustness by leveraging contextual information from all perspectives, providing a 15% mAP improvement for components ranging 0.5–27.0 mm.
Inference: Multi-view inference (MVI) applies the trained model to all views and fuses bounding box outputs using Weighted Boxes Fusion, further increasing mAP@0.5:0.95 by 3.14 points for a given image size.
Technical Formulation: Forward warping and “intermixing” functions ensure accurate label and prediction fusion across viewpoints, retaining spatial reliability and exploiting classifier confidence (Shamsafar et al., 2023).
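The fusion step can be illustrated with a simplified, single-class sketch of the idea behind Weighted Boxes Fusion: overlapping detections are clustered by IoU, their coordinates are averaged with confidence weights, and the fused confidence is scaled down when only some views agree. The function names, the 0.55 IoU threshold, and the assumption that all boxes have already been warped into the center-view frame are choices made for this example, not the cited framework's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_average(members):
    """Confidence-weighted mean box; fused confidence is the mean confidence."""
    total = sum(m[4] for m in members)
    coords = tuple(sum(m[i] * m[4] for m in members) / total for i in range(4))
    return coords + (total / len(members),)


def fuse_boxes(detections, n_views, iou_thr=0.55):
    """Fuse (x1, y1, x2, y2, conf) boxes pooled from all views."""
    clusters, fused = [], []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        best, best_iou = None, iou_thr
        for i, box in enumerate(fused):
            overlap = iou(det, box)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:
            clusters.append([det])
            fused.append(det)
        else:
            clusters[best].append(det)
            fused[best] = weighted_average(clusters[best])
    # Down-weight boxes that were supported by only some of the views.
    return [box[:4] + (box[4] * min(len(c), n_views) / n_views,)
            for box, c in zip(fused, clusters)]


# Two views agree on one component; a second box is seen by one view only.
detections = [
    (10.0, 10.0, 20.0, 20.0, 0.9),
    (12.0, 10.0, 22.0, 20.0, 0.6),
    (50.0, 50.0, 60.0, 60.0, 0.8),
]
result = fuse_boxes(detections, n_views=2)
for box in result:
    print(tuple(round(v, 2) for v in box))
```

Unlike non-maximum suppression, which discards all but the highest-scoring box in each cluster, this style of fusion uses every overlapping prediction to refine the final box position.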

A plausible implication is that MF-PCBA, if extended to multi-view imaging, could realize similar gains in efficiency and accuracy for defect detection and annotation.

6. Applications, Limitations, and Prospective Directions

MF-PCBA is engineered for automated PCB inspection, with industrial applications encompassing:

Quality Control: Early defect identification to maintain manufacturing standards.
Component Placement Verification: Ensures correct placement and soldering.
Workflow Optimization: Data-driven methodologies support operator interventions and repair protocols.
Scalability: The data-centric approach and multilevel structure facilitate application to larger datasets and new manufacturing domains.
Limitation: Current performance metrics and methodological claims are anchored to tabular data and established feature sets; integration of novel imaging paradigms (e.g., multi-view) may require further validation.

A plausible implication is that extending the MF-PCBA methodology with multi-view inference, advanced fusion algorithms, and semi-automatic labeling could further improve detection accuracy and resource efficiency. The data-centric framework and multi-level hierarchical strategies outlined in the referenced research (Shamsafar et al., 2023; Prasad-Rao et al., 2023) suggest further potential for operational effectiveness and interpretable manufacturing analytics.

References

Prasad-Rao et al. (2023). Detecting Manufacturing Defects in PCBs via Data-Centric Machine Learning on Solder Paste Inspection Features.

Shamsafar et al. (2023). Leveraging Multi-view Data for Improved Detection Performance: An Industrial Use Case.
