MiracleFactory Motherboard Dataset Benchmark
- The MiracleFactory Motherboard Dataset is a public benchmark comprising 389 high-resolution images and 2,860 annotated defects across 11 categories for assembly-level fault detection.
- It features a rigorous annotation protocol and deliberate class imbalance to drive the development of deep learning models for both image- and instance-level defect evaluation.
- The dataset underpins ensemble strategies, combining YOLOv7 and Faster R-CNN with Confidence-Temporal Voting to enhance detection robustness under real-world perturbations.
The MiracleFactory Motherboard Dataset is a publicly available benchmark specifically designed for assembly-level defect detection in manufactured motherboards. Distinguished by its comprehensive defect taxonomy and realistic annotations, this dataset enables the systematic evaluation and deployment of deep learning models targeting quality assurance tasks across production-scale electronics manufacturing. The dataset is referenced most notably in "BoardVision: Deployment-ready and Robust Motherboard Defect Detection with YOLO+Faster-RCNN Ensemble" (Hill et al., 16 Oct 2025), which provides formal benchmarking and practical deployment strategies. Its relevance also extends to NLP fault report analysis and comparative studies with synthesized bare board datasets.
1. Dataset Composition and Annotation Protocols
The MiracleFactory Motherboard Dataset comprises 389 high-resolution images containing a total of 2,860 annotated defect instances. It covers 11 defect categories representative of typical problems encountered in motherboard assembly, including missing screws, loose fan wiring, CPU fan port detachment, surface scratches, and other assembly-related anomalies.
Annotation is performed at the bounding-box level, capturing both the spatial extent and categorical nature of each defect instance. The dataset foregrounds technical challenges through deliberate class imbalance: common defects such as "Screws" are heavily represented, whereas critical but rarer defects like "Loose Screws" or "CPU_fan_port_detached" occur sporadically. This design incentivizes the development of models capable of fine-grained discrimination and robust recall across diverse defect types.
| Image Count | Defect Instances | Number of Categories | Notable Imbalance |
|---|---|---|---|
| 389 | 2,860 | 11 | Yes |
The dataset is intended for both image-level and instance-level detection paradigms, facilitating its use in single-image, video-sequence, and multi-instance analysis frameworks.
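As a concrete illustration of the instance-level annotation protocol, the short sketch below tallies bounding-box counts per defect category, which makes the class imbalance described above directly visible. The COCO-style JSON layout and the file name `miraclefactory_annotations.json` are assumptions for illustration; the dataset's actual distribution format may differ.

```python
import json
from collections import Counter

# Hypothetical COCO-style annotation file; the real MiracleFactory release
# may ship annotations in a different layout (e.g. YOLO txt files).
with open("miraclefactory_annotations.json") as f:
    coco = json.load(f)

id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

# Count bounding-box instances per defect category to expose the imbalance
# between frequent classes ("Screws") and rare ones ("Loose Screws").
counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])

total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name:25s} {n:5d}  ({100 * n / total:5.1f}%)")
```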
2. Coverage of Defect Types
Defect coverage in the MiracleFactory dataset centers on assembly-level anomalies identifiable on functional motherboards. The included defect classes span both visually apparent issues and subtle mechanical/electrical risks:
- Screws (present, missing, or loose)
- CPU fan port detachment
- Surface scratches
- Loose fan wiring
- Additional categories as annotated per image instance
Fine-grained distinctions, such as visually similar but functionally distinct screw states, are essential for verifying compliance with manufacturing specifications. The skewed defect frequencies mirror real production-line distributions, making the dataset a natural benchmark for model robustness to class imbalance.
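One common mitigation, sketched below, is to reweight the detector's classification loss by inverse class frequency. The per-class counts in the snippet are placeholders rather than the dataset's published figures, and sklearn-style balanced weighting with PyTorch's `CrossEntropyLoss` is only one of several possible strategies, not a procedure prescribed by the dataset authors.

```python
import torch
import torch.nn as nn

# Placeholder instance counts for the 11-category taxonomy; the real
# MiracleFactory counts should be tallied from the annotations (see Section 1).
class_counts = torch.tensor(
    [900.0, 650.0, 420.0, 300.0, 210.0, 140.0, 95.0, 60.0, 40.0, 28.0, 17.0]
)

# Balanced inverse-frequency weights (sklearn-style: n_samples / (n_classes * count)).
weights = class_counts.sum() / (len(class_counts) * class_counts)

# Weighted classification loss for the detector's classification head; rare
# classes such as "Loose Screws" receive proportionally larger gradients.
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 11)            # dummy per-ROI class logits
targets = torch.randint(0, 11, (8,))   # dummy class labels
print(f"weighted loss: {criterion(logits, targets).item():.4f}")
```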
3. Benchmarking Protocols and Evaluated Detection Algorithms
The dataset is employed in benchmarking contemporary object detection models, notably YOLOv7 and Faster R-CNN, as described in (Hill et al., 16 Oct 2025). Both models are trained under controlled conditions:
- Input image size: 640×640 pixels
- Identical learning rates and batch protocols for fairness
- Label set fixed to the MiracleFactory 11-category taxonomy
YOLOv7, a one-stage detector, demonstrates superior precision but lower recall, particularly for rare defect classes. Faster R-CNN, featuring a ResNet-50 FPN backbone, achieves better recall under class imbalance but at slower inference rates. Both models are benchmarked on standard metrics, including [email protected], F1, Precision, and Recall.
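The sketch below illustrates one way such metrics can be computed at a fixed IoU of 0.5: predictions are greedily matched to ground-truth boxes of the same class, and precision, recall, and F1 are derived from the match counts. It is a simplified stand-in for the paper's evaluation scripts, and the toy boxes at the end are invented for demonstration.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def prf1(preds, gts, iou_thr=0.5):
    """Greedy precision/recall/F1 at a fixed IoU threshold.

    preds: list of (box, label, score); gts: list of (box, label).
    """
    preds = sorted(preds, key=lambda p: -p[2])   # match high-confidence first
    matched, tp = set(), 0
    for box, label, _ in preds:
        best, best_iou = None, iou_thr
        for i, (gbox, glabel) in enumerate(gts):
            if i in matched or glabel != label:
                continue
            v = iou(box, gbox)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1


# Toy frame: one correct "missing screw" detection plus one false positive.
preds = [([10, 10, 50, 50], "missing_screw", 0.9),
         ([200, 200, 240, 240], "scratch", 0.6)]
gts = [([12, 11, 49, 52], "missing_screw")]
print(prf1(preds, gts))   # roughly (0.5, 1.0, 0.67) on this toy frame
```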
4. Ensemble Framework: Confidence-Temporal Voting (CTV Voter)
To reconcile discordant strengths of the YOLOv7 and Faster R-CNN detectors, the BoardVision framework introduces the Confidence-Temporal Voting (CTV) ensemble. Detection candidates are paired across models using an IoU threshold (typically 0.3–0.5), and agreement cases are merged by weighted voting on both confidence and historical per-class F1 metrics. The fusion scores are calculated as

$$w_Y = F1_c^{Y}\,(c_Y)^{\alpha}, \qquad w_F = F1_c^{F}\,(c_F)^{\alpha},$$
$$s_{\text{fused}} = \frac{w_Y + w_F}{F1_c^{Y} + F1_c^{F}}, \qquad b_{\text{fused}} = \frac{w_Y\, b_Y + w_F\, b_F}{w_Y + w_F},$$

where $c_Y$, $c_F$ are instance-level confidences, $F1_c^{Y}$, $F1_c^{F}$ are validation F1 scores for class $c$, $\alpha$ is a confidence weighting exponent, and $b_Y$, $b_F$ are the bounding boxes from YOLOv7 and Faster R-CNN, respectively.
Solo detections not matched across models are filtered according to interpretable heuristic rules based on confidence thresholds, per-class F1 advantages, and near-tie fallback logic.
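A minimal sketch of this voting logic is given below, assuming the weighting form shown above ($w = F1_c \cdot c^{\alpha}$) and reducing the solo-detection heuristics to a single confidence floor. The per-class F1 values, thresholds, and detection format are illustrative assumptions, not BoardVision's actual configuration, and the temporal component of CTV (smoothing across video frames) is omitted.

```python
ALPHA = 2.0        # confidence weighting exponent (alpha above), assumed value
IOU_MATCH = 0.4    # pairing threshold, within the 0.3-0.5 range cited
SOLO_KEEP = 0.6    # illustrative confidence floor for unmatched detections

# Hypothetical per-class validation F1 for each detector; real values would
# come from the BoardVision validation split.
F1_YOLO = {"missing_screw": 0.82, "scratch": 0.74}
F1_FRCNN = {"missing_screw": 0.71, "scratch": 0.80}


def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / ((a[2] - a[0]) * (a[3] - a[1])
                    + (b[2] - b[0]) * (b[3] - b[1]) - inter + 1e-9)


def fuse(det_y, det_f):
    """Merge one matched YOLOv7/Faster R-CNN pair (box, cls, conf) by weighted voting."""
    box_y, cls, c_y = det_y
    box_f, _, c_f = det_f
    w_y = F1_YOLO[cls] * c_y ** ALPHA
    w_f = F1_FRCNN[cls] * c_f ** ALPHA
    box = [(w_y * a + w_f * b) / (w_y + w_f) for a, b in zip(box_y, box_f)]
    score = (w_y + w_f) / (F1_YOLO[cls] + F1_FRCNN[cls])
    return box, cls, score


def ctv_vote(yolo_dets, frcnn_dets):
    """Single-frame CTV fusion (temporal smoothing across frames omitted)."""
    fused, used_f = [], set()
    for dy in yolo_dets:
        match = next((j for j, df in enumerate(frcnn_dets)
                      if j not in used_f and df[1] == dy[1]
                      and iou(dy[0], df[0]) >= IOU_MATCH), None)
        if match is not None:
            used_f.add(match)
            fused.append(fuse(dy, frcnn_dets[match]))
        elif dy[2] >= SOLO_KEEP:                     # solo-detection rule
            fused.append(dy)
    fused += [df for j, df in enumerate(frcnn_dets)
              if j not in used_f and df[2] >= SOLO_KEEP]
    return fused


# Example: both detectors agree on one missing screw.
yolo = [([10, 10, 50, 50], "missing_screw", 0.88)]
frcnn = [([12, 11, 49, 52], "missing_screw", 0.74)]
print(ctv_vote(yolo, frcnn))
```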
5. Robustness Evaluation under Realistic Perturbations
The MiracleFactory dataset supports comprehensive robustness testing by introducing perturbations that replicate factory conditions:
- Horizontal flips representing viewpoint changes
- Brightness variations (overexposure/underexposure)
- Sharpness adjustments (Gaussian unsharp masking)
The ensemble approach stabilizes detection F1 scores under such perturbations, compensating for the divergent sensitivities of YOLOv7 and Faster R-CNN. Sensitivity analyses of the CTV parameters (the IoU matching threshold, the confidence exponent $\alpha$, and the solo-detection thresholds) and ablation studies confirm that careful tuning is necessary for optimal defect detection.
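The perturbation families themselves are straightforward to reproduce with Pillow, as in the sketch below; the brightness factors and unsharp-mask parameters are illustrative choices rather than the exact settings used in the robustness study, and the file names are hypothetical.

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps


def perturbations(img: Image.Image):
    """Yield (name, perturbed image) pairs mimicking factory-floor variation."""
    yield "hflip", ImageOps.mirror(img)                               # viewpoint change
    yield "overexposed", ImageEnhance.Brightness(img).enhance(1.5)    # brightness up
    yield "underexposed", ImageEnhance.Brightness(img).enhance(0.6)   # brightness down
    yield "unsharp", img.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))


# Usage: run YOLOv7, Faster R-CNN, and the CTV ensemble on every variant and
# compare per-class F1 against the clean image to quantify robustness.
img = Image.open("board_0001.jpg").convert("RGB")   # hypothetical sample image
for name, variant in perturbations(img):
    variant.save(f"board_0001_{name}.jpg")
```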
6. Integration into Practical QA Workflows
Deployment of detection pipelines built on the MiracleFactory dataset is facilitated by a GUI-driven inspection tool implemented in PySide6/Qt, as described in (Hill et al., 16 Oct 2025). Core features:
- Synchronous visualization of outputs from YOLOv7, Faster R-CNN, and the CTV ensemble
- Interactive parameter adjustment (confidence thresholds, IoU tuning)
- Real-time overlays for operator inspection and intervention
- Support for both video stream and offline image analysis
The transparency and controllability of the tool establish a practical feedback loop between automated detection and human QA oversight, enabling efficient validation in factory settings.
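A minimal PySide6 sketch of this interaction pattern is shown below: a confidence-threshold slider wired to a stubbed re-detection callback and a label standing in for the annotated frame. Widget names and layout are illustrative; the actual BoardVision tool is considerably more elaborate.

```python
import sys

from PySide6.QtCore import Qt
from PySide6.QtWidgets import (QApplication, QLabel, QMainWindow, QSlider,
                               QVBoxLayout, QWidget)


class InspectionWindow(QMainWindow):
    """Minimal stand-in for a BoardVision-style inspection panel."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Motherboard defect inspection (sketch)")

        self.view = QLabel("Annotated frame would be rendered here")
        self.view.setAlignment(Qt.AlignCenter)

        # Interactive confidence threshold, mapped from 0-100 to 0.0-1.0.
        self.slider = QSlider(Qt.Horizontal)
        self.slider.setRange(0, 100)
        self.slider.setValue(50)
        self.slider.valueChanged.connect(self.on_threshold_changed)

        layout = QVBoxLayout()
        layout.addWidget(self.view)
        layout.addWidget(self.slider)
        container = QWidget()
        container.setLayout(layout)
        self.setCentralWidget(container)

    def on_threshold_changed(self, value: int):
        # In a full tool this would re-filter YOLOv7 / Faster R-CNN / CTV
        # outputs and redraw the bounding-box overlays on the current frame.
        self.view.setText(f"Re-drawing overlays at confidence >= {value / 100:.2f}")


if __name__ == "__main__":
    app = QApplication(sys.argv)
    win = InspectionWindow()
    win.show()
    sys.exit(app.exec())
```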
7. Comparative Context and Domain-Specific Relevance
The MiracleFactory Motherboard Dataset is distinct from synthesized bare-PCB benchmarks (cf. (Huang et al., 2019)), which target defects on unpopulated boards via reference comparison and DenseNet-style classification. The MiracleFactory dataset instead emphasizes assembly-level anomalies in PCBA, a domain characterized by the complexity of component placement, soldering, and cable routing, with both image-level and category-level imbalances that mirror authentic production-line risk profiles.
Additionally, in the context of NLP-based fault report classification (Silva et al., 20 Mar 2025), the MiracleFactory dataset provides a focused subset for training and validating models on motherboard-related user reports. With 47 curated samples in that pipeline, it supports transformer-based NLP models in reaching up to 79% accuracy and F1 score for fault component detection under few-shot learning conditions.
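For orientation, the snippet below shows a bare-bones few-shot fine-tuning loop for such a fault-report classifier using Hugging Face Transformers; the backbone name, example reports, and label scheme are assumptions for illustration and do not reproduce the Silva et al. pipeline or the 47-sample subset.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Invented fault-report samples; the curated MiracleFactory subset is not reproduced here.
texts = ["PC will not POST, CPU fan spins briefly then stops",
         "Deep scratch near the PCIe slot after unboxing"]
labels = torch.tensor([0, 1])           # e.g. 0 = fan/port fault, 1 = cosmetic damage

model_name = "distilbert-base-uncased"  # assumed backbone, not necessarily the paper's
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                      # a few epochs over the tiny few-shot set
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {out.loss.item():.4f}")
```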
In summary, the MiracleFactory Motherboard Dataset is established as a foundational resource for benchmarking assembly-level defect detection methodologies, robustness evaluation, and practical deployment in electronics manufacturing QA. Its realistic annotation protocols, class imbalance, and modular support for ensemble detection architectures make it a salient testbed for both computer vision and NLP-based fault analysis in industrial contexts.