- The paper proposes two architectures, ACFamNet and ACFamNet Pro, that leverage few-shot regression and feature correlation to count small and clustered bacterial colonies.
- The paper demonstrates that employing RoI Align, multi-head attention, and residual connections significantly improves counting accuracy, reducing MNAE to around 9-12%.
- The paper highlights a trade-off between strong intra-domain performance and limited cross-category generalization, suggesting future work in domain adaptation.
Counting Small and Clustered Objects with Application to Bacterial Colonies: An Analysis of ACFamNet and ACFamNet Pro
Introduction
Accurate enumeration of small and densely clustered objects presents persistent challenges in automated computer vision, particularly when data are limited and cross-class generalization is required, as exemplified in bacterial colony counting. The paper "Learning to count small and clustered objects with application to bacterial colonies" (2604.20030) proposes two architectures—ACFamNet and ACFamNet Pro—as targeted extensions to the FamNet paradigm, prioritizing robust feature engineering and meta-learned regression for few-shot object counting scenarios.
Problem Context and Limitations of Existing Methods
Bacterial colony enumeration from plate images underpins quantitative microbiological assays but is complicated by object scale, variable density, frequent clusters, annotation bottlenecks, and heterogeneity across species (Figure 1). Traditional image processing (e.g., thresholding, Hough transform), classical machine learning (SVM, k-means), and general CNN-based detectors (YOLO, Mask R-CNN, U-Net) face tunability, annotation, or generalization limitations. Recent advances in density map estimation and few-shot methods (FamNet, SAFECount) address some but not all constraints, especially for small-object scenarios or cross-category adaptation.
Figure 1: Four example plate images with colonies of different species, sizes, and colours.
The Synoptics Dataset used in the study shows significant inter-plate and inter-colony variability in count, morphology, and color (Figure 2), making it a representative real-world benchmark.
Figure 2: Distribution of colony counts in the training and test sets.
Methods
Core Network Architectures
ACFamNet
ACFamNet reformulates colony counting as a few-shot regression task, substantially modifying FamNet to enhance small-object sensitivity and computational efficiency. The model leverages:
- Replaceable, lightweight convolutional feature extraction (a single layer of 7×7 kernels) in place of deep pre-trained backbones, to minimize feature misalignment and computational overhead
- RoI Align for region-of-interest pooling, significantly reducing quantization artifacts and information loss for small-colony exemplars
- Feature correlation between exemplar-derived and query features, outputting similarity maps concatenated across multiple support scales (Figure 3, Figure 4)
- Fully differentiable, end-to-end trainable pipeline (Figure 3, Figure 5)
Figure 3: Overall structure of ACFamNet. The feature correlation and regression modules are detailed in Figure 4 and Figure 5.
Figure 4: Illustration of the ACFamNet feature correlation module.
Figure 5: Illustration of the ACFamNet regression module.
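To make the pooling and correlation steps concrete, the following is a minimal single-channel sketch in plain Python, not the paper's implementation. It assumes the value `fmap[i][j]` sits at continuous coordinate `(i, j)`, takes one bilinear sample per output bin, and uses a plain sliding dot product as the correlation; the function names `bilinear`, `roi_align`, and `correlate` are illustrative only.

```python
import math


def bilinear(fmap, y, x):
    """Sample a 2-D feature map at a fractional (y, x) location."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap, box, out_size=3):
    """Pool box = (y_min, x_min, y_max, x_max) into an out_size x out_size grid.

    The box edges are never rounded: each output cell is a bilinear sample
    taken at the centre of its bin.
    """
    y_min, x_min, y_max, x_max = box
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [[bilinear(fmap, y_min + (i + 0.5) * bin_h, x_min + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


def correlate(fmap, kernel):
    """Slide the pooled exemplar over the query map ('valid' cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


# Toy query map: one 3x3 "colony" with a bright centre at (3, 2).
query = [[0.0] * 6 for _ in range(6)]
for i in range(2, 5):
    for j in range(1, 4):
        query[i][j] = 0.5
query[3][2] = 1.0

# Pool the exemplar from a box around the colony, then correlate it with the query.
exemplar = roi_align(query, box=(1.5, 0.5, 4.5, 3.5), out_size=3)
similarity = correlate(query, exemplar)
```

In this toy case the similarity map peaks where the window coincides with the colony. Rounding the box to integer cells, as quantized RoI pooling does, can shift each sample by up to half a cell, which matters when an exemplar spans only a few cells.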
ACFamNet Pro
ACFamNet Pro augments ACFamNet with key enhancements for simultaneous small-object sensitivity, cluster robustness, and cross-domain potential:
- Integration of a multi-head attention–inspired residual feature enhancement module (Figure 6, Figure 7), modeled after the SAFECount paradigm, allowing for dynamic focusing and improved feature fusion
- Inclusion of residual connections to facilitate gradient flow and stable training (Figure 6, Figure 7)
- All modules are end-to-end learnable, unlike SAFECount's reliance on frozen backbones, enabling better adaptation to target-domain specifics
Figure 6: Overall structure of ACFamNet Pro. Details of the residual feature enhancement module and the regression module are provided in Figure 7 and Figure 8.
Figure 7: Residual feature enhancement module.
Figure 8: Regression module.
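The idea of attention-based enhancement with a residual path can be sketched generically as below. This is an illustrative simplification rather than the module in Figure 7: it omits all learned projections, treats features as plain lists of floats, and the names `attend` and `multi_head_residual` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, supports):
    """Scaled dot-product attention: each query gathers a weighted mix of supports."""
    scale = math.sqrt(len(supports[0]))
    mixed = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, s)) / scale for s in supports]
        weights = softmax(scores)
        mixed.append([sum(w * s[d] for w, s in zip(weights, supports))
                      for d in range(len(q))])
    return mixed


def multi_head_residual(queries, supports, num_heads=2):
    """Split channels into heads, attend per head, and add the result to the input."""
    dim = len(queries[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    step = dim // num_heads
    enhanced = [list(q) for q in queries]  # residual path starts from the input
    for head in range(num_heads):
        lo, hi = head * step, (head + 1) * step
        update = attend([q[lo:hi] for q in queries], [s[lo:hi] for s in supports])
        for row, delta in zip(enhanced, update):
            for d, value in enumerate(delta):
                row[lo + d] += value
    return enhanced


# Two query-location features enhanced by two exemplar (support) features.
query_feats = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
support_feats = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 1.0]]
enhanced_feats = multi_head_residual(query_feats, support_feats, num_heads=2)
```

Because the attention output is added to the input instead of replacing it, the original query features always pass through unchanged, which is the usual argument for why residual connections ease gradient flow.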
Training and Evaluation Protocol
Both models are trained and evaluated with the same protocol: 5-fold cross-validation to select architectural hyperparameters, followed by retraining and testing on an 80/20 hold-out split of the Synoptics Dataset. Mean Normalized Absolute Error (MNAE) is the primary metric; because each plate's absolute error is normalized by its ground-truth count, plates with very large counts do not dominate the average.
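As a rough illustration of the metric and the fold structure, the sketch below assumes MNAE is the mean over plates of |predicted − true| / true, the usual definition, and that every plate has at least one colony. The helper `k_fold_indices` is a hypothetical round-robin splitter, not the paper's splitting code.

```python
def mnae(predicted, ground_truth):
    """Mean Normalized Absolute Error: average of |p - g| / g over plates."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    if any(g <= 0 for g in ground_truth):
        raise ValueError("ground-truth counts must be positive")
    return sum(abs(p - g) / g for p, g in zip(predicted, ground_truth)) / len(predicted)


def mae(predicted, ground_truth):
    """Plain mean absolute error, for comparison."""
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def k_fold_indices(n_items, k=5):
    """Yield (train, validation) index lists; fold f holds out every k-th item."""
    for fold in range(k):
        validation = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, validation


# Two plates, both miscounted by 5 colonies: the sparse plate is penalized far more.
truth = [10, 500]
preds = [15, 505]
plain_error = mae(preds, truth)        # identical absolute error on both plates
normalized_error = mnae(preds, truth)  # averages a 50% error and a 1% error

folds = list(k_fold_indices(10, k=5))
```

The two-plate example shows why normalization matters: both plates contribute the same absolute error, yet being off by 5 on a 10-colony plate is a far worse result than being off by 5 on a 500-colony plate.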
Experimental Results
ACFamNet improves substantially on FamNet and traditional rule-based approaches. When properly tuned (256 kernels, 3×3 RoI Align, single scale), ACFamNet achieves 11.85% MNAE on validation, outperforming FamNet by 10.48 percentage points. On the test set (12.52% MNAE), it beats OpenCFU (46.57%) and AutoCellSeg (68.73%) by roughly 34 and 56 percentage points, respectively (Table 1, Figure 9).
Figure 9: Counting results for an image with 83 colonies. ACFamNet detects 89.58 colonies.
ACFamNet Pro, incorporating multi-head attention and residual connections, yields further reduction to 9.62% MNAE on cross-validation and 11.25% on the hold-out test set (Table 2), outperforming vanilla SAFECount (13.73%), ACFamNet (12.52%), and traditional methods. The effect of architectural ablations (Table 3) demonstrates that RoI Align, learnable backbones, and residual similarity are critical to maximizing accuracy.
| Model | Validation MNAE (%) | Test MNAE (%) |
| --- | --- | --- |
| ACFamNet | 11.85 | 12.52 |
| ACFamNet Pro | 9.62 | 11.25 |
| SAFECount | 9.86 | 13.73 |
| OpenCFU | — | 46.57 |
| AutoCellSeg | — | 68.73 |
Table 1: MNAE performance comparison on Synoptics validation and test sets.
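The percentage-point gaps quoted in the text can be checked directly from the test-set column of Table 1. The short script below is only a worked-arithmetic aid; the dictionary simply restates the table values.

```python
# Test-set MNAE (%) as listed in Table 1.
test_mnae = {
    "ACFamNet": 12.52,
    "ACFamNet Pro": 11.25,
    "SAFECount": 13.73,
    "OpenCFU": 46.57,
    "AutoCellSeg": 68.73,
}


def gap_points(baseline, model):
    """Absolute gap in percentage points (baseline minus model)."""
    return round(test_mnae[baseline] - test_mnae[model], 2)


def relative_reduction(baseline, model):
    """Gap expressed as a fraction of the baseline's error."""
    return (test_mnae[baseline] - test_mnae[model]) / test_mnae[baseline]


gaps_vs_acfamnet = {name: gap_points(name, "ACFamNet")
                    for name in ("OpenCFU", "AutoCellSeg")}
gaps_vs_pro = {name: gap_points(name, "ACFamNet Pro")
               for name in ("ACFamNet", "SAFECount")}
pro_relative_gain = relative_reduction("ACFamNet", "ACFamNet Pro")
```

Note the distinction between the two measures: ACFamNet Pro's lead over ACFamNet on the test set is 1.27 percentage points, which corresponds to a relative error reduction of about 10%.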
Visual Analysis
Predicted density maps correspond well with ground truth even for highly clustered and diverse morphologies (Figure 10, Figure 11, Figure 12). Performance on individual challenging test images underlines the robustness of the proposed approach to the composite challenges of clustering and scale.
Figure 10: ACFamNet Pro's prediction on an unseen image from the validation set.
Figure 11: ACFamNet Pro's prediction on another unseen image from the validation set.
Figure 12: Illustration of ACFamNet Pro's prediction. The predicted count and ground truth count are 89.5 and 83, respectively.
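Fractional counts such as 89.5 arise naturally if the count is read off a density map by summing it, as is standard in density-based counters like FamNet; that ACFamNet Pro does exactly this is assumed here. The sketch below builds a toy ground-truth density map from point annotations; the function names and the Gaussian width `sigma` are illustrative choices.

```python
import math


def gaussian_density(shape, points, sigma=1.5):
    """Place one unit-mass Gaussian blob per annotated point (row, col)."""
    height, width = shape
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        blob = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                 for x in range(width)]
                for y in range(height)]
        mass = sum(map(sum, blob))
        for y in range(height):
            for x in range(width):
                # Normalize so each colony contributes exactly 1 to the total.
                density[y][x] += blob[y][x] / mass
    return density


def count_from_density(density):
    """The count is the integral (sum) of the density map."""
    return sum(map(sum, density))


# Three colonies, two of them touching: overlapping blobs still sum to 3.
colonies = [(3, 3), (3, 4), (8, 9)]
density_map = gaussian_density((12, 12), colonies)
estimated_count = count_from_density(density_map)
```

Since overlapping blobs add up instead of merging, the total mass stays correct for touching colonies, which is one reason density estimation suits clustered objects better than detect-then-count pipelines.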
Cross-Category Generalisation
Despite strong intra-domain performance, both ACFamNet and ACFamNet Pro generalize poorly to unseen colony types: MNAE rises to 148.26% (ACFamNet) and 49.19% (ACFamNet Pro) on highly divergent plates (Figures 13–16). SAFECount with a frozen backbone is more robust across domains (MNAE 35.83%), consistent with the classic bias–variance tradeoff for learnable representations in small-data, high-divergence regimes.
Figure 13: Four plate images with colonies that are completely different from the Synoptics Dataset. Left to right: Plate Image A, B, C, and D.
Figure 14: Illustration of ACFamNet Pro's prediction on Plate Image A. The predicted count and ground truth count are 211.02 and 228, respectively.
Figure 15: Illustration of ACFamNet Pro's prediction on Plate Image B. The predicted count and ground truth count are 285.85 and 124, respectively.
Figure 16: Illustration of ACFamNet Pro's prediction on Plate Image C. The predicted count and ground truth count are 257.14 and 529, respectively.
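The per-plate errors behind Figures 14 to 16 can be worked out from the captions alone. The snippet below assumes the normalized error is |predicted − true| / true; Plate D's counts are not given here, so the reported 49.19% average cannot be reproduced from these three plates.

```python
# (predicted, ground truth) for ACFamNet Pro, taken from the figure captions.
plates = {
    "A": (211.02, 228),
    "B": (285.85, 124),
    "C": (257.14, 529),
}


def normalized_error(predicted, truth):
    """Absolute count error as a fraction of the true count."""
    return abs(predicted - truth) / truth


errors = {name: normalized_error(p, t) for name, (p, t) in plates.items()}

# Sign of the miss: positive means overcounting, negative means undercounting.
bias = {name: p - t for name, (p, t) in plates.items()}

for name in plates:
    print(f"Plate {name}: error {errors[name]:.2%}, signed miss {bias[name]:+.2f}")
```

The three plates behave very differently: Plate A is close to the in-domain error level, Plate B is overcounted by more than its true count, and Plate C is undercounted by roughly half.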
Discussion
The empirical advancements of ACFamNet Pro over predecessors are attributable to architectural decisions that explicitly model small-scale, high-density, and limited-data regimes. RoI Align mitigates spatial quantization for small RoIs; the multi-head residual attention module enhances instance-level matching by contextually weighing support features. The fully differentiable design allows for data-adaptive feature learning, yet the observed decrease in inter-category generalisation suggests a degree of overfitting to intra-domain statistics, particularly when the training dataset structure does not match classic few-shot assumptions (e.g., class overlap between train and test).
The implications for practitioners are twofold:
- ACFamNet Pro sets a strong reference point for small/clustered, low-data object counting in constrained scientific settings (e.g., microbiology), while keeping annotation and hardware demands low.
- For deployment in scenarios with highly heterogeneous object types or severe domain shift, conservative backbone-freezing or domain adaptation modules may be necessary to ensure cross-domain robustness.
Theoretically, these results highlight the importance of architecture-algorithm co-design in few-shot object counting—especially the alignment between pooling, correlation, and regression mechanisms. Further integration with more expressive, scale-adaptive backbones or explicit domain generalization modules may enhance cross-category transfer.
Conclusion
The proposed ACFamNet and ACFamNet Pro models constitute a substantial technical contribution to the long-standing problem of few-shot, clustered, small-object counting. The combination of RoI Align, end-to-end learnability, multi-head attention, and residual feature enhancement enables state-of-the-art performance on representative real-world datasets—outperforming both traditional and prior few-shot methods under several metrics. However, the cross-species generalization gap remains, pointing to essential future developments in cross-domain and open-world object counting. Potential directions include architectural ensembling (frozen and learnable), domain adaptation, and extension to higher-complexity or multimodal imaging regimes.