Learning to count small and clustered objects with application to bacterial colonies

Published 21 Apr 2026 in cs.CV | (2604.20030v1)

Abstract: Automated bacterial colony counting from images is an important technique to obtain data required for the development of vaccines and antibiotics. However, bacterial colonies present unique machine vision challenges that affect counting, including (1) small physical size, (2) object clustering, (3) high data annotation cost, and (4) limited cross-species generalisation. While FamNet is an established object counting technique effective for clustered objects and costly data annotation, its effectiveness for small colony sizes and cross-species generalisation remains unknown. To address the first three challenges, we propose ACFamNet, an extension of FamNet that handles small and clustered objects using a novel region of interest pooling with alignment and optimised feature engineering. To address all four challenges above, we introduce ACFamNet Pro, which augments ACFamNet with multi-head attention and residual connections, enabling dynamic weighting of objects and improved gradient flow. Experiments show that ACFamNet Pro achieves a mean normalised absolute error (MNAE) of 9.64% under 5-fold cross-validation, outperforming ACFamNet and FamNet by 2.23% and 12.71%, respectively.


Authors (5)

Summary

The paper proposes two architectures, ACFamNet and ACFamNet Pro, that leverage few-shot regression and feature correlation to count small and clustered bacterial colonies.
The paper shows that RoI Align, multi-head attention, and residual connections substantially improve counting accuracy, bringing MNAE down to roughly 9.6–12.5% on the validation and test splits.
The paper highlights a trade-off between strong intra-domain performance and limited cross-category generalization, suggesting future work in domain adaptation.

Counting Small and Clustered Objects with Application to Bacterial Colonies: An Analysis of ACFamNet and ACFamNet Pro

Introduction

Accurate enumeration of small and densely clustered objects presents persistent challenges in automated computer vision, particularly when data are limited and cross-class generalization is required, as exemplified in bacterial colony counting. The paper "Learning to count small and clustered objects with application to bacterial colonies" (2604.20030) proposes two architectures—ACFamNet and ACFamNet Pro—as targeted extensions to the FamNet paradigm, prioritizing robust feature engineering and meta-learned regression for few-shot object counting scenarios.

Problem Context and Limitations of Existing Methods

Bacterial colony enumeration from plate images underpins quantitative microbiological assays but is complicated by small object scale, variable density, frequent clustering, annotation bottlenecks, and heterogeneity across species (Figure 1). Traditional image processing (e.g., thresholding, Hough transform), classical machine learning (SVM, k-means), and general CNN-based detectors (YOLO, Mask R-CNN, U-Net) each face limitations in tunability, annotation cost, or generalization. Recent density-map and few-shot methods (FamNet, SAFECount) address some of these constraints but not all, particularly for small objects and cross-category adaptation.

Figure 1: Four example plate images with colonies of different species, sizes, and colours.

The Synoptics Dataset used in this work shows substantial inter-plate and inter-colony variability in count, morphology, and color (Figure 2), making it a representative real-world benchmark.

Figure 2: Distribution of colony counts in the training and test sets.

Methods

Core Network Architectures

ACFamNet

ACFamNet reformulates colony counting as a few-shot regression task, substantially modifying FamNet to enhance small-object sensitivity and computational efficiency. The model leverages:

A replaceable, shallow feature extractor (a single convolutional layer with 7×7 kernels) in place of a deep pre-trained backbone, reducing feature misalignment and computational overhead
RoI Align for region-of-interest pooling, significantly reducing quantization artifacts and information loss for small-colony exemplars
Feature correlation between exemplar-derived and query features, outputting similarity maps concatenated across multiple support scales (Figure 3, Figure 4)
Fully differentiable, end-to-end trainable pipeline (Figure 3, Figure 5)
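
The difference between RoI Align and quantised RoI pooling can be sketched in a few lines. The NumPy example below is a simplified illustration, not the authors' code: it assumes a (C, H, W) feature map, a box given in feature-map coordinates, one bilinear sample per output bin (library implementations such as torchvision's `roi_align` can take several samples per bin), and the 3×3 output grid reported for the best ACFamNet configuration. The helper names `bilinear` and `roi_align` are hypothetical.

```python
import numpy as np


def bilinear(feat: np.ndarray, y: float, x: float) -> np.ndarray:
    """Bilinearly interpolate a (C, H, W) feature map at fractional (y, x)."""
    _, h, w = feat.shape
    # Clamp the sample point to the valid coordinate range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four surrounding cells.
    return ((1 - dy) * (1 - dx) * feat[:, y0, x0]
            + (1 - dy) * dx * feat[:, y0, x1]
            + dy * (1 - dx) * feat[:, y1, x0]
            + dy * dx * feat[:, y1, x1])


def roi_align(feat: np.ndarray, box, out_size=(3, 3)) -> np.ndarray:
    """Pool box (x1, y1, x2, y2), in feature-map coordinates, to a fixed grid.

    Box edges are never rounded; each output bin takes one bilinear sample
    at its exact (possibly fractional) centre.
    """
    x1, y1, x2, y2 = box
    oh, ow = out_size
    bin_h, bin_w = (y2 - y1) / oh, (x2 - x1) / ow
    out = np.empty((feat.shape[0], oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[:, i, j] = bilinear(feat, y1 + (i + 0.5) * bin_h,
                                    x1 + (j + 0.5) * bin_w)
    return out


# Usage: each feature value equals its x-coordinate, so the output shows
# exactly where the samples were taken.
feat = np.tile(np.arange(8.0), (8, 1))[None]  # shape (1, 8, 8)
pooled = roi_align(feat, (1.25, 1.25, 2.75, 2.75))
# pooled[0, 0] holds samples at x = 1.5, 2.0, 2.5: fractional, not rounded.
```

Because the box edges (1.25 and 2.75) are never snapped to integer cells, a small exemplar keeps its sub-cell position; a quantised RoI Pool would first round the same box to whole cells and lose that offset, which matters most when the exemplar spans only a few cells.
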
Figure 3: Overall structure of ACFamNet. The feature correlation and regression modules are detailed in Figure 4 and Figure 5.

Figure 4: Illustration of the ACFamNet feature correlation module.

Figure 5: Illustration of ACFamNet regression module.
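
The feature correlation step can be illustrated with a generic FamNet-style cross-correlation, in which exemplar features act as a convolution kernel slid over the query features. This is a hedged sketch under stated assumptions: it uses a single scale (matching the best single-scale ACFamNet setting reported in the results), a plain un-normalised dot product as the similarity, and hypothetical function names; ACFamNet's exact correlation module may differ.

```python
import numpy as np


def correlate(query: np.ndarray, exemplar: np.ndarray) -> np.ndarray:
    """Slide an exemplar feature patch (C, h, w) over query features (C, H, W).

    Returns an (H, W) similarity map: the channel-summed dot product between
    the exemplar and the query window centred at each location. Zero padding
    keeps the output the same size as the query, like a stride-1 convolution.
    """
    _, H, W = query.shape
    _, h, w = exemplar.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(query, ((0, 0), (ph, h - 1 - ph), (pw, w - 1 - pw)))
    sim = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            sim[y, x] = np.sum(padded[:, y:y + h, x:x + w] * exemplar)
    return sim


def similarity_maps(query: np.ndarray, exemplars) -> np.ndarray:
    """Stack one similarity map per exemplar -> (num_exemplars, H, W)."""
    return np.stack([correlate(query, e) for e in exemplars])


# Usage: a single 3x3 "colony" centred at (3, 3) in a 7x7 query map.
query = np.zeros((1, 7, 7))
query[0, 2:5, 2:5] = 1.0
exemplar = np.ones((1, 3, 3))
maps = similarity_maps(query, [exemplar, 0.5 * exemplar])
peak = np.unravel_index(maps[0].argmax(), maps[0].shape)  # strongest match
```

The resulting stack of similarity maps is what a regression head would then turn into a density map; the peak sits where the query locally looks most like the exemplar.
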

ACFamNet Pro

ACFamNet Pro augments ACFamNet with key enhancements for simultaneous small-object sensitivity, cluster robustness, and cross-domain potential:

Integration of a multi-head attention–inspired residual feature enhancement module (Figure 6, Figure 7), modeled after the SAFECount paradigm, allowing for dynamic focusing and improved feature fusion
Inclusion of residual connections to facilitate gradient flow and stable training (Figure 6, Figure 7)
All modules are end-to-end learnable, unlike SAFECount's reliance on frozen backbones, enabling better adaptation to target-domain specifics
Figure 6: Overall structure of ACFamNet Pro. Details of the residual feature enhancement module and the regression module are provided in Figure 7 and Figure 8.

Figure 7: Residual feature enhancement module.

Figure 8: Regression module.
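
To make attention-based feature enhancement with a residual connection concrete, the sketch below implements generic multi-head cross-attention in NumPy: every query-image location attends over exemplar tokens, and the attended features are added back to the original features. It is an illustration under assumptions, not ACFamNet Pro's module; the exact design of that module, and of the SAFECount module it is modelled on, may differ. The projection matrices, head count, and function names are hypothetical, and in a trained model the projections would be learned.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mha_residual(query_tokens, support_tokens, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head cross-attention from query tokens to exemplar tokens,
    plus a residual connection.

    query_tokens:   (L, D) flattened query feature map, L = H * W
    support_tokens: (S, D) flattened exemplar features
    w_q, w_k, w_v, w_o: (D, D) projections (learned in a real model)
    """
    L, D = query_tokens.shape
    if D % num_heads:
        raise ValueError("D must be divisible by num_heads")
    d = D // num_heads

    def split(x):  # (N, D) -> (heads, N, d)
        return x.reshape(x.shape[0], num_heads, d).transpose(1, 0, 2)

    q = split(query_tokens @ w_q)
    k = split(support_tokens @ w_k)
    v = split(support_tokens @ w_v)
    # (heads, L, S): each query location weights every exemplar token.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))
    merged = (attn @ v).transpose(1, 0, 2).reshape(L, D) @ w_o
    # Residual: the original features pass through unchanged and the
    # attention branch adds a correction on top.
    return query_tokens + merged, attn


# Usage with random (untrained) weights.
rng = np.random.default_rng(0)
L, D, S = 16, 8, 6  # a 4x4 query map, 8 channels, 6 exemplar tokens
query = rng.standard_normal((L, D))
support = rng.standard_normal((S, D))
w_q, w_k, w_v, w_o = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
enhanced, attn = mha_residual(query, support, w_q, w_k, w_v, w_o, num_heads=2)
```

With `w_o` set to zero the output equals the input, which is the point of the residual path: the identity branch always carries the original features (and their gradients) through, and the attention branch only has to learn a correction.
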

Training and Evaluation Protocol

Both models follow the same protocol: 5-fold cross-validation for architectural hyperparameter selection, followed by retraining and final testing on an 80/20 hold-out split of the Synoptics Dataset. The primary metric is the Mean Normalized Absolute Error (MNAE), which divides each image's absolute count error by its true count so that plates with many colonies do not dominate the average.
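
The metric and the data protocol can be sketched as follows. The MNAE formula here is an assumption (the mean over images of |predicted - true| / true, in percent), as is the reading that the five folds are drawn from the 80% training portion; the helper names are hypothetical.

```python
import random


def mnae(predicted, ground_truth):
    """Mean normalised absolute error, in percent.

    Assumed definition: the mean over images of |pred - gt| / gt.
    Every plate is assumed to contain at least one colony (gt > 0).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    if any(g <= 0 for g in ground_truth):
        raise ValueError("ground-truth counts must be positive")
    errors = [abs(p - g) / g for p, g in zip(predicted, ground_truth)]
    return 100.0 * sum(errors) / len(errors)


def holdout_then_kfold(n_images, k=5, test_frac=0.2, seed=0):
    """Shuffle image indices, hold out test_frac of them for final testing,
    and split the remaining training indices into k disjoint folds."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    n_test = round(n_images * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    folds = [train[i::k] for i in range(k)]
    return train, test, folds


# Usage: the Figure 12 image (prediction 89.5, ground truth 83).
single = mnae([89.5], [83])  # 6.5 / 83, about 7.8%
train, test, folds = holdout_then_kfold(100)
```

Normalising by the true count is what removes the range bias: an error of 5 colonies costs 25% on a 20-colony plate but only 1% on a 500-colony plate.
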

Experimental Results

Performance Across Methods

ACFamNet improves substantially on FamNet and on traditional rule-based tools. In its best configuration (256 kernels, 3×3 RoI Align output, single scale), ACFamNet reaches 11.85% MNAE on validation, 10.48 percentage points better than FamNet, and 12.52% on the test set, roughly 34 and 56 percentage points better than OpenCFU and AutoCellSeg, respectively (Table 1, Figure 9).

Figure 9: Counting results for an image with 83 colonies. ACFamNet detects 89.58 colonies.

ACFamNet Pro, which adds multi-head attention and residual connections, lowers MNAE further to 9.62% under cross-validation and 11.25% on the hold-out test set (Table 2). On the test set it outperforms vanilla SAFECount (13.73%), ACFamNet (12.52%), and the traditional methods. Ablations (Table 3) indicate that RoI Align, a learnable backbone, and residual similarity are each critical to achieving the best accuracy.

Model	Validation MNAE (%)	Test MNAE (%)
ACFamNet	11.85	12.52
ACFamNet Pro	9.62	11.25
SAFECount	9.86	13.73
OpenCFU	n/a	46.57
AutoCellSeg	n/a	68.73

Table 1: MNAE (%) on the Synoptics validation and test sets (lower is better; n/a = not available).

Visual Analysis

Predicted density maps agree closely with the ground truth even for highly clustered colonies and diverse morphologies (Figure 10, Figure 11, Figure 12). Results on individual challenging images illustrate the method's robustness to the combined difficulties of clustering and small scale.

Figure 10: ACFamNet Pro's prediction on an unseen image from validation set.

Figure 11: ACFamNet Pro's prediction on another unseen image from validation set.

Figure 12: Illustration of ACFamNet Pro's prediction. The predicted count and ground truth count are 89.5 and 83, respectively.
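
The fractional counts in the captions (89.58, 89.5) follow from density-map regression: the model predicts a per-pixel density and reports its sum, which need not be an integer. The sketch below assumes ground-truth maps built by placing a unit-mass Gaussian at each dot annotation, a common convention for density-based counters; the bandwidth `sigma`, the scaling factor used to mimic an imperfect prediction, and the function names are illustrative assumptions, not values from the paper.

```python
import numpy as np


def gaussian_density(points, shape, sigma=2.0):
    """Density map from dot annotations given as (row, col).

    Each dot adds a Gaussian rescaled to unit mass inside the image, so the
    map sums to the number of annotated colonies, even near the borders.
    """
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    for r, c in points:
        blob = np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()
    return density


def count_from_density(density):
    """A density-regression counter reports the sum of its predicted map."""
    return float(density.sum())


# Two touching colonies plus one isolated colony on a 64x64 grid.
gt = gaussian_density([(10, 10), (11, 12), (30, 40)], (64, 64))
true_count = count_from_density(gt)          # 3, up to float rounding
pred_count = count_from_density(0.93 * gt)  # an imperfect map gives a fractional count
```

The two touching colonies at (10, 10) and (11, 12) merge into a single blob yet still contribute a mass of 2, which is why density regression can count clusters without separating individual instances.
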

Cross-Category Generalisation

Despite strong intra-domain performance, both ACFamNet and ACFamNet Pro generalize poorly to unseen colony types: on highly divergent plates, MNAE rises to 148.26% for ACFamNet and 49.19% for ACFamNet Pro (Figures 13, 23–25). SAFECount with a frozen backbone is more robust across domains (MNAE 35.83%), consistent with the bias–variance trade-off: fully learnable representations fit the training domain closely but transfer less well when data are scarce and the target domain diverges strongly.

Figure 13: Four plate images with colonies that are completely different from the Synoptics Dataset. Left to right: Plate Image A, B, C, and D.

Figure 14: Illustration of ACFamNet Pro's prediction on Plate Image A. The predicted count and ground truth count are 211.02 and 228, respectively.

Figure 15: Illustration of ACFamNet Pro's prediction on Plate Image B. The predicted count and ground truth count are 285.85 and 124, respectively.

Figure 16: Illustration of ACFamNet Pro's prediction on Plate Image C. The predicted count and ground truth count are 257.14 and 529, respectively.
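
As a worked check on the per-plate captions, the normalised absolute error of each prediction can be computed directly, assuming NAE = |predicted - true| / true. Plate D's prediction is not shown in this section, so the aggregate cross-category figure cannot be verified from these three plates alone.

```latex
\begin{aligned}
\mathrm{NAE}_A &= \frac{|211.02 - 228|}{228} = \frac{16.98}{228} \approx 7.45\% \\
\mathrm{NAE}_B &= \frac{|285.85 - 124|}{124} = \frac{161.85}{124} \approx 130.52\% \\
\mathrm{NAE}_C &= \frac{|257.14 - 529|}{529} = \frac{271.86}{529} \approx 51.39\%
\end{aligned}
```

Plate A is counted almost correctly, whereas Plate B is over-counted at more than twice its true count and Plate C is under-counted at less than half of it.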

Discussion

The empirical advancements of ACFamNet Pro over predecessors are attributable to architectural decisions that explicitly model small-scale, high-density, and limited-data regimes. RoI Align mitigates spatial quantization for small RoIs; the multi-head residual attention module enhances instance-level matching by contextually weighing support features. The fully differentiable design allows for data-adaptive feature learning, yet the observed decrease in inter-category generalisation suggests a degree of overfitting to intra-domain statistics, particularly when the training dataset structure does not match classic few-shot assumptions (e.g., class overlap between train and test).

The implications for practitioners are twofold:

ACFamNet Pro establishes a new performance benchmark for counting small, clustered objects with little labeled data in constrained scientific settings (e.g., microbiology), while keeping annotation and hardware demands low.
For deployment in scenarios with highly heterogeneous object types or severe domain shift, conservative backbone-freezing or domain adaptation modules may be necessary to ensure cross-domain robustness.

Theoretically, these results highlight the importance of architecture-algorithm co-design in few-shot object counting—especially the alignment between pooling, correlation, and regression mechanisms. Further integration with more expressive, scale-adaptive backbones or explicit domain generalization modules may enhance cross-category transfer.

Conclusion

The proposed ACFamNet and ACFamNet Pro models constitute a substantial technical contribution to the long-standing problem of few-shot, clustered, small-object counting. The combination of RoI Align, end-to-end learnability, multi-head attention, and residual feature enhancement yields state-of-the-art performance on a representative real-world dataset, outperforming both traditional and prior few-shot methods in MNAE. However, a cross-species generalization gap remains, pointing to essential future work in cross-domain and open-world object counting. Potential directions include ensembling frozen and learnable architectures, domain adaptation, and extension to more complex or multimodal imaging regimes.
