Repeat Factor Sampling for Object Detection
- Repeat Factor Sampling (RFS) is a data-driven strategy that mitigates class imbalance in object detection by elevating the sampling probability for images with rare classes.
- It calculates class-specific repeat factors based on image frequencies and normalizes them, ensuring underrepresented objects receive more training exposure.
- Instance-Aware and Exponentially Weighted variants enhance RFS by incorporating instance-level statistics and exponential scaling, leading to significant performance gains on long-tailed datasets.
Repeat Factor Sampling (RFS) is a data-driven rebalancing strategy developed principally to address class imbalance in large-scale, long-tailed object detection datasets. It has become a foundational approach for sampling-based mitigation of rare-class under-representation and has inspired a sequence of methodological advances (notably Instance-Aware Repeat Factor Sampling and exponentially weighted variants) for practical deployment in modern object detectors. The abbreviation RFS also denotes a distinct notion in quantum computing (recursive Fourier sampling), but the prevailing usage in machine learning and computer vision concerns its role in training-procedure design.
1. Fundamental Principles of Repeat Factor Sampling
RFS was introduced to mitigate the severe skew arising when a small set of classes dominates the majority of annotated images, while many rare categories appear in only a few images. The core idea is to increase the sampling probability of images that contain rare classes, thereby exposing the detector to more training instances of underrepresented objects.
Formally, let $\mathcal{C}$ denote the set of all classes and $N$ the total number of training images. For each class $c \in \mathcal{C}$, define the image frequency as

$$f(c) = \frac{1}{N}\,\bigl|\{\, I : c \in I \,\}\bigr|,$$

the fraction of training images containing at least one instance of $c$, and fix a rarity threshold $t$ (commonly $t = 10^{-3}$, as in LVIS). The class-specific repeat factor is then

$$r(c) = \max\!\left(1, \sqrt{\tfrac{t}{f(c)}}\right).$$

For each image $I$, the image-level repeat factor is

$$r(I) = \max_{c \in I} r(c).$$

The sampling probability for image $I$ in one epoch is normalized as

$$p(I) = \frac{r(I)}{\sum_{I'} r(I')}.$$

An image containing any rare class ($f(c) < t$) has an elevated $r(I)$ and hence is sampled more frequently during training (Yaman et al., 2023, Ahmed et al., 27 Mar 2025).
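The computation above can be sketched in a few lines of Python; the function name and input format (a set of class ids per image) are illustrative choices, not taken from the cited implementations.

```python
import math
from collections import defaultdict

def rfs_probabilities(image_labels, t=1e-3):
    """Per-image RFS sampling probabilities for one epoch.

    image_labels: list of sets, the class ids present in each image.
    t: rarity threshold; classes with f(c) < t are oversampled.
    """
    n = len(image_labels)
    # f(c): fraction of images containing at least one instance of c
    img_count = defaultdict(int)
    for labels in image_labels:
        for c in labels:
            img_count[c] += 1
    f = {c: cnt / n for c, cnt in img_count.items()}
    # r(c) = max(1, sqrt(t / f(c)))
    r_class = {c: max(1.0, math.sqrt(t / fc)) for c, fc in f.items()}
    # r(I) = max over classes in the image; p(I) = r(I) / sum_I' r(I')
    r_img = [max((r_class[c] for c in labels), default=1.0)
             for labels in image_labels]
    total = sum(r_img)
    return [r / total for r in r_img]
```

With an exaggerated threshold (e.g. `t=0.5` on a toy set of four images), the single image holding the rare class receives the largest probability, while the probabilities still sum to one.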
2. Motivation and Limitations
RFS specifically targets the imbalance in image-level class frequencies but ignores the number of object instances per class. This can be problematic: for two rare classes occurring in the same number of images, a class with many instances (a larger annotation density per image) is under-sampled relative to one with fewer instances, since RFS assigns the same $r(c)$ regardless of total instance count (Yaman et al., 2023). This limitation results in inadequate sampling for rare but instance-rich classes, which directly degrades detection performance for these underrepresented categories.
3. Instance-Aware Extensions and Generalizations
To remedy the insensitivity of RFS to per-class instance counts, Instance-Aware Repeat Factor Sampling (IRFS) fuses image-level and instance-level statistics through a geometric mean. For each class $c$:
- $f_i(c)$: fraction of images containing $c$
- $f_b(c)$: fraction of all bounding boxes labeled as $c$
- Geometric mean: $f(c) = \sqrt{f_i(c)\, f_b(c)}$

The IRFS repeat factor:

$$r(c) = \max\!\left(1, \sqrt{\tfrac{t}{f(c)}}\right), \qquad f(c) = \sqrt{f_i(c)\, f_b(c)}.$$

Image-level repeat factors and probabilities are then set as for RFS, with $r(I) = \max_{c \in I} r(c)$ and normalized probabilities $p(I) = r(I) / \sum_{I'} r(I')$ (Yaman et al., 2023, Ahmed et al., 27 Mar 2025). This adjustment enables IRFS to account jointly for rare image occurrence and instance count per class, substantially improving model exposure to difficult classes.
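The fusion step can be sketched as follows, assuming image- and box-level class counts are already available from the annotations; all names here are illustrative.

```python
import math
from collections import defaultdict

def irfs_repeat_factors(image_labels, box_labels, t=1e-3):
    """Class-level IRFS repeat factors.

    image_labels: list of sets of class ids, one set per image.
    box_labels: list of lists of class ids, one id per annotated box.
    """
    n_images = len(image_labels)
    n_boxes = sum(len(boxes) for boxes in box_labels)
    img_count, box_count = defaultdict(int), defaultdict(int)
    for labels in image_labels:
        for c in labels:
            img_count[c] += 1
    for boxes in box_labels:
        for c in boxes:
            box_count[c] += 1
    repeat = {}
    for c in img_count:
        f_i = img_count[c] / n_images   # image frequency
        f_b = box_count[c] / n_boxes    # instance (box) frequency
        f = math.sqrt(f_i * f_b)        # geometric-mean fusion
        repeat[c] = max(1.0, math.sqrt(t / f))
    return repeat
```

A class that is rare at the image level gets a repeat factor above 1 even when a co-occurring frequent class stays clamped at 1, and its instance count now also influences the result.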
Empirical evidence demonstrates that IRFS yields significant improvements over RFS on long-tailed datasets. On LVIS v1.0 with a Mask R-CNN baseline (ResNet-50), rare class AP increases from 9.2 (RFS) to 14.1 (IRFS), a +53% relative gain, with comparable or superior performance observed across detection architectures and ablation settings.
4. Exponentially Weighted IRFS (E-IRFS)
Despite the geometric mean fusion in IRFS, classes that are extremely rare by both image and instance frequencies may remain underexposed. Exponentially Weighted IRFS (E-IRFS) introduces exponential scaling to further amplify the repeat factors for extremely rare classes:

$$r(c) = \max\!\left(1, \lambda \exp\!\left(\sqrt{\tfrac{t}{f(c)}}\right)\right),$$

where $\lambda$ is a scaling hyperparameter chosen empirically. The exponential function achieves super-polynomial separation between rare and frequent classes, aggressively increasing the sampling weight of images containing ultra-rare objects while maintaining stability for common classes.
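In code, the exponential amplification can be contrasted with the square-root rule. The functional form below follows the description in the text and is a sketch, not a verbatim transcription of the paper's equation; `lam` stands in for the scaling hyperparameter.

```python
import math

def rfs_factor(f, t=1e-3):
    # Square-root rule used by RFS/IRFS.
    return max(1.0, math.sqrt(t / f))

def e_irfs_factor(f, t=1e-3, lam=1.0):
    # Exponential amplification of the same rarity ratio; lam is the
    # scaling hyperparameter (assumed form, see lead-in).
    return max(1.0, lam * math.exp(math.sqrt(t / f)))
```

For an ultra-rare class with $f = 10^{-4}$ and $t = 10^{-3}$, the square-root rule yields roughly 3.2 while the exponential rule yields roughly 24, illustrating the much stronger separation for the rare-class regime.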
E-IRFS has been validated on UAV-based surveillance benchmarks (e.g., fire, smoke, human, lake detection), showing improvements of +22% in mAP (from 0.49 to 0.55) and +350% in mAP for the rarest class ("Lake") using YOLOv11-Nano models. Empirically, the benefit is even more pronounced in lightweight models with limited capacity, suggesting that stronger repeat factor scaling is crucial for effective representation learning in resource-constrained settings (Ahmed et al., 27 Mar 2025).
5. Integration and Computational Properties
RFS, IRFS, and E-IRFS are designed as sampling strategies at the data-loading stage. Their integration involves preprocessing (counting image and instance frequencies per class), calculation of per-class repeat factors, and normalization of sampling probabilities. The computational overhead is limited to the frequency computation at initialization; there is no increase in training-time compute per forward pass, nor any modification to the learning rate schedule, augmentation, or network architecture (Yaman et al., 2023).
Sampler variants function as drop-in replacements in frameworks such as MMDetection and Detectron2, with support for both two-stage and one-stage detectors. Recommended hyperparameters are $t = 10^{-3}$ (the LVIS default) and an empirically selected exponential scale $\lambda$ for E-IRFS; excessively large repeat factors should be capped to avoid memory exhaustion in large-scale pipelines (Ahmed et al., 27 Mar 2025).
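At the data-loading stage, per-image repeat factors are typically materialized into an epoch-level index list with stochastic rounding, the scheme used by Detectron2's `RepeatFactorTrainingSampler`; the helper below is a simplified single-process sketch of that idea.

```python
import math
import random

def expand_epoch(image_ids, repeat_factors, seed=0):
    """Build one epoch's (shuffled) image list from repeat factors.

    Each image appears floor(r) times, plus one extra time with
    probability r - floor(r), so its expected count per epoch equals r.
    """
    rng = random.Random(seed)
    epoch = []
    for img_id, r in zip(image_ids, repeat_factors):
        count = int(math.floor(r))
        if rng.random() < r - count:  # stochastic rounding of the fraction
            count += 1
        epoch.extend([img_id] * count)
    rng.shuffle(epoch)
    return epoch
```

Because only the index list is expanded, the per-iteration training cost is unchanged; the entire overhead is this one-time bookkeeping at epoch construction.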
6. Comparative Empirical Analysis
A summary of performance for RFS-based methods is provided in the following table (Mask R-CNN, LVIS v1.0, ResNet-50 backbone) (Yaman et al., 2023):
| Sampling Method | mAP_bbox | AP_r (Rare) | AP_c (Common) | AP_f (Frequent) |
|---|---|---|---|---|
| None | 16.9 | 0.0 | 12.3 | 29.6 |
| RFS | 22.7 | 9.2 | 21.3 | 30.0 |
| IRFS | 24.4 | 14.1 | 22.8 | 30.7 |
Extended to the UAV surveillance domain, E-IRFS delivers additional mAP gains over both RFS and IRFS, particularly for the most under-represented classes and at limited network capacities (Ahmed et al., 27 Mar 2025). Moreover, IRFS demonstrates synergy with instance-level reweighting losses (e.g., Equalization Loss, ECM Loss), producing further improvements when used in tandem.
7. Broader Applications and Theoretical Considerations
Although developed for object detection with long-tailed distributions, the RFS framework and its generalizations are applicable to any learning scenario with rare-class exposure bottlenecks. RFS methods provide a principled, architecture-agnostic mechanism for controlling class distribution at the data level. Theoretically, IRFS and E-IRFS exhibit monotonicity and convexity in their repeat factor scaling, with E-IRFS introducing exponential separation for the rare-class regime:
- Sublinear adjustment: the IRFS repeat factor grows as $\sqrt{t/f(c)}$, sublinearly in the inverse frequency $1/f(c)$
- Super-polynomial adjustment: the E-IRFS repeat factor grows as $\exp\!\left(\sqrt{t/f(c)}\right)$, faster than any polynomial in $1/f(c)$
Selecting the geometric mean as the fusion strategy yields robust trade-offs, with ablation results indicating that geometric and harmonic means outperform arithmetic or quadratic averaging for rare class recall. Optimal threshold selection is data-dependent; excessively high or low thresholds can degrade overall performance (Yaman et al., 2023, Ahmed et al., 27 Mar 2025).
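The mean-fusion trade-off is easy to see numerically; the frequencies below are made-up toy values for a class that is image-rare but instance-rich.

```python
import math

f_img, f_box = 0.01, 0.20  # toy image and instance frequencies

arithmetic = (f_img + f_box) / 2               # 0.105
geometric = math.sqrt(f_img * f_box)           # ~0.0447
harmonic = 2.0 / (1.0 / f_img + 1.0 / f_box)   # ~0.0190

# AM >= GM >= HM: geometric and harmonic fusion assign the class a
# smaller fused frequency, hence a larger repeat factor sqrt(t / f).
assert harmonic < geometric < arithmetic
```

Because the repeat factor is $\sqrt{t/f}$, the smaller geometric or harmonic fused frequency translates directly into stronger oversampling of such classes, consistent with the ablation finding above.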
In summary, Repeat Factor Sampling (RFS) and its instance-aware and exponential generalizations are empirically and theoretically validated solutions for rare-class undersampling in long-tailed object detection, offering scalable, computationally efficient, and easily integrated capabilities that are crucial for modern vision pipelines (Yaman et al., 2023, Ahmed et al., 27 Mar 2025).