RfMiD Dataset: Retinal Disease Benchmark
- The RfMiD dataset is a curated collection of 3,200 high-resolution retinal fundus images annotated with up to 46 disease labels, enabling comprehensive multi-pathology analysis.
- Studies built on it apply extensive preprocessing and augmentation, including adaptive thresholding and geometric cropping, to improve image quality and model performance.
- Benchmarking lightweight CNN architectures like ArConvNet and MobileNetV2 on RfMiD has demonstrated accuracies above 90% in automated retinal disease diagnosis.
The acronym “RfMiD” most commonly refers to the Retinal Fundus Multi-disease Image Dataset, a publicly available medical imaging benchmark designed primarily for retinal disease classification. (In the RF communications literature, “RfMiD” occasionally denotes a radio-frequency dataset for device fingerprinting, but this usage is not the predominant one in ophthalmology or vision research.) This article concerns the Retinal Fundus Multi-disease Image Dataset and its significance in computational ophthalmology.
The RfMiD dataset has played a central role in enabling the development and benchmarking of lightweight, accurate deep learning models for automated diagnosis of retinal diseases, and has also served as a basis for re-annotation and image-quality assessment studies.
1. Dataset Structure and Annotation
The RfMiD dataset consists of 3,200 fundus images, each annotated with up to 46 distinct retinal disease labels. Images were captured using DSLR cameras equipped with specialized ophthalmic lenses, resulting in high-resolution images with variable dimensions; common sizes include 4288×2848, 2144×1424, and 2048×1536 pixels. Each image may be annotated with multiple disease labels, reflecting the multi-pathology and often co-morbid nature of clinical retinal presentations. This dataset was originally developed to support multi-disease classification methods, notably in the context of the RIADD challenge.
The most prevalent approach in the literature, particularly when sample sizes for certain diseases are limited, is to simplify the task to binary classification using the “disease risk” annotation. In this setting, all healthy retinas form one class, while any image featuring pathological findings forms the other; a minimal sketch of deriving this label is shown below. Multi-class and multi-label formulations are also common, especially when evaluating foundation models and ensembles on multiple concurrent pathologies.
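Concretely, the binary target can be derived from the public annotation file as follows. This is a hypothetical sketch: the file name and the `ID`/`Disease_Risk` column names are assumptions based on the common release format of the dataset, not details taken from the cited studies.

```python
import pandas as pd

# Hypothetical sketch: collapse the multi-label RfMiD annotations into the
# binary "disease risk" target. File name and column names are assumptions.
labels = pd.read_csv("RFMiD_Training_Labels.csv")

# Binary target: 0 = healthy retina, 1 = any pathological finding.
y_binary = labels["Disease_Risk"].astype(int)

# Equivalent derivation from the per-disease columns, usable as a
# consistency check: an image is "at risk" if any disease flag is set.
disease_cols = labels.columns.drop(["ID", "Disease_Risk"])
y_from_flags = (labels[disease_cols].sum(axis=1) > 0).astype(int)
print((y_binary == y_from_flags).mean())  # fraction of agreeing rows
```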
2. Preprocessing and Data Augmentation Workflow
Given the heterogeneous nature of the source images, several preprocessing steps are undertaken to enhance model performance. Adaptive thresholding is applied to create a binary mask that isolates the bright region corresponding to the actual eye fundus, discarding background or surrounding tissues. Geometric boundaries are then established from this mask, guiding a cropping operation that yields clean, centered fundus images.
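A minimal OpenCV sketch of this masking-and-cropping step is given below; the threshold block size and offset are illustrative values, not parameters reported in the cited studies.

```python
import cv2
import numpy as np

def crop_fundus(image_bgr: np.ndarray) -> np.ndarray:
    """Isolate and crop the bright fundus region via adaptive thresholding."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Adaptive mean threshold yields a binary mask of locally bright pixels;
    # block size (51) and offset (-5) are illustrative choices.
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 51, -5)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:          # degenerate mask: return the input unchanged
        return image_bgr
    # Geometric boundaries of the mask guide the cropping operation.
    return image_bgr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```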
Normalization is performed by min-max scaling pixel values into the range [0, 1]. To address sample scarcity and class imbalance—particularly critical for disease classes with fewer than 10 samples—extensive data augmentation is employed. Techniques include random rotations (up to 30°), horizontal and vertical shifts, shearing, zooming, intensity and contrast adjustments, and color transformations. Additionally, composite samples are generated by averaging pixel values from healthy and diseased images, increasing distributional diversity in the training set.
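The normalization-and-augmentation policy can be sketched with torchvision as below; all magnitudes are illustrative, and the pixel-averaging composite is implemented as a simple mixup-style blend.

```python
import torch
from torchvision import transforms

# Sketch of the augmentation policy described above; magnitudes are
# illustrative, not the exact values used in the cited studies.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), shear=10),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),  # also min-max rescales pixel values into [0, 1]
])

def composite(healthy: torch.Tensor, diseased: torch.Tensor) -> torch.Tensor:
    """Composite sample from pixel-wise averaging of a healthy and a
    diseased image, as described above (a mixup-like augmentation)."""
    return 0.5 * (healthy + diseased)
```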
These preprocessing and augmentation steps are often mirrored across related datasets, such as ODIR-2019, to maintain experimental consistency and support cross-dataset transfer learning.
3. Model Architectures and Innovations Using the RfMiD Dataset
The RfMiD dataset has been used to benchmark a spectrum of lightweight convolutional neural network (CNN) architectures, with a focus on computational efficiency suitable for mobile and embedded deployment.
ArConvNet and the Accelerated Reuse Convolutional Layer
A noteworthy architectural innovation is ArConvNet, which incorporates the Accelerated Reuse Convolutional (ArConv) layer. Traditional 2D convolutions are replaced with sequential 1D depthwise convolutions applied along orthogonal spatial dimensions, separated by an intermediate transpose:

$$\mathrm{ArConv}(X) = \Big( f_k\big( ( f_k(X) )^{\top} \big) \Big)^{\top}$$

where $X$ is the input feature map, $f_k$ is a 1D depthwise convolution with kernel size $k$, and $(\cdot)^{\top}$ denotes transposition of the two spatial dimensions. This strategy achieves more than a 66% reduction in parameters relative to full 2D convolutions, facilitating deployment on resource-constrained hardware.
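The following PyTorch sketch captures the operation as described. Reusing the same depthwise kernel for both axes is our reading of “reuse”; the padding choice and the absence of a pointwise mixing step are likewise assumptions rather than details taken from the ArConvNet papers.

```python
import torch
import torch.nn as nn

class ArConv(nn.Module):
    """Minimal ArConv sketch: a 1D depthwise convolution along one spatial
    axis, a transpose, and the same convolution along the orthogonal axis."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise 1xk kernel: slides along the width axis of its input.
        self.conv1d = nn.Conv2d(channels, channels, (1, kernel_size),
                                padding=(0, kernel_size // 2), groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv1d(x)        # f_k(X): convolve along width
        y = y.transpose(2, 3)     # (.)^T: swap the H and W axes
        y = self.conv1d(y)        # f_k(...): convolve along (former) height
        return y.transpose(2, 3)  # transpose back to the original layout
```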
Other Lightweight Architectures
MobileNetV2 and NASNetMobile have also been extensively benchmarked. Each uses a combination of depthwise separable convolutions, bottleneck layers, and efficient downsampling to limit parameter count while sustaining accuracy. Transfer learning is a consistent theme, with models pre-trained on ImageNet and fine-tuned on the RfMiD dataset for either binary or multi-class classification.
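A typical fine-tuning setup of this kind can be sketched with torchvision's pretrained MobileNetV2 for the binary disease-risk task; the freezing policy and head size are illustrative choices, not the cited studies' exact recipes.

```python
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained MobileNetV2 with its classifier head replaced
# for binary disease-risk classification (illustrative setup).
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False          # freeze the pretrained backbone
model.classifier[1] = nn.Linear(model.last_channel, 2)  # healthy vs. at-risk
```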
Table: Performance Comparison on RfMiD Test Set
| Model | Parameters (M) | Accuracy (%) | Precision (%) |
|---|---|---|---|
| ArConvNet | 1.3 | 93.28 | 93 |
| MobileNetV2 | 2.26 | 92.66 | 93 |
| NASNetMobile | — | 89.5 | 89.7 |
Values as reported in (Kasani et al., 5 Oct 2025) and (Qasim et al., 30 May 2025); “—” indicates value not reported in source.
4. Applications in AI-Assisted Ophthalmic Diagnosis
The RfMiD dataset has had a pronounced impact on the development of AI systems for early and automated detection of retinal pathologies such as Diabetic Retinopathy (DR) and Macular Hole (MH). Lightweight models, when trained and validated on RfMiD, achieve accuracies surpassing 90%, thereby supporting effective screening in clinical and point-of-care settings.
The accessibility of the dataset facilitates rapid prototyping and benchmarking, while robustness-enhancing strategies (transfer learning, augmentation, and parameter-efficient convolutions) enable generalization even under data-constrained scenarios, as often encountered in medical imaging.
The binary “disease risk” classification setup, despite its simplicity, provides a practical yardstick for assessing model utility in early disease detection, a context where high sensitivity and minimal false negatives are critical for timely ophthalmic intervention; a minimal evaluation sketch follows.
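For illustration, these screening-oriented quantities can be computed with scikit-learn as below; the label vectors are hypothetical stand-ins for real predictions.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Sensitivity (recall on the at-risk class) and the false-negative count,
# the quantities emphasized above. Labels here are hypothetical.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

sensitivity = recall_score(y_true, y_pred)  # TP / (TP + FN)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"sensitivity={sensitivity:.2f}, false negatives={fn}")
```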
5. Role in Advanced Research and Meta-Ensemble Benchmarking
While RfMiD remains an important resource for lightweight diagnostic models, it has also been leveraged for multi-label, multi-pathology benchmarking. For instance, in studies evaluating the generalization of models trained on large-scale synthetic datasets, such as SynFundus-1M, RfMiD serves as a standard external test set. In this context, meta-ensemble pipelines combining state-of-the-art architectures (ConvNeXtV2, SwinV2, ViT, ResNet, EfficientNetV2, and RETFound) have achieved a macro-AUC of 0.8800 on RfMiD, suggesting that high-fidelity synthetic training can transfer robustly to authentic clinical images (Cao-Xue et al., 21 Aug 2025).
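For reference, a macro-AUC of this kind is obtained by averaging per-label ROC-AUC scores without weighting, e.g. with scikit-learn; the arrays below are random stand-ins for real predictions, and the shapes are merely RfMiD-like.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Macro-AUC for a multi-label benchmark: one ROC-AUC per disease label,
# averaged with equal weight. Random stand-in data for illustration.
rng = np.random.default_rng(0)
n_images, n_labels = 640, 46
y_true = rng.integers(0, 2, size=(n_images, n_labels))
y_score = rng.random((n_images, n_labels))

macro_auc = roc_auc_score(y_true, y_score, average="macro")
print(f"macro-AUC = {macro_auc:.4f}")  # ~0.5 for random scores
```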
This suggests an evolving role for RfMiD: beyond its original purpose as a challenge dataset, it now acts as a critical benchmark for transfer learning, domain generalization, and ensemble modeling strategies in ophthalmic AI.
6. Context, Limitations, and Prospective Directions
Although the RfMiD dataset covers a broad array of retinal diseases, data scarcity for certain classes (frequently <10 samples) limits its effectiveness for high-resolution multi-class modeling. This has motivated both augmentation strategies and the adoption of “disease risk” simplification. The dataset has also served as the foundation for re-annotation to support auxiliary tasks such as retinal image quality assessment (e.g., RIQA-RFMiD (Xu et al., 2020)), expanding its applicability to quality-aware medical image analysis and robust AI-driven screening.
A plausible implication is that as synthetic and foundation-model-based approaches mature, the RfMiD dataset will remain an essential external validation resource, particularly valuable for elucidating model generalization capabilities and benchmarking advances in low-complexity, high-accuracy diagnostic architectures.