Determine overlap between E-ophtha-MA and E-ophtha-EX subsets

Determine whether the E-ophtha-MA and E-ophtha-EX subsets of the E-ophtha fundus image dataset share any samples. Resolving this is needed for accurate benchmarking and for preventing data leakage in training and evaluation pipelines.

Background

The E-ophtha dataset is divided into two subsets: E-ophtha-MA (microaneurysm annotations) and E-ophtha-EX (exudate annotations). The dataset is frequently used for lesion-level research in diabetic retinopathy, where clear separation of training and testing data is essential.

If samples overlap across the two subsets, cross-subset evaluation (for example, training on one subset and testing on the other) or combined use could leak test data into training, inflating reported performance and compromising reproducibility. The survey by Dey et al. lists the overlap as unknown in its dataset table, which leaves open a question that matters for dataset hygiene and fair benchmarking.
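One way to start answering the question is to look for byte-identical image files shared by the two subsets. The following is a minimal sketch under stated assumptions, not a definitive procedure: the directory arguments, the list of file extensions, and the function names are all hypothetical, and the roots should point at the fundus image folders only (not the annotation masks). Content hashing detects only exact duplicates; re-encoded or resized copies, or different images of the same patient, would need perceptual hashing or patient-level metadata instead.

```python
import hashlib
from pathlib import Path

# Assumed set of image extensions; adjust to match the files actually on disk.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_images(root):
    """Map content digest -> list of image paths found under root."""
    index = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
            index.setdefault(file_digest(p), []).append(p)
    return index


def find_exact_overlap(ma_root, ex_root):
    """Return {digest: (ma_paths, ex_paths)} for files present in both trees."""
    ma = index_images(ma_root)
    ex = index_images(ex_root)
    return {d: (ma[d], ex[d]) for d in ma.keys() & ex.keys()}
```

Calling find_exact_overlap with the two subset image folders returns an empty dictionary when no byte-identical files are shared. An empty result does not by itself rule out patient-level overlap, so it is best read as a lower bound on the overlap rather than a final answer.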

References

Overlap of samples between E-ophtha-MA and E-ophtha-EX is unknown

Managing Diabetic Retinopathy with Deep Learning: A Data Centric Overview (2604.02448 - Dey et al., 2 Apr 2026) in Section 3.2 (Available datasets), Table: Details of DR datasets developed during 2003–2014 (tab:pubDatasetStat_2014), E-ophtha row, Comment column