Influence of architecture, loss function, and training protocol with large datasets

Ascertain which choices among neural network architecture, loss function, and training protocol materially affect retinal blood vessel segmentation performance when training on a large annotated fundus image dataset.

Background

A wide range of architectural variants (e.g., U-Net variants with attention mechanisms or cascaded stages) and loss functions (e.g., BCE, SoftDice, DiceBCE, clDice) are used in the literature, but they are often evaluated on small datasets that may not reveal which components actually drive performance.
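To make the loss-function comparison concrete, the following is a minimal sketch of the SoftDice, BCE, and combined DiceBCE losses on flattened per-pixel predictions. This is an illustrative implementation in plain Python, not the exact formulation used by any particular paper; the equal-weight combination in `dice_bce_loss` and the smoothing constants are common conventions, assumed here for illustration.

```python
import math

def soft_dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over flattened pixel probabilities and binary labels.

    1 - Dice coefficient; eps smooths the empty-foreground case.
    """
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; eps guards log(0)."""
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(probs, targets)
    ) / len(probs)

def dice_bce_loss(probs, targets, alpha=0.5):
    """DiceBCE: weighted sum of SoftDice and BCE (equal weights assumed)."""
    return alpha * soft_dice_loss(probs, targets) + (1.0 - alpha) * bce_loss(probs, targets)
```

A vessel-segmentation network would typically apply such a loss to sigmoid outputs over all pixels; SoftDice addresses the strong foreground/background imbalance of thin vessels, while the BCE term keeps per-pixel gradients well behaved.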

The availability of a larger dataset such as FIVES raises the practical question of which design and training choices truly matter when sufficient data are available—an uncertainty explicitly identified by the authors.

References

"Given the wide range of choices for architectures, loss functions, or training protocols, it is also unclear which of these factors actually matter when a large dataset is used for training."