Computational modeling of full 3D OCT volumes

Develop computational methods that model an entire three-dimensional optical coherence tomography (OCT) volume as a whole, rather than aggregating predictions from individual B-scans, so that inter-slice spatial structure is captured and the suboptimal performance of slice-wise aggregation is avoided.

Background

Existing retinal foundation models such as RETFound primarily operate on 2D OCT slices (often the center B-scan) and can underutilize volumetric context present across the slow-scan dimension. Simple post hoc aggregation of slice-wise predictions does not fully leverage 3D structural continuity and may be suboptimal.
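One way to see why post hoc aggregation can lose volumetric context is with a toy example. The sketch below is illustrative only: `slice_score` is a hypothetical stand-in for a 2D model (not RETFound or any real network), and `inter_slice_roughness` is an invented feature that depends on how neighbouring B-scans relate. Averaging per-slice scores returns the same value when the B-scans are shuffled along the slow-scan axis, so it cannot encode their ordering or continuity.

```python
def slice_score(b_scan):
    """Toy stand-in for a 2D model: mean intensity of one B-scan (hypothetical)."""
    pixels = [p for row in b_scan for p in row]
    return sum(pixels) / len(pixels)


def aggregate_slicewise(volume):
    """Post hoc aggregation: average the independent per-slice scores."""
    scores = [slice_score(b_scan) for b_scan in volume]
    return sum(scores) / len(scores)


def inter_slice_roughness(volume):
    """Toy volumetric feature: mean absolute change between adjacent B-scans."""
    total, count = 0.0, 0
    for prev_scan, next_scan in zip(volume, volume[1:]):
        for row_a, row_b in zip(prev_scan, next_scan):
            for a, b in zip(row_a, row_b):
                total += abs(a - b)
                count += 1
    return total / count


# Toy volume: 6 B-scans of 2x2 pixels, intensity rising smoothly across slices.
volume = [[[float(i)] * 2 for _ in range(2)] for i in range(6)]
# Same B-scans, but with their order along the slow-scan axis scrambled.
shuffled = [volume[i] for i in (3, 0, 5, 1, 4, 2)]

same_aggregate = aggregate_slicewise(volume) == aggregate_slicewise(shuffled)
roughness_original = inter_slice_roughness(volume)
roughness_shuffled = inter_slice_roughness(shuffled)
```

Here `same_aggregate` is true even though the shuffled volume is anatomically implausible, while the order-aware feature differs between the two volumes. Real aggregation schemes can be more elaborate than a plain mean, so this only illustrates the general concern rather than any specific published pipeline.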

The paper introduces OCTCube, a 3D masked autoencoder-based framework with FlashAttention for efficiency, precisely to address the limitations of 2D-only or slice-aggregation approaches. This open problem highlights the broader methodological need for principled, effective 3D OCT volume modeling beyond naive aggregation.
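To make the masked-autoencoder idea concrete, the following minimal sketch splits a volume into non-overlapping 3D patches (tokens) and hides a random subset of them. The volume size, patch size, and 75% mask ratio are assumptions chosen for illustration, not OCTCube's actual configuration, and the helper names `patch_grid` and `random_mask` are hypothetical.

```python
import random


def patch_grid(volume_shape, patch_shape):
    """Number of patches along each axis of a (slices, height, width) volume."""
    for size, patch in zip(volume_shape, patch_shape):
        if size % patch != 0:
            raise ValueError("each volume dimension must be divisible by the patch size")
    return tuple(size // patch for size, patch in zip(volume_shape, patch_shape))


def random_mask(num_tokens, mask_ratio, seed=0):
    """Split token indices into (visible, masked) lists for masked autoencoding."""
    rng = random.Random(seed)
    num_masked = int(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    visible = [i for i in range(num_tokens) if i not in masked]
    return visible, sorted(masked)


# Assumed sizes: 48 B-scans of 256x256 pixels; each patch spans 3 slices x 16 x 16 pixels.
grid = patch_grid((48, 256, 256), (3, 16, 16))
num_tokens = grid[0] * grid[1] * grid[2]

# Only the visible tokens would be passed to the encoder; the masked ones are reconstructed.
visible, masked = random_mask(num_tokens, mask_ratio=0.75)
```

Under these assumed sizes the volume yields several thousand tokens, each of which spans multiple adjacent B-scans, so inter-slice structure is part of the input rather than something recovered after the fact. Because the cost of standard self-attention grows quadratically with the number of tokens, long volumetric sequences like this are a plausible motivation for using a memory-efficient attention implementation such as FlashAttention.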

References

"Nevertheless, it remains unclear how to computationally model the 3D volume, as simply aggregating predictions slice-by-slice could lead to suboptimal results."

— OCTCube-M: A 3D multimodal optical coherence tomography foundation model for retinal and systemic diseases with cross-cohort and cross-device validation (2408.11227 - Liu et al., 2024) in Introduction
