SAM2LoRA: Efficient Fundus Segmentation
- SAM2LoRA is a parameter-efficient fine-tuning strategy that integrates low-rank adaptation into SAM2’s image encoder and mask decoder to enhance domain-specific retinal segmentation.
- It employs a composite loss function that combines segmentationBCE, SoftDice, and FocalTversky losses to optimize both micro- and macro-level segmentation accuracy.
- The approach achieves state-of-the-art performance on retinal blood vessel and optic disc segmentation benchmarks while markedly reducing training overhead.
SAM2LoRA is a parameter-efficient fine-tuning strategy designed to adapt the Segment Anything Model 2 (SAM2) for domain-specific retinal fundus image segmentation tasks. By integrating low-rank adaptation (LoRA) modules into both the image encoder and mask decoder of SAM2—a model featuring a masked autoencoder-pretrained Hierarchical Vision Transformer (Hiera) architecture—SAM2LoRA enables efficient training using fewer than 5% of the original model's trainable parameters. The approach is guided by a composite loss function that combines segmentationBCE, SoftDice, and FocalTversky losses to effectively optimize both micro- and macro-level segmentation accuracy across a variety of fundus imaging benchmarks. SAM2LoRA achieves state-of-the-art performance in both blood vessel and optic disc segmentation, while offering substantial reductions in training overhead, thus enabling rapid inference and broad deployment in low-resource clinical settings (Mandal et al., 11 Oct 2025).
1. Integration of LoRA into SAM2 Architecture
SAM2LoRA enhances the SAM2 architecture by applying LoRA modules to the two critical components central to segmentation: the image encoder and the mask decoder. The image encoder is constructed as a Hierarchical Vision Transformer (Hiera) that has been pretrained with a masked autoencoder framework, allowing for multi-scale feature extraction from the input image at resolutions ranging from 1/4 to 1/32 of the full input size. SAM2LoRA inserts LoRA modules specifically into the transformer’s projection layers (query, key, value, and output), enabling efficient adaptation by modulating only low-rank subspaces of these weights. This selective adaptation circumvents the need to update the entirety of the model's parameters.
In the mask decoder, which further refines the coarse segmentation maps generated by the encoder, LoRA is similarly integrated into the transformer projection layers. By targeting both the encoder and the decoder, SAM2LoRA preserves the generalizable feature extraction conferred by SAM2's pretraining while allowing robust domain-specific fine-tuning with minimal parameter updates.
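The sketch below illustrates, in PyTorch, how low-rank adapters of this kind can be wrapped around attention projection layers. It is a minimal illustration rather than SAM2LoRA's actual implementation: the class name `LoRALinear`, the attribute names `q_proj`, `k_proj`, `v_proj`, and `out_proj`, and the default rank are assumptions for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep the pretrained projection frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no change at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

def inject_lora(module: nn.Module, rank: int = 8,
                targets: tuple = ("q_proj", "k_proj", "v_proj", "out_proj")) -> None:
    """Recursively replace attention projections with LoRA-wrapped versions.
    The attribute names in `targets` are illustrative, not SAM2's real module names."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name in targets:
            setattr(module, name, LoRALinear(child, rank=rank))
        else:
            inject_lora(child, rank=rank, targets=targets)
```

Applying such an injection routine to both the image encoder and the mask decoder mirrors the dual-module adaptation described above.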
2. Parameter-Efficient Fine-Tuning Strategy
SAM2LoRA is designed to mitigate the computational and overfitting challenges associated with fully finetuning large transformer models such as SAM2. Fine-tuning all parameters in such architectures is resource-intensive and can be impractical for medical tasks where dataset size is limited and class imbalance is prevalent. By leveraging LoRA, SAM2LoRA approximates optimal weight updates with low-rank matrices (of rank significantly less than the original weight dimension), so that less than 5% of the original parameters are made trainable during domain adaptation.
Only the LoRA parameters are updated during fine-tuning, while the SAM2 backbone remains frozen. This allows task-specific learning to proceed efficiently and with a reduced risk of catastrophic forgetting. The dual LoRA integration into both the image encoder and the mask decoder is critical to achieving strong performance on diverse segmentation tasks without excessive computational burden.
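A minimal sketch of this freeze-everything-but-LoRA regime is shown below; it assumes the adapter matrices are registered under parameter names containing `lora_` (as in the earlier sketch) and that `sam2_model` is a SAM2 instance with adapters already injected.

```python
import torch.nn as nn

def mark_only_lora_trainable(model: nn.Module) -> None:
    """Freeze every parameter except the low-rank adapter matrices."""
    for name, param in model.named_parameters():
        param.requires_grad = "lora_" in name

# Example check that the trainable fraction stays well under 5%:
# mark_only_lora_trainable(sam2_model)
# trainable = sum(p.numel() for p in sam2_model.parameters() if p.requires_grad)
# total = sum(p.numel() for p in sam2_model.parameters())
# print(f"trainable fraction: {trainable / total:.2%}")
```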
3. Composite Loss Function for Fundus Segmentation
The composite loss employed by SAM2LoRA consists of three additive components, each contributing unique advantages to the optimization process:
- SegmentationBCE (Binary Cross-Entropy) Loss:
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[g_i \log p_i + (1 - g_i)\log(1 - p_i)\bigr]$$
where $p_i$ and $g_i$ denote the predicted probability and ground-truth label of pixel $i$. This loss measures per-pixel classification error and is effective for overall pixelwise prediction.
- SoftDice Loss:
$$\mathcal{L}_{\mathrm{SoftDice}} = 1 - \frac{2\sum_i p_i g_i + \epsilon}{\sum_i p_i + \sum_i g_i + \epsilon}$$
where $\epsilon$ is a small constant for numerical stability. This loss particularly benefits the overlap of predicted and true segmentations, addressing the challenge of accurately segmenting thin or small structures such as blood vessels.
- FocalTversky Loss:
$$\mathcal{L}_{\mathrm{FT}} = (1 - \mathrm{TI})^{\gamma}$$
with the Tversky index defined as
$$\mathrm{TI} = \frac{\mathrm{TP} + \epsilon}{\mathrm{TP} + \alpha\,\mathrm{FP} + \beta\,\mathrm{FN} + \epsilon}$$
where $\mathrm{TP}$, $\mathrm{FP}$, and $\mathrm{FN}$ are the (soft) true-positive, false-positive, and false-negative counts, the hyperparameters $\alpha$ and $\beta$ govern the weighting of false positives and false negatives, and $\gamma$ controls the focusing strength. This loss is designed to counter class imbalance by concentrating optimization on difficult-to-classify (minority) regions.
The total loss is a uniformly weighted sum:
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{BCE}} + \mathcal{L}_{\mathrm{SoftDice}} + \mathcal{L}_{\mathrm{FT}}$$
Use of this composite loss is shown to be “essential for optimal network tuning” in cross-dataset fundus segmentation tasks, allowing the network to balance the requirements of fine structural delineation with overall region segmentation.
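A compact PyTorch sketch of this composite objective is given below. It is an illustration under assumptions: the values for $\alpha$, $\beta$, $\gamma$, and $\epsilon$ are generic defaults rather than the settings reported in the paper, and `targets` is expected to be a float mask in $[0, 1]$.

```python
import torch
import torch.nn.functional as F

def composite_loss(logits: torch.Tensor, targets: torch.Tensor,
                   alpha: float = 0.7, beta: float = 0.3,
                   gamma: float = 0.75, eps: float = 1e-6) -> torch.Tensor:
    """Uniformly weighted sum of BCE, SoftDice, and FocalTversky losses."""
    probs = torch.sigmoid(logits)

    # Per-pixel binary cross-entropy, computed on the raw logits
    bce = F.binary_cross_entropy_with_logits(logits, targets)

    # SoftDice: rewards overlap between prediction and ground truth
    intersection = (probs * targets).sum()
    soft_dice = 1 - (2 * intersection + eps) / (probs.sum() + targets.sum() + eps)

    # FocalTversky: Tversky index with asymmetric FP/FN weighting, raised to gamma
    tp = intersection
    fp = (probs * (1 - targets)).sum()
    fn = ((1 - probs) * targets).sum()
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    focal_tversky = (1 - tversky) ** gamma

    return bce + soft_dice + focal_tversky
```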
4. Benchmark Evaluation and Empirical Results
SAM2LoRA was assessed across 11 retinal fundus segmentation datasets spanning both blood vessel and optic disc segmentation tasks. Notable datasets include DRIVE, STARE, CHASE_DB1, HRF, and FIVES (for blood vessels), and Drishti-GS, REFUGE, G1020, GRAPE, ORIGA, and PAPILA (for optic discs). The evaluation regime uses cross-dataset training: the model is fine-tuned jointly across all datasets rather than specialized for each one individually.
Results indicate strong performance across metrics:
- Blood vessel segmentation: Dice scores up to 0.86, AUC up to 0.98
- Optic disc segmentation: Dice scores up to 0.93, AUC up to 0.99
In both segmentation domains, SAM2LoRA matches or outperforms multiple state-of-the-art models, despite the substantially lower training overhead. The high Dice and AUC scores under cross-dataset conditions suggest that the dual-LoRA and composite loss design yields robust generalizability and superior handling of inter-dataset variability.
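For reference, the reported numbers correspond to standard binary-segmentation measures; the sketch below shows how Dice and AUC are typically computed for a predicted probability map and a ground-truth mask (an illustration, not the paper's evaluation code).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between a binarized prediction and the ground-truth mask."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    return (2 * intersection + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

# AUC is computed from the continuous probability map, before thresholding:
# auc = roc_auc_score(gt_mask.reshape(-1), prob_map.reshape(-1))
```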
5. Practical Implications and Deployment in Clinical Settings
The primary benefit of SAM2LoRA lies in its ability to render large generalist models—such as SAM2, which is based on hierarchical transformers—practical for deployment in low-resource medical environments. The requirement to tune fewer than 5% of the model’s parameters enables efficient domain adaptation and rapid inference, even on hardware lacking high-end accelerators.
Applications include:
- Automated retinal blood vessel and optic disc quantification to expedite ophthalmic diagnosis and screening workflows, notably for diabetic retinopathy and glaucoma.
- Deployment on portable edge devices for real-time fundus analysis in telemedicine and rural clinics.
- Potential adaptation to other domain-specific segmentation problems where annotated data is scarce and computational budget constraints prohibit full-model finetuning.
A plausible implication is that similar low-rank adaptation and composite loss strategies may be generalized to other medical or non-medical segmentation problems, particularly where large-scale vision foundation models are the backbone.
6. Comparative Analysis and Generalization Potential
SAM2LoRA demonstrates that careful integration of LoRA modules into deeply pretrained segmentation architectures can yield high domain-specific performance without sacrificing efficiency. Its empirical superiority in cross-dataset evaluation, achieved with only a marginal increase in trainable parameter count relative to the performance gained, highlights a scalable pathway for adapting multimodal foundation models to specialized medical applications.
The approach stands in contrast to full-parameter fine-tuning, which is computationally prohibitive in practice. SAM2LoRA's success in fundus imaging suggests that the core methodology, dual-modular LoRA adaptation combined with a multi-objective composite loss, has potential to generalize beyond ophthalmology, pending validation on other tasks.
In summary, SAM2LoRA advances the parameter-efficient adaptation of transformer-based segmentation models for medical imaging, demonstrating that clinically relevant segmentation performance can be achieved with orders-of-magnitude reductions in training overhead and resource consumption (Mandal et al., 11 Oct 2025).