Med-URWKV: Pure RWKV With ImageNet Pre-training For Medical Image Segmentation (2506.10858v1)

Published 12 Jun 2025 in eess.IV and cs.CV

Abstract: Medical image segmentation is a fundamental and key technology in computer-aided diagnosis and treatment. Previous methods can be broadly classified into three categories: convolutional neural network (CNN) based, Transformer based, and hybrid architectures that combine both. However, each of them has its own limitations, such as restricted receptive fields in CNNs or the computational overhead caused by the quadratic complexity of Transformers. Recently, the Receptance Weighted Key Value (RWKV) model has emerged as a promising alternative for various vision tasks, offering strong long-range modeling capabilities with linear computational complexity. Some studies have also adapted RWKV to medical image segmentation tasks, achieving competitive performance. However, most of these studies focus on modifications to the Vision-RWKV (VRWKV) mechanism and train models from scratch, without exploring the potential advantages of leveraging pre-trained VRWKV models for medical image segmentation tasks. In this paper, we propose Med-URWKV, a pure RWKV-based architecture built upon the U-Net framework, which incorporates ImageNet-based pretraining to further explore the potential of RWKV in medical image segmentation tasks. To the best of our knowledge, Med-URWKV is the first pure RWKV segmentation model in the medical field that can directly reuse a large-scale pre-trained VRWKV encoder. Experimental results on seven datasets demonstrate that Med-URWKV achieves comparable or even superior segmentation performance compared to other carefully optimized RWKV models trained from scratch. This validates the effectiveness of using a pretrained VRWKV encoder in enhancing model performance. The code will be released.

Author: Zhenhuan Zhou

Summary

Med-URWKV: Advancements in Medical Image Segmentation with RWKV and ImageNet Pre-training

This paper presents Med-URWKV, a medical image segmentation model built on the Receptance Weighted Key Value (RWKV) architecture and initialized with ImageNet pre-training. The author describes Med-URWKV as, to the best of his knowledge, the first pure RWKV segmentation model in the medical field that can directly reuse a large-scale pre-trained Vision-RWKV (VRWKV) encoder.

Med-URWKV is positioned against the three prevailing paradigms in deep-learning-based medical image segmentation: CNNs, Transformers, and hybrid architectures that combine the two. Each has characteristic limitations: CNNs have restricted receptive fields, while self-attention in Transformers scales quadratically with the number of tokens. RWKV offers an alternative that models long-range dependencies at a computational cost linear in sequence length.
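
To make the linear-complexity claim concrete, the following sketch evaluates a simplified, single-channel version of the bidirectional WKV (Bi-WKV) operator described in the Vision-RWKV paper in two ways: directly, at O(T^2) cost, and with one forward and one backward decayed running sum, at O(T) cost. It assumes scalar decay w and bonus u, and it omits receptance gating, token shift, per-channel parameters, and the numerical stabilization that real kernels use. The function names are hypothetical and this is not the Med-URWKV implementation.

```python
import math
import random


def bi_wkv_naive(k, v, w, u):
    """O(T^2) reference. For each position t:
    wkv_t = (sum_{i != t} exp(-(|t-i|-1)/T * w + k_i) * v_i + exp(u + k_t) * v_t)
            / (sum_{i != t} exp(-(|t-i|-1)/T * w + k_i) + exp(u + k_t))
    """
    T = len(k)
    out = []
    for t in range(T):
        bonus = math.exp(u + k[t])  # the current token gets the bonus weight
        num, den = bonus * v[t], bonus
        for i in range(T):
            if i == t:
                continue
            weight = math.exp(-(abs(t - i) - 1) / T * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def bi_wkv_linear(k, v, w, u):
    """O(T) version: the distance decay factorizes into powers of d = exp(-w/T)."""
    T = len(k)
    d = math.exp(-w / T)  # per-step decay factor
    # Forward sums over i < t: fwd[t] = sum_i d^(t-1-i) * exp(k_i) * (v_i or 1)
    fwd_num, fwd_den = [0.0] * T, [0.0] * T
    for t in range(1, T):
        e = math.exp(k[t - 1])
        fwd_num[t] = d * fwd_num[t - 1] + e * v[t - 1]
        fwd_den[t] = d * fwd_den[t - 1] + e
    # Backward sums over i > t: bwd[t] = sum_i d^(i-t-1) * exp(k_i) * (v_i or 1)
    bwd_num, bwd_den = [0.0] * T, [0.0] * T
    for t in range(T - 2, -1, -1):
        e = math.exp(k[t + 1])
        bwd_num[t] = d * bwd_num[t + 1] + e * v[t + 1]
        bwd_den[t] = d * bwd_den[t + 1] + e
    out = []
    for t in range(T):
        bonus = math.exp(u + k[t])
        out.append((fwd_num[t] + bwd_num[t] + bonus * v[t])
                   / (fwd_den[t] + bwd_den[t] + bonus))
    return out


if __name__ == "__main__":
    random.seed(0)
    T = 64
    k = [random.gauss(0.0, 1.0) for _ in range(T)]
    v = [random.gauss(0.0, 1.0) for _ in range(T)]
    slow = bi_wkv_naive(k, v, w=2.0, u=0.5)
    fast = bi_wkv_linear(k, v, w=2.0, u=0.5)
    print("max abs difference:", max(abs(a - b) for a, b in zip(slow, fast)))
```

Both functions agree up to floating-point rounding, but only the linear version avoids the T^2 pairwise terms, which is what allows RWKV to scale to long token sequences such as the patches of a high-resolution medical image.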

Med-URWKV builds on the U-Net framework and uses a pure RWKV-based design. Rather than training from scratch, as most prior RWKV-based medical segmentation models do, it reuses a large-scale VRWKV encoder pre-trained on ImageNet. The pre-trained encoder is reported to improve segmentation accuracy and speed up convergence without increasing computational cost. Across seven datasets, Med-URWKV matches or exceeds the performance of existing RWKV-based models trained from scratch.
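
Reusing a pre-trained encoder amounts to copying the checkpoint's parameters into the matching encoder slots, while the decoder, bottleneck, and segmentation head start from random initialization. The following pure-Python sketch plans such a transfer by matching key names and tensor shapes. Parameters are represented only by their shapes, and every key name (for example blocks.0.att.key.weight or seg_head.weight) is hypothetical rather than taken from the Med-URWKV or VRWKV code. In PyTorch, the analogous step is usually a call such as load_state_dict(state_dict, strict=False) on the encoder submodule, which reports missing and unexpected keys.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def plan_encoder_transfer(pretrained: Dict[str, Shape],
                          model: Dict[str, Shape],
                          prefix: str = "encoder.") -> Tuple[List[str], List[str], List[str]]:
    """Decide which pretrained parameters can be copied into the encoder.

    Returns (loaded, skipped, untouched):
      loaded    - model keys that receive a pretrained tensor
      skipped   - checkpoint keys with no slot or a shape mismatch (e.g. a classifier head)
      untouched - model keys left at random init (decoder, bottleneck, segmentation head)
    """
    loaded, skipped = [], []
    for key, shape in pretrained.items():
        target = prefix + key  # hypothetical naming convention for the encoder submodule
        if model.get(target) == shape:
            loaded.append(target)
        else:
            skipped.append(key)
    loaded_set = set(loaded)
    untouched = [key for key in model if key not in loaded_set]
    return loaded, skipped, untouched


if __name__ == "__main__":
    # Hypothetical checkpoint: two backbone tensors plus an ImageNet classification head.
    checkpoint = {
        "patch_embed.proj.weight": (192, 3, 16, 16),
        "blocks.0.att.key.weight": (192, 192),
        "head.weight": (1000, 192),
    }
    # Hypothetical segmentation model: encoder slots plus freshly initialized parts.
    segmenter = {
        "encoder.patch_embed.proj.weight": (192, 3, 16, 16),
        "encoder.blocks.0.att.key.weight": (192, 192),
        "decoder.up0.weight": (96, 192, 2, 2),
        "seg_head.weight": (2, 96, 1, 1),
    }
    for name, keys in zip(("loaded", "skipped", "untouched"),
                          plan_encoder_transfer(checkpoint, segmenter)):
        print(name, keys)
```

Matching on shape as well as name is a defensive choice: a checkpoint tensor that exists under the right name but has the wrong shape, such as a patch embedding for 3-channel input in a model expecting a different channel count, is skipped instead of silently breaking the load.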

The paper then details the components of Med-URWKV: a VRWKV encoder, a dedicated VRWKV decoder, and an RWKV bottleneck block. Hierarchical encoder features are passed to the decoder through skip connections, a design intended to combine computational efficiency with accurate segmentation.
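
The data flow of a U-Net with skip connections can be traced at the level of tensor shapes. The following sketch assumes a hypothetical four-stage hierarchy with strides 4, 8, 16, and 32, a shape-preserving bottleneck, and decoder stages that upsample the current feature to the skip's resolution, concatenate the skip along channels, and project back to the skip's channel count. Med-URWKV's actual stage count, channel widths, and fusion blocks may differ.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def trace_unet(h: int, w: int, enc_channels: List[int], strides: List[int],
               num_classes: int):
    """Return encoder, bottleneck, decoder, and output shapes for a U-Net-style layout."""
    if any(h % s or w % s for s in strides):
        raise ValueError("input size must be divisible by every stride")
    # Encoder: one feature map per stage, progressively downsampled.
    enc: List[Shape] = [(c, h // s, w // s) for c, s in zip(enc_channels, strides)]
    # Bottleneck: assumed to keep the deepest feature's shape.
    bottleneck = enc[-1]
    x = bottleneck
    steps: List[Tuple[Shape, Shape]] = []
    # Decoder: walk back up, fusing each shallower skip connection.
    for skip in reversed(enc[:-1]):
        concat = (x[0] + skip[0], skip[1], skip[2])  # upsample x, then concatenate skip
        x = (skip[0], skip[1], skip[2])  # assumed fusion block restores skip channels
        steps.append((concat, x))
    # Segmentation head: per-pixel class logits at full input resolution.
    logits = (num_classes, h, w)
    return enc, bottleneck, steps, logits


if __name__ == "__main__":
    enc, bottleneck, steps, logits = trace_unet(224, 224, [64, 128, 256, 512],
                                                [4, 8, 16, 32], num_classes=2)
    print("encoder:", enc)
    print("bottleneck:", bottleneck)
    for concat, fused in steps:
        print("decoder concat", concat, "-> fused", fused)
    print("logits:", logits)
```

The concatenated shapes show why skip connections matter: each decoder stage sees both the upsampled deep features and the higher-resolution encoder features, which carry the spatial detail needed for precise boundaries.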

The evaluation on seven diverse datasets reports Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) scores, on which Med-URWKV achieves results comparable or superior to those of RWKV models trained from scratch. Notably, Med-URWKV is reported to outperform competing models while using fewer parameters, which the results attribute to the integrated pre-trained VRWKV encoder. These findings support adopting pre-trained RWKV encoders for medical image segmentation, with both practical and theoretical implications.
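
For reference, both reported metrics can be computed from a binary prediction mask and a ground-truth mask as DSC = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN). The following sketch uses flattened 0/1 masks and a small smoothing term eps, an assumption that makes two empty masks score 1.0. How the paper averages over classes and cases is not specified here.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int],
                 eps: float = 1e-7) -> Tuple[float, float]:
    """Binary Dice (DSC) and IoU for flattened 0/1 masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # predicted and true
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # predicted, not true
    fn = sum(1 for p, t in zip(pred, target) if t and not p)  # true, missed
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)
    iou = (tp + eps) / (tp + fp + fn + eps)
    return dice, iou


if __name__ == "__main__":
    prediction = [1, 1, 0, 0, 1, 0]
    ground_truth = [1, 0, 0, 1, 1, 0]
    dsc, iou = dice_and_iou(prediction, ground_truth)
    print(f"DSC = {dsc:.3f}, IoU = {iou:.3f}")  # TP=2, FP=1, FN=1: about 0.667 and 0.500
```

Because DSC = 2 IoU / (1 + IoU) for a single mask, the two metrics rank individual predictions identically, although their averages over a dataset can still order methods differently.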

The discussion identifies directions for future work, including exploring different encoder scales and designing attention mechanisms compatible with RWKV. Such work could help establish RWKV as a standard backbone for medical image segmentation.

In conclusion, Med-URWKV is a capable alternative for medical image segmentation that addresses limitations of earlier CNN, Transformer, and from-scratch RWKV approaches. By showing that a pre-trained VRWKV encoder can be reused effectively, it sets a reference point for future RWKV-based work in this area. The planned release of the code should further support reproduction and follow-up research.