Improved EATFormer: A Vision Transformer for Medical Image Classification (2403.13167v1)

Published 19 Mar 2024 in cs.CV

Abstract: The accurate analysis of medical images is vital for diagnosing and predicting medical conditions. Traditional approaches that rely solely on radiologists and clinicians suffer from inconsistencies and missed diagnoses, whereas computer-aided diagnosis systems can assist in achieving early, accurate, and efficient diagnoses. This paper presents an improved Evolutionary Algorithm-based Transformer architecture for medical image classification using Vision Transformers. The proposed EATFormer architecture combines the strengths of Convolutional Neural Networks and Vision Transformers, pairing the local pattern extraction of convolutions with the global context modeling of self-attention. The architecture incorporates novel components, including the Enhanced EA-based Transformer block with Feed-Forward Network, Global and Local Interaction, and Multi-Scale Region Aggregation modules, and introduces the Modulated Deformable MSA module for dynamic modeling of irregular locations. The paper also reviews the Vision Transformer (ViT) model's key features: patch-based processing, incorporation of positional context, and the Multi-Head Attention mechanism. The Multi-Scale Region Aggregation module aggregates information from different receptive fields to provide a convolutional inductive bias, while the Global and Local Interaction module augments the MSA-based global module with a local path for extracting discriminative local information. Experimental results on the Chest X-ray and Kvasir datasets demonstrate that the proposed EATFormer significantly improves prediction speed and accuracy over baseline models.
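
The Multi-Scale Region Aggregation module is described above as combining information from different receptive fields to supply a convolutional inductive bias. The snippet below is a minimal sketch of that idea, not the authors' implementation: the choice of dilated depthwise convolutions, the dilation rates, and the 1x1 fusion layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleRegionAggregation(nn.Module):
    """Sketch: parallel depthwise convs with growing receptive fields."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        # One depthwise 3x3 conv per scale; padding=dilation keeps H and W fixed.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, groups=channels)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)  # mix scales

    def forward(self, x):
        # Sum the per-scale features, then fuse with a pointwise conv.
        return self.fuse(sum(branch(x) for branch in self.branches))

# Example: aggregate a 64-channel, 14x14 feature map.
msra = MultiScaleRegionAggregation(64)
out = msra(torch.randn(2, 64, 14, 14))  # -> (2, 64, 14, 14)
```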

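The Global and Local Interaction module is likewise described as an MSA-based global path augmented with a local path for discriminative local features. A minimal sketch under that reading follows; the even channel split, the depthwise-convolution local branch, and the linear fusion are assumptions, and `GlobalLocalInteraction` is a hypothetical name rather than the paper's API.

```python
import torch
import torch.nn as nn

class GlobalLocalInteraction(nn.Module):
    """Sketch: split channels into an MSA (global) and a conv (local) path."""
    def __init__(self, dim, num_heads=4, local_ratio=0.5):
        super().__init__()
        self.local_dim = int(dim * local_ratio)
        self.global_dim = dim - self.local_dim  # must divide by num_heads
        self.attn = nn.MultiheadAttention(self.global_dim, num_heads,
                                          batch_first=True)
        self.local = nn.Conv2d(self.local_dim, self.local_dim, kernel_size=3,
                               padding=1, groups=self.local_dim)
        self.proj = nn.Linear(dim, dim)  # fuse the two paths

    def forward(self, x, hw):
        # x: (B, N, C) token sequence; hw: (H, W) grid with N == H * W.
        B, N, _ = x.shape
        g, l = x[..., :self.global_dim], x[..., self.global_dim:]
        g, _ = self.attn(g, g, g)                       # global branch (MSA)
        H, W = hw
        l = l.transpose(1, 2).reshape(B, self.local_dim, H, W)
        l = self.local(l).flatten(2).transpose(1, 2)    # local branch (conv)
        return self.proj(torch.cat([g, l], dim=-1))

# Example: 196 tokens from a 14x14 grid of 64-dim embeddings.
gli = GlobalLocalInteraction(dim=64)
out = gli(torch.randn(2, 196, 64), hw=(14, 14))  # -> (2, 196, 64)
```
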
Authors (5)
  1. Yulong Shisu
  2. Susano Mingwin
  3. Yongshuai Wanwag
  4. Zengqiang Chenso
  5. Sunshin Huing
