Adaptive Query Prompting for Multi-Domain Landmark Detection (2404.01194v1)

Published 1 Apr 2024 in cs.CV

Abstract: Medical landmark detection is crucial across medical imaging modalities and procedures. Although deep learning-based methods have achieved promising performance, they are mostly designed for specific anatomical regions or tasks. In this work, we propose a universal model for multi-domain landmark detection by leveraging a transformer architecture and developing a prompting component named Adaptive Query Prompting (AQP). Instead of embedding additional modules in the backbone network, we design a separate module to generate prompts, which can be readily extended to any other transformer network. In AQP, prompts are learnable parameters maintained in a memory space called the prompt pool. The central idea is to keep the backbone frozen and optimize only the prompts to instruct the model's inference process. Furthermore, we employ a lightweight decoder, named Light-MLD, to decode landmarks from the extracted features. Thanks to the lightweight nature of the decoder and AQP, we can handle multiple datasets by sharing the backbone encoder and performing only partial parameter tuning, without incurring much additional cost; the approach can potentially be extended to further landmark detection tasks. We conduct experiments on three widely used X-ray datasets covering different medical landmark detection tasks. Our proposed Light-MLD coupled with AQP achieves state-of-the-art (SOTA) performance on many metrics, even without elaborate structural designs or complex frameworks.
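
The abstract only outlines the mechanism, so below is a minimal, hypothetical PyTorch sketch of how learnable prompts in a prompt pool could be prepended to a frozen transformer backbone's token sequence. This is not the authors' implementation: the class name, the key-based prompt selection, and the hyperparameters (pool_size, prompt_len, top_k) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveQueryPromptingSketch(nn.Module):
    """Illustrative prompt pool: learnable prompt tokens kept outside the
    (frozen) backbone and prepended to its input token sequence."""

    def __init__(self, pool_size: int, prompt_len: int, embed_dim: int):
        super().__init__()
        # Pool of learnable prompts; the backbone itself stays untouched.
        self.pool = nn.Parameter(torch.randn(pool_size, prompt_len, embed_dim) * 0.02)
        # Learnable keys used to match an input to prompts in the pool
        # (one plausible selection scheme; the paper's may differ).
        self.keys = nn.Parameter(torch.randn(pool_size, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor, top_k: int = 1) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings from the frozen encoder stem.
        query = tokens.mean(dim=1)                        # (B, D) input summary
        scores = torch.einsum("bd,pd->bp", query, self.keys)
        idx = scores.topk(top_k, dim=-1).indices          # (B, top_k)
        prompts = self.pool[idx].flatten(1, 2)            # (B, top_k*L, D)
        return torch.cat([prompts, tokens], dim=1)        # prepend prompts

def freeze_backbone(backbone: nn.Module) -> None:
    # Keep the shared encoder frozen; only prompts and a light decoder train.
    for p in backbone.parameters():
        p.requires_grad = False
```

Under this kind of design, only the prompt pool, its keys, and the decoder head receive gradients, which is what makes sharing one encoder across multiple datasets cheap: each dataset contributes a small set of trainable prompts rather than a fully fine-tuned backbone.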
