PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis
Abstract: The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibited unprecedented ability across multiple tasks in dialogue, bringing new opportunities to diagnosis. A common strategy might involve using adapter layers for vision-language alignment and performing diagnosis in a dialogic manner. Yet this approach often requires optimizing extensive learnable parameters in the text branch and the dialogue head, potentially diminishing the LLMs' efficacy, especially with limited training data. In our work, we eliminate the text branch and substitute the dialogue head with a classification head. This approach presents a more effective method for harnessing LLMs in diagnosis with fewer learnable parameters. Furthermore, to balance the retention of detailed image information with progression towards accurate diagnosis, we introduce the contextual multi-token engine, which adaptively generates diagnosis tokens. Additionally, we propose the information emitter module, which unidirectionally emits information from image tokens to diagnosis tokens. Comprehensive experiments validate the superiority of our method and the effectiveness of the proposed modules. Our code is available at https://github.com/CodeMonsterPHD/PneumoLLM/tree/main.
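The unidirectional flow described for the information emitter module can be pictured as an attention mask in which diagnosis tokens read from image tokens but image tokens never read from diagnosis tokens. The following is a minimal NumPy sketch of that idea only, not the authors' implementation: the function names, the token ordering (image tokens first), the identity projections, and the mean-pooled linear classification head are all assumptions made for illustration.

```python
import numpy as np


def unidirectional_mask(num_image: int, num_diag: int) -> np.ndarray:
    """Boolean mask where entry [q, k] is True if query q may attend to key k.

    Assumed token order: image tokens first, then diagnosis tokens.
    """
    n = num_image + num_diag
    mask = np.ones((n, n), dtype=bool)
    # Image queries may not attend to diagnosis keys, so information
    # flows only from image tokens to diagnosis tokens.
    mask[:num_image, num_image:] = False
    return mask


def masked_attention(x: np.ndarray, mask: np.ndarray):
    """Single-head self-attention with identity projections (illustrative)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    # Blocked positions get -inf so their softmax weight is exactly zero.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


def classify(x: np.ndarray, num_image: int, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical classification head: mean-pool diagnosis tokens, then linear."""
    pooled = x[num_image:].mean(axis=0)
    return pooled @ w + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_image, num_diag, dim, num_classes = 4, 2, 8, 3  # toy sizes (assumed)
    tokens = rng.normal(size=(num_image + num_diag, dim))
    out, weights = masked_attention(tokens, unidirectional_mask(num_image, num_diag))
    logits = classify(out, num_image, rng.normal(size=(dim, num_classes)), np.zeros(num_classes))
    print(weights.round(2))
    print(logits.shape)
```

Under this mask, changing the diagnosis tokens leaves the image-token outputs untouched, which is one way to read "retaining detailed image information" while the diagnosis tokens accumulate evidence for the classification head.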