CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures (2410.05235v2)

Published 7 Oct 2024 in cs.CL and cs.AI

Abstract: Explaining AI decisions is a major challenge nowadays in AI, in particular when applied to sensitive scenarios like medicine and law. However, the need to explain the rationale behind decisions is a main issue also for human-based deliberation as it is important to justify why a certain decision has been taken. Resident medical doctors for instance are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached a certain conclusion. Developing new tools to aid residents to train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction, and we present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering where correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support), resulting in the Multilingual CasiMedicos-Arg dataset which consists of 558 clinical cases in four languages (English, Spanish, French, Italian) with explanations, where we annotated 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform over this challenging dataset for the argument mining task.

Overview of "CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures"

The paper "CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures" presents a dataset for medical question answering (QA). The dataset is designed to help AI systems explain medical decisions through argumentative structures, which matter in disciplines such as medicine where decisions must be justified. By combining multilingual data with argumentative annotations, CasiMedicos-Arg aims to fill a gap in medical QA benchmarks: detailed explanations for both correct and incorrect clinical predictions.

Key Contributions

CasiMedicos-Arg introduces several novel features that distinguish it from existing datasets in the medical QA landscape:

  1. Multilingual Capability: The dataset consists of 558 clinical cases annotated in four languages: English, Spanish, French, and Italian. This multilingual aspect allows for broader research applicability and evaluation of language-specific LLMs.
  2. Argumentative Annotations: Each clinical case is enriched with argumentative components such as claims and premises, and relations like support and attack. This detailed annotation is crucial for research in argument mining and for developing models that can generate argumentative explanations for medical diagnoses.
  3. Integration of Correct and Incorrect Options: Unlike existing datasets, CasiMedicos-Arg includes both correct and incorrect clinical options, annotated with gold explanations. These explanations are manually written by medical professionals, providing a reliable benchmark for both evaluation and training of AI systems.
  4. Strong Baselines for Argument Mining: The dataset has been used to establish competitive baselines in argument component detection. Using various LLMs, the authors demonstrate the validity and utility of their annotations.
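
The annotation scheme described above (claims and premises linked by support and attack relations) can be pictured as a small data structure. The sketch below is only an illustration: the class names, field names, token-offset convention, and the toy sentence are assumptions made for this example and do not reflect the dataset's actual file format. It also shows one common way to frame argument component detection, as token-level BIO tagging, which is assumed here rather than taken from the overview.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    cid: str    # hypothetical identifier, e.g. "p1"
    kind: str   # "claim" or "premise"
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)


@dataclass
class Relation:
    source: str  # id of the component that supports or attacks
    target: str  # id of the component being supported or attacked
    kind: str    # "support" or "attack"


@dataclass
class AnnotatedExplanation:
    tokens: List[str]
    components: List[Component] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def to_bio(example: AnnotatedExplanation) -> List[str]:
    """Convert component spans to one BIO tag per token."""
    tags = ["O"] * len(example.tokens)
    for comp in example.components:
        label = comp.kind.capitalize()
        tags[comp.start] = f"B-{label}"
        for i in range(comp.start + 1, comp.end):
            tags[i] = f"I-{label}"
    return tags


# Toy example invented for illustration; not taken from the dataset.
example = AnnotatedExplanation(
    tokens="The patient has fever so infection is likely".split(),
    components=[
        Component("p1", "premise", 0, 4),
        Component("c1", "claim", 5, 8),
    ],
    relations=[Relation("p1", "c1", "support")],
)

bio_tags = to_bio(example)
for token, tag in zip(example.tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Keeping relations as separate records that refer to component ids lets one premise support or attack several claims without duplicating the span itself.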

Numerical Results and Analysis

The dataset includes 5,021 claims, 2,313 premises, 2,431 support relations, and 1,106 attack relations. Experiments with multilingual models such as mBERT, mDeBERTa, and med-mT5 showed substantial improvements in argument detection when training used multilingual data transfer rather than monolingual data alone.
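
To put the annotation counts in proportion, the short calculation below derives totals and rough per-case averages from the figures reported in the abstract (558 cases, 5021 claims, 2313 premises, 2431 support relations, 1106 attack relations). The variable names are illustrative, and the averages assume the counts are spread over all 558 cases, which the overview does not state explicitly.

```python
# Counts as reported in the abstract.
n_cases = 558
counts = {"claims": 5021, "premises": 2313, "support": 2431, "attack": 1106}

total_components = counts["claims"] + counts["premises"]
total_relations = counts["support"] + counts["attack"]

# Rough averages, assuming the counts cover all 558 cases.
claims_per_case = counts["claims"] / n_cases
premises_per_case = counts["premises"] / n_cases
support_share = counts["support"] / total_relations

print(f"components: {total_components}, relations: {total_relations}")
print(f"claims per case:   {claims_per_case:.2f}")
print(f"premises per case: {premises_per_case:.2f}")
print(f"support share of relations: {support_share:.1%}")
```

Under that assumption, claims outnumber premises by roughly two to one, and support relations account for a little over two thirds of all annotated relations.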

The best performance in detecting argument components came from the Mistral model, showing that domain-specific fine-tuning on medical text can greatly enhance model performance. However, the dataset revealed challenges when transferring models cross-lingually, emphasizing the importance of multilingual data.

Implications and Future Directions

The creation and annotation of CasiMedicos-Arg is a step toward AI systems in healthcare that give transparent, justifiable reasoning for their decisions. With this dataset, researchers can work on improving the interpretability and accuracy of LLMs in medical settings.

Future research directions may include:

  • Further refining argument generation capabilities by leveraging the detailed argumentative annotations provided by this dataset.
  • Exploring the integration of argumentative explanations in real-world clinical decision support systems.
  • Extending the dataset to include additional languages or considering different varieties of medical narratives.

In conclusion, the CasiMedicos-Arg dataset not only offers a robust foundation for research in medical AI but also exemplifies the potential of AI to enhance clarity and trust in medical decision-making processes through well-structured explanatory frameworks.

Authors (6)
  1. Anar Yeginbergen (4 papers)
  2. Ainara Estarrona (3 papers)
  3. Elena Cabrio (11 papers)
  4. Serena Villata (12 papers)
  5. Rodrigo Agerri (41 papers)
  6. Ekaterina Sviridova (2 papers)
Citations (1)