XAIQA: Explainer-Based Data Augmentation for Extractive Question Answering (2312.03567v1)

Published 6 Dec 2023 in cs.CL

Abstract: Extractive question answering (QA) systems can enable physicians and researchers to query medical records, a foundational capability for designing clinical studies and understanding patient medical history. However, building these systems typically requires expert-annotated QA pairs. LLMs, which can perform extractive QA, depend on high-quality data in their prompts, specialized for the application domain. We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records. Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes. In an expert evaluation with two physicians, our method identifies $2.2\times$ more semantic matches and $3.8\times$ more clinical abbreviations than two popular approaches that use sentence transformers to create QA pairs. In an ML evaluation, adding our QA pairs improves performance of GPT-4 as an extractive QA model, including on difficult questions. In both the expert and ML evaluations, we examine trade-offs between our method and sentence transformers for QA pair generation depending on question difficulty.
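
The abstract's core idea, using a classification model explainer to turn a coded medical record into a QA pair, can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-based scorer standing in for a trained medical-code classifier, the sentence-occlusion explainer, the question template, and the example note are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


def toy_code_probability(sentences: List[str], keywords: List[str]) -> float:
    """Hypothetical stand-in for a trained medical-code classifier.

    Returns the fraction of keywords found in the text, used here as a
    pseudo-probability that the code applies to the document.
    """
    text = " ".join(sentences).lower()
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords)


def occlusion_scores(
    sentences: List[str], predict: Callable[[List[str]], float]
) -> List[float]:
    """Score each sentence by how much the prediction drops when it is removed.

    Sentence occlusion is an assumed, simple explainer chosen for this sketch.
    """
    base = predict(sentences)
    scores = []
    for i in range(len(sentences)):
        occluded = sentences[:i] + sentences[i + 1:]
        scores.append(base - predict(occluded))
    return scores


def make_qa_pair(
    sentences: List[str],
    code_description: str,
    predict: Callable[[List[str]], float],
) -> Tuple[str, str]:
    """Pair a question about the coded concept with the most important sentence."""
    scores = occlusion_scores(sentences, predict)
    best = max(range(len(sentences)), key=lambda i: scores[i])
    # Hypothetical question template; the paper's wording may differ.
    question = f"What text indicates {code_description}?"
    return question, sentences[best]


# Fabricated example note, for illustration only.
note = [
    "Patient presents for routine follow-up.",
    "Pt has hx of DM2, on metformin.",
    "No acute distress noted.",
]
keywords = ["dm2", "metformin"]


def predict(sents: List[str]) -> float:
    return toy_code_probability(sents, keywords)


question, answer = make_qa_pair(note, "type 2 diabetes mellitus", predict)
print(question)
print(answer)
```

In this toy setup the selected answer contains the abbreviation "DM2" rather than the literal code description, which hints at why an explainer-driven approach might surface abbreviations that a purely similarity-based match could miss.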

Authors (7)
  1. Joel Stremmel (6 papers)
  2. Ardavan Saeedi (15 papers)
  3. Hamid Hassanzadeh (3 papers)
  4. Sanjit Batra (1 paper)
  5. Jeffrey Hertzberg (2 papers)
  6. Jaime Murillo (2 papers)
  7. Eran Halperin (8 papers)
Citations (1)