
DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness (2404.09206v1)

Published 14 Apr 2024 in cs.CL

Abstract: Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained LLMs. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original LLMs. Ablation studies validate the contribution of each augmentation method to improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of the 32 participants.
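The domain-specific vocabulary replacement described in the abstract can be sketched as a label-preserving text perturbation. The synonym pairs below are illustrative assumptions, not the paper's actual biomedical vocabulary resource, and the function name is hypothetical:

```python
# Hypothetical domain synonym map; the paper's real biomedical
# vocabulary source is not specified here, so these pairs are
# illustrative only.
BIOMEDICAL_SYNONYMS = {
    "adverse events": "side effects",
    "participants": "subjects",
    "intervention": "treatment",
}

def augment_by_vocab_replacement(statement, synonyms):
    """Produce a synthetic variant of a clinical-trial statement by
    swapping in domain synonyms. Because each replacement preserves
    meaning, the entailment label of the original example carries over
    to the augmented one."""
    for term, replacement in synonyms.items():
        statement = statement.replace(term, replacement)
    return statement

original = "More participants in the intervention group reported adverse events."
print(augment_by_vocab_replacement(original, BIOMEDICAL_SYNONYMS))
```

Pairing each original statement with such variants (under the same label) is one way the reported augmentation could increase lexical diversity and discourage surface-level shortcut learning.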

Authors (7)
  1. Yuqi Wang (62 papers)
  2. Zeqiang Wang (9 papers)
  3. Wei Wang (1793 papers)
  4. Qi Chen (194 papers)
  5. Kaizhu Huang (95 papers)
  6. Anh Nguyen (157 papers)
  7. Suparna De (11 papers)
Citations (1)
