PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain (2310.14151v1)

Published 22 Oct 2023 in cs.CL

Abstract: Biomedical language understanding benchmarks are the driving forces for artificial intelligence applications with LLM back-ends. However, most current benchmarks: (a) are limited to English, which makes it challenging to replicate many of the successes in English for other languages; (b) focus on knowledge probing of LLMs and neglect to evaluate how LLMs apply this knowledge to perform a wide range of biomedical tasks; or (c) have become publicly available corpora and are leaked to LLMs during pre-training. To facilitate research in medical LLMs, we re-build the Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark into a large-scale prompt-tuning benchmark, PromptCBLUE. Our benchmark is a suitable test-bed and an online platform for evaluating Chinese LLMs' multi-task capabilities on a wide range of biomedical tasks, including medical entity recognition, medical text classification, medical natural language inference, medical dialogue understanding, and medical content/dialogue generation. To establish evaluation on these tasks, we experiment with and report results for 9 current Chinese LLMs fine-tuned with different fine-tuning techniques.

Authors (5)
  1. Wei Zhu (290 papers)
  2. Xiaoling Wang (42 papers)
  3. Huanran Zheng (6 papers)
  4. Mosha Chen (17 papers)
  5. Buzhou Tang (18 papers)
Citations (27)