Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare (2405.13030v1)

Published 16 May 2024 in cs.CL and cs.AI

Abstract: LLMs have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19 percent compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.

References (40)

Citations (2)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare (2405.13030v1)

Summary

Follow-up Questions

Authors (8)

Tweets

Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare (2405.13030v1)

Summary

Follow-up Questions

Related Papers

Authors (8)

Tweets