Objectively Evaluating the Reliability of Cell Type Annotation Using LLM-Based Strategies (2409.15678v1)

Published 24 Sep 2024 in q-bio.QM and q-bio.GN

Abstract: Reliability in cell type annotation is challenging in single-cell RNA-sequencing data analysis because both expert-driven and automated methods can be biased or constrained by their training data, especially for novel or rare cell types. Although LLMs are useful, our evaluation found that only a few matched expert annotations due to biased data sources and inflexible training inputs. To overcome these limitations, we developed the LICT (LLM-based Identifier for Cell Types) software package using a multi-model fusion and "talk-to-machine" strategy. Tested across various single-cell RNA sequencing datasets, our approach significantly improved annotation reliability, especially in datasets with low cellular heterogeneity. Notably, we established objective criteria to assess annotation reliability using the "talk-to-machine" approach, which addresses discrepancies between our annotations and expert ones, enabling reliable evaluation even without reference data. This strategy enhances annotation credibility and sets the stage for advancing future LLM-based cell type annotation methods.