LLM behavior under author data disputes in taxonomy papers
Determine how the large language models integrated in the ARETE package (e.g., GPT-3.5 and GPT-4o, accessed via the OpenAI API) behave when processing taxonomy papers that contain author data disputes, and specifically whether they still correctly extract species occurrence information, including species names, localities, and in-text coordinates.
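One way to make this question measurable is to score the records a model extracts from a dispute-containing paper against a hand-curated gold standard. The Python sketch below is only an illustration under stated assumptions: it is not part of ARETE and does not call ARETE or the OpenAI API. The Occurrence type, the normalize and score helpers, and the example records (placeholder names such as "Genus alpha") are all hypothetical, and exact matching after light normalization is a simplifying assumption.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Occurrence(NamedTuple):
    """Hypothetical record type: one species occurrence extracted from a paper."""
    species: str
    locality: str
    coords: Optional[Tuple[float, float]]  # (latitude, longitude), or None if absent


def normalize(rec: Occurrence) -> Occurrence:
    """Lower-case and collapse whitespace; round coordinates to 3 decimals.

    Assumption: exact matching after this normalization is good enough
    for a first-pass comparison.
    """
    coords = None
    if rec.coords is not None:
        coords = (round(rec.coords[0], 3), round(rec.coords[1], 3))
    return Occurrence(
        " ".join(rec.species.lower().split()),
        " ".join(rec.locality.lower().split()),
        coords,
    )


def score(extracted: Iterable[Occurrence], gold: Iterable[Occurrence]) -> dict:
    """Compare model output with a gold standard using set overlap."""
    ext = {normalize(r) for r in extracted}
    ref = {normalize(r) for r in gold}
    true_positives = len(ext & ref)
    return {
        "precision": true_positives / len(ext) if ext else 0.0,
        "recall": true_positives / len(ref) if ref else 0.0,
        "spurious": ext - ref,  # extracted but not in the gold standard
        "missed": ref - ext,    # in the gold standard but not extracted
    }


# Hypothetical scenario: the paper's authors accept one record and dispute another.
accepted = Occurrence("Genus alpha", "Site A", (10.0, 20.0))
disputed = Occurrence("Genus alpha", "Site B", None)

gold = [accepted]                 # curator keeps only the accepted record
extracted = [accepted, disputed]  # imagined model output that also kept the disputed one

result = score(extracted, gold)
```

In this imagined case the disputed record lands in result["spurious"] and lowers precision while recall is unaffected. How disputed records should be treated in the gold standard is itself a curation decision that the source does not settle.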
References
For example, we did not see any cases of author data dispute in our training data. We do not know how existing LLM will behave in this scenario.
— ARETE: an R package for Automated REtrieval from TExt with large language models
(2511.04573 - Branco et al., 6 Nov 2025) in Supporting Information S.5.2.1, Resource limited validation