LLM behavior under author data disputes in taxonomy papers

Determine how the large language models integrated in the ARETE package (e.g., GPT-3.5 and GPT-4o, accessed via the OpenAI API) behave when they process taxonomy papers that contain author data disputes, and in particular whether they still correctly extract species occurrence information: species names, localities, and in-text coordinates.

Background

ARETE is an R package that automates extraction of species occurrence data from unstructured text using LLMs. The authors validated performance primarily on papers in the RECODE corpus focusing on insects and spiders.

They note that certain real-world complexities of the taxonomy literature were absent from their training and evaluation data. In particular, the authors explicitly state that they did not encounter any case of author data dispute and therefore do not know how current LLMs will behave in such scenarios. Characterizing model behavior on texts that contain conflicting claims is important both for reliable extraction and for the downstream conservation analyses that depend on it.
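One way to make this question testable is to score a model's extractions against a hand-annotated paper that contains a dispute. The Python sketch below is illustrative only. It is not part of ARETE (which is an R package), and the `Record` type, the `classify_extractions` function, and the placeholder species and localities are all assumptions introduced here. It further assumes that a dispute can be annotated as one set of occurrence claims the paper accepts and another set it rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A simplified occurrence record (hypothetical schema, not ARETE's output format)."""
    species: str
    locality: str


def classify_extractions(extracted, accepted, rejected):
    """Sort extracted records by how they relate to the annotated dispute.

    extracted: records returned by the model under test
    accepted:  hand-annotated records the paper treats as valid
    rejected:  hand-annotated records the paper disputes
    """
    extracted, accepted, rejected = set(extracted), set(accepted), set(rejected)
    return {
        # Valid records the model found.
        "accepted": extracted & accepted,
        # Disputed claims the model reported as if they were occurrences.
        "disputed": extracted & rejected,
        # Records that match neither annotation set.
        "unsupported": extracted - accepted - rejected,
        # Valid records the model failed to extract.
        "missed": accepted - extracted,
    }


# Placeholder data: "Aus bus" and the localities are made up for illustration.
accepted = [Record("Aus bus", "Locality A")]
rejected = [Record("Aus bus", "Locality B")]
model_output = [Record("Aus bus", "Locality A"), Record("Aus bus", "Locality B")]

result = classify_extractions(model_output, accepted, rejected)
for label, records in result.items():
    print(label, sorted(r.locality for r in records))
```

A non-empty `disputed` set would indicate that the model reported a contested claim as an occurrence. Exact matching on species and locality strings is a simplification; a real evaluation would likely need name normalization and a tolerance for coordinates.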

References

"For example, we did not see any cases of author data dispute in our training data. We do not know how existing LLM will behave in this scenario."

ARETE: an R package for Automated REtrieval from TExt with large language models (2511.04573 - Branco et al., 6 Nov 2025) in Supporting Information S.5.2.1, Resource limited validation