Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian

Published 12 Apr 2024 in cs.CL | (2404.08488v1)

Abstract: This paper proposes a test of Thematic Analysis (TA) with Large Language Models (LLMs) on data in a language other than English. While there has been promising initial work on using pre-trained LLMs for TA on English-language data, we lack any tests of whether these models can perform the same analysis with good quality in other languages. The test proposed here uses an open-access dataset of semi-structured interviews in Italian. It shows that a pre-trained model can perform TA on these data, including when the prompts are also written in Italian. A comparative test shows the model's capacity to produce themes that bear a good resemblance to those produced independently by human researchers. The main implication of this study is that pre-trained LLMs may be suitable for supporting analysis in multilingual settings, so long as the language is supported by the model used.