Unexamined quality of OpenAlex language detection

Assess the quality of OpenAlex’s language detection, which classifies the language of scholarly works from their titles and abstracts, through a systematic evaluation that quantifies error rates and clarifies their impact on language-based bibliometric analyses.

Background

OpenAlex infers the language of works algorithmically using titles and abstracts, whereas Scopus relies on collected metadata. The paper notes discrepancies in language counts between databases and highlights that the accuracy of OpenAlex’s language detection has not been assessed in published research.

A formal evaluation of the language detection approach is necessary to gauge its reliability, identify potential sources of bias (e.g., multilingual abstracts), and determine its suitability for comparative and field-normalized analyses.
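One way such an evaluation could quantify error rates is sketched below in Python. It is only an illustration, not the method of the cited paper: it assumes a hand-labeled sample of works in which each record pairs a manually verified language with the language OpenAlex assigned. The function name `language_error_rates` and the sample records are hypothetical and invented for this example.

```python
from collections import Counter, defaultdict


def language_error_rates(records):
    """Compute overall and per-language error rates.

    `records` is an iterable of (gold_language, detected_language) pairs,
    where the gold label is assumed to come from manual verification.
    """
    totals = Counter()                  # works per gold language
    errors = Counter()                  # misclassified works per gold language
    confusions = defaultdict(Counter)   # gold language -> wrongly detected language -> count

    for gold, detected in records:
        totals[gold] += 1
        if detected != gold:
            errors[gold] += 1
            confusions[gold][detected] += 1

    n_works = sum(totals.values())
    if n_works == 0:
        raise ValueError("records must contain at least one labeled work")

    overall = sum(errors.values()) / n_works
    per_language = {lang: errors[lang] / n for lang, n in totals.items()}
    return overall, per_language, confusions


# Hypothetical labeled sample (illustrative values, not real data).
sample = [
    ("en", "en"), ("en", "en"),
    ("pt", "en"), ("pt", "pt"),
    ("es", "es"),
    ("fr", "en"),
]

overall, per_language, confusions = language_error_rates(sample)
print(f"overall error rate: {overall:.2f}")
for lang, rate in sorted(per_language.items()):
    print(f"{lang}: {rate:.2f}")
```

Reporting error rates per gold language, together with the confusion counts, would show whether misclassifications concentrate in particular languages, for instance non-English works being labeled as English, which matters more for language-based analyses than a single aggregate figure.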

References

Both databases use different approaches to determining the language of a given work, and no published work to date has examined the quality of the language detection in OpenAlex.

An analysis of the suitability of OpenAlex for bibliometric analyses (2404.17663 - Alperin et al., 26 Apr 2024) in Section 3.3 (Languages analysis)