Determine whether journals manipulate metadata to improve indexing

Determine whether journals intentionally manipulate the metadata they deposit with Crossref (for example, by declaring English as the primary language, listing the English abstract first, or omitting language attributes) to improve indexing outcomes and discoverability on downstream platforms.

Background

The paper identifies widespread gaps and inconsistencies in language-related metadata within Crossref records, including the near absence of article-level language attributes (95.7% missing) and frequent discrepancies between declared and detected languages. It also notes that the Crossref REST API exposes only the first listed abstract, potentially incentivizing journals to lead with English abstracts for indexing purposes.
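To make these gaps concrete, here is a minimal sketch of how one might flag them in a single Crossref work record. It is not the authors' method. It assumes the input is the `message` object returned by the Crossref REST API works endpoint (`https://api.crossref.org/works/{doi}`), with the `DOI`, `language`, and JATS-tagged `abstract` fields. The function names, the tiny English stopword list, and the 0.15 threshold are hypothetical choices; the stopword share is a crude stand-in for a real language detector.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny English stopword list: a crude stand-in
# for a proper language-identification model.
_EN_STOPWORDS = {
    "the", "of", "and", "to", "in", "is", "that",
    "for", "with", "this", "we", "are", "on", "by",
}


def strip_jats(abstract: str) -> str:
    """Remove JATS/XML tags such as <jats:p> from a Crossref abstract string."""
    return re.sub(r"<[^>]+>", " ", abstract)


def looks_english(text: str, threshold: float = 0.15) -> bool:
    """Return True if the share of English stopwords among letter tokens meets the threshold."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letter-only tokens, Unicode-aware
    if not tokens:
        return False
    hits = sum(1 for t in tokens if t in _EN_STOPWORDS)
    return hits / len(tokens) >= threshold


def audit_work(message: dict) -> dict:
    """Flag language-metadata issues in one Crossref work record (hypothetical helper)."""
    declared: Optional[str] = message.get("language")
    abstract: Optional[str] = message.get("abstract")
    # The REST API exposes only one abstract, so this checks whichever abstract leads.
    leads_english = looks_english(strip_jats(abstract)) if abstract else None
    return {
        "doi": message.get("DOI"),
        "language_missing": declared is None,
        "abstract_present": abstract is not None,
        "abstract_looks_english": leads_english,
        # Declared non-English language but an English-looking leading abstract:
        # a declared-vs-detected discrepancy worth closer inspection.
        "possible_mismatch": (
            declared is not None
            and leads_english is True
            and not declared.lower().startswith("en")
        ),
    }


if __name__ == "__main__":
    record = {
        "DOI": "10.1234/example",  # hypothetical DOI, for illustration only
        "language": "es",
        "abstract": "<jats:p>This study examines the effect of the policy "
                    "on the outcomes of students in the region.</jats:p>",
    }
    print(audit_work(record))
```

A flag from a check like this only marks a record for inspection. An English-first abstract on a record that declares another language, or declares none, is consistent with deliberate gaming but equally with default deposit-tool settings, which is why the question below calls for empirical study rather than inference from single records.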

Because these issues complicate reliable language detection and reporting, the authors explicitly raise the question of whether journals might be gaming metadata to improve indexing, highlighting the need to establish empirically whether such practices exist and how widespread they are.

References

This, in turn, leads to a long series of open questions: Are journals gaming metadata to improve indexing?

Evaluating Multilingual Metadata Quality in Crossref (2503.11853 - II et al., 14 Mar 2025) in Discussion, paragraph beginning “These problems have a compounding effect on one another”