
An open dataset of article processing charges from six large scholarly publishers (2019-2023) (2406.08356v1)

Published 12 Jun 2024 in cs.DL

Abstract: This paper introduces a dataset of article processing charges (APCs) produced from the price lists of six large scholarly publishers - Elsevier, Frontiers, PLOS, MDPI, Springer Nature and Wiley - between 2019 and 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC price list information in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations. The dataset was generated to allow for more precise analysis of APCs and can support library collection development and scientometric analysis estimating APCs paid in gold and hybrid OA journals.


Summary

  • The paper introduces a comprehensive dataset of APC fees collected from six major scholarly publishers over 2019-2023.
  • It employs manual and automated methods to standardize data and convert multiple currencies, ensuring high accuracy.
  • The dataset delivers practical insights for libraries and researchers to analyze pricing trends and inform OA publishing negotiations.

An Open Dataset of Article Processing Charges from Six Large Scholarly Publishers (2019-2023)

The article titled "An open dataset of article processing charges from six large scholarly publishers (2019-2023)" by Butler et al. provides a comprehensive dataset of article processing charges (APCs) extracted from publisher price lists spanning five years, from 2019 to 2023. This work represents a significant contribution to the understanding of the scholarly publishing market, particularly within the domain of Open Access (OA) fees, and offers practical implications for library collection development and scientometric analysis.

Dataset Overview

The dataset was compiled by extracting APCs from the price lists of six major scholarly publishers: Elsevier, Frontiers, MDPI, PLOS, Springer Nature, and Wiley. The fees were collected and recorded through a combination of manual and automated methods from several sources, including publisher websites, individual journal pages, and the Wayback Machine. The dataset covers 8,712 unique journals and 36,618 journal-year combinations, with annual prices in multiple currencies (USD, EUR, GBP, CHF, JPY, CAD).

Methodology and Data Cleaning

The data collection involved several meticulous steps to ensure accuracy and consistency:

  • Source and Format Standardization: Data were sourced from downloadable PDFs, structured XLSX files, or HTML content on publisher websites.
  • Unique Identifiers: Each journal was assigned an internal unique identifier (ID) to account for multiple ISSNs and spelling variations.
  • Currency Conversions: APCs were provided in multiple currencies and converted to USD where necessary using annual average rates from ofx.com.
  • Data Cleaning: This included checking for and correcting discrepancies such as ISSN misattributions and variations in journal titles.
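The currency conversion step can be illustrated with a minimal sketch. It assumes a lookup table keyed by year and currency; the function name `to_usd`, the table name `ANNUAL_RATE_TO_USD`, and the rates themselves are placeholders chosen for illustration, not the ofx.com values or the code used by the authors.

```python
# Placeholder annual average rates: USD per 1 unit of the source currency.
# These values are illustrative only, not the ofx.com rates used in the paper.
ANNUAL_RATE_TO_USD = {
    (2019, "EUR"): 1.10,
    (2019, "GBP"): 1.25,
    (2023, "EUR"): 1.10,
}


def to_usd(amount, currency, year, rates=ANNUAL_RATE_TO_USD):
    """Convert a list price to USD using the annual average rate for that year."""
    if currency == "USD":
        # Prices already listed in USD need no conversion.
        return float(amount)
    try:
        rate = rates[(year, currency)]
    except KeyError:
        # Fail loudly rather than guess a rate for a missing year or currency.
        raise ValueError(f"no rate for {currency} in {year}")
    return round(amount * rate, 2)


if __name__ == "__main__":
    print(to_usd(2000, "EUR", 2019))
    print(to_usd(1500, "USD", 2021))
```

Keying the table on (year, currency) keeps each journal-year price tied to the rate for its own year, which matches the annual-average approach described above.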

Key Findings and Statistical Summaries

Descriptive Statistics

The authors presented descriptive statistics, indicating variations across publishers and years:

  • Journal and Price Statistics: The dataset covers 8,712 unique journals in total, growing from 6,643 in 2019 to 7,985 in 2023.
  • APC Distribution: APCs varied widely, from a minimum of $150 to a maximum of $11,690. The average APC was $1,977 for gold OA journals and $3,137 for hybrid OA journals.

Publisher Comparisons

The dataset highlights significant differences in APCs among the six publishers:

  • Most Journals: Elsevier, Springer Nature, and Wiley had the largest portfolios which included many hybrid OA options.
  • Gold OA Focus: Frontiers, MDPI, and PLOS, all of which are fully OA, exhibited lower average APCs with MDPI showing the lowest fees.

Temporal Trends

Annual trends showcased a general increase in APCs over time, particularly in hybrid journals:

  • APC Increases: MDPI journals showed marked increases, while Wiley's gold OA fees decreased in recent years, likely due to its acquisition of Hindawi journals.
  • Inflation Analysis: The majority of journals increased their APCs from 2019 to 2023, and for a proportion of them the increase exceeded the 19% inflation over the same period.
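The comparison against inflation can be sketched as follows. This is a hedged illustration, not the authors' analysis: it assumes each journal has a USD list price for both 2019 and 2023, the function names and the sample prices are hypothetical, and only the 19% figure comes from the text.

```python
INFLATION_2019_2023 = 0.19  # cumulative inflation figure quoted in the text


def pct_change(apc_2019, apc_2023):
    """Relative change in list price between 2019 and 2023."""
    if apc_2019 <= 0:
        raise ValueError("2019 APC must be positive")
    return (apc_2023 - apc_2019) / apc_2019


def share_exceeding_inflation(prices, inflation=INFLATION_2019_2023):
    """Fraction of journals whose APC rose faster than inflation.

    `prices` maps a journal ID to a tuple (apc_2019, apc_2023) in USD.
    """
    if not prices:
        return 0.0
    above = sum(
        1 for old, new in prices.values() if pct_change(old, new) > inflation
    )
    return above / len(prices)


# Hypothetical sample prices, for illustration only.
sample = {
    "J001": (2000, 2500),  # +25%, above inflation
    "J002": (2000, 2200),  # +10%, below inflation
    "J003": (3000, 3000),  # unchanged
    "J004": (1000, 1300),  # +30%, above inflation
}

if __name__ == "__main__":
    print(share_exceeding_inflation(sample))
```

A journal whose price rose by less than 19% became cheaper in real terms over the period, even though its nominal APC increased.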

Practical and Theoretical Implications

From a practical standpoint, the dataset serves as a vital tool for libraries in managing collections and budgeting for OA expenditures. It allows for precise analysis of APC trends and can aid in negotiating read-and-publish agreements.

On a theoretical level, this dataset supports further scientometric studies by providing empirical evidence to understand APC dynamics and the economic landscape of scholarly publishing. Future updates to the dataset aim to fill gaps in APC and OA status data, expanding coverage to additional publishers and extending temporal coverage.

Conclusion

The dataset by Butler et al. fills a critical need for consolidated, machine-readable APC information across six large scholarly publishers. It offers a versatile resource for both applied and theoretical inquiries into the economics of scholarly publishing. The authors aim to keep refining the dataset by addressing missing data and expanding its scope.

By making this dataset openly available, Butler et al. have laid the groundwork for comprehensive analyses of OA economics, offering a transparent tool that benefits libraries, researchers, and policy-makers alike in understanding and managing the costs associated with scholarly publications.