
Computational Reproducibility in Computational Social Science

Published 4 Jul 2023 in cs.CY (arXiv:2307.01918v4)

Abstract: Replication crises have shaken the scientific landscape during the last decade. As potential solutions, open science practices have been heavily discussed and implemented with varying success across disciplines. We argue that computational-X disciplines, such as computational social science, are also susceptible to the symptoms of these crises, but in terms of reproducibility. We expand the binary definition of reproducibility into a tier system that allows increasing levels of reproducibility based on external verifiability, to counteract the practice of open-washing. We provide solutions to barriers in computational social science that hinder researchers from attaining the highest level of reproducibility, including the use of alternative data sources and considering reproducibility proactively.


Summary

  • The paper introduces a tiered system categorizing computational reproducibility levels to provide clearer verification standards and counteract open-washing.
  • The paper details practical measures such as employing Docker virtualization, open-source software, and data donations to mitigate economic and access barriers.
  • The paper highlights challenges from external dependencies, including APIs and proprietary tools, which complicate maintaining reproducible research outcomes.

Introduction

The paper "Computational Reproducibility in Computational Social Science" delineates the challenges associated with ensuring reproducibility in computational social science (CSS). It critiques the binary conception of computational reproducibility and introduces a nuanced tier system that categorizes different levels of reproducibility based on external verifiability. This classification is proposed in response to the practices termed as "open-washing," where research may deceptively appear reproducible. Through this tier system, the paper addresses barriers inhibiting the achievement of high reproducibility standards, proposing both practical and precautionary measures to ameliorate these obstacles.

Computational Reproducibility: Definitions and Challenges

Reproducibility in computational contexts is distinguished from replicability: reproducibility hinges on rerunning the original analysis with the original data and the original computational environment, whereas replicability involves obtaining consistent results with new data. Traditional, binary definitions fail to capture two dimensions critical to computational reproducibility: the agent who can perform the reproducibility check, and the computational environment in which the check is run. As a baseline, the paper defines first-order computational reproducibility (1° CR), in which verification can be performed by the original investigators or by third parties under specified conditions.

Tier System of Computational Reproducibility

The paper introduces a tiered system that categorizes reproducibility into three orders. First-order (1° CR) assumes verification by the authors or designated third parties with access to non-restrictive materials. Second-order (2° CR) involves verification by trusted third-party agents, often necessary when dealing with sensitive data. Third-order (3° CR) extends to a general reproducibility standard accessible to all under suitable conditions. The tier system is designed to replace the ambiguity and vague assurances around reproducibility in scholarly publications with verifiable claims.

Figure 1: A declarative description of a computational environment (Dockerfile) generated by rang.
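
As an illustration of the kind of declarative environment description shown in Figure 1, here is a minimal sketch using the rang R package's documented resolve/dockerize workflow; the package choice and snapshot date are hypothetical placeholders, not taken from the paper.

```r
# Sketch: declare a computational environment and emit a Dockerfile with rang.
# The package ("quanteda") and snapshot date below are illustrative only.
library(rang)

# Resolve the full dependency graph of the analysis packages as they
# existed on CRAN at the chosen snapshot date.
graph <- resolve(pkgs = "quanteda", snapshot_date = "2023-01-01")

# Write a Dockerfile (and supporting installation material) that can
# reconstruct this environment, so the analysis can be rerun later.
dockerize(graph, output_dir = "docker")
```

Building the resulting image freezes both R and the package versions, which is what makes the environment externally verifiable rather than merely described in prose.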

Barriers to Reproducibility: External Dependencies and Opacity

The paper identifies external dependencies, particularly on APIs and proprietary systems, as significant barriers to reproducibility. Dependencies on opaque services such as the Twitter API or ChatGPT are problematic because these systems evolve without transparency. Sudden changes in accessibility, such as the alterations to the Twitter API's access policies, can render prior research irreproducible. Reliance on proprietary software additionally imposes economic burdens on researchers pursuing reproducibility, so that restrictions on access to materials translate directly into computational obstacles.
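
One way to limit such dependencies is to record API responses at collection time so that later runs replay the stored exchange instead of contacting the live service. Below is a minimal sketch using the vcr R package, which exists for exactly this purpose; the endpoint and cassette name are illustrative assumptions, not taken from the paper.

```r
# Sketch: record an HTTP exchange once, replay it on all later runs.
# The endpoint (api.example.org) and cassette name are illustrative.
library(vcr)
library(crul)

# Store recorded request/response pairs ("cassettes") with the project.
vcr_configure(dir = "fixtures")

# First run: hits the network and records the exchange to disk.
# Later runs: replay the cassette, with no live API dependency.
use_cassette("example_api_call", {
  resp <- crul::HttpClient$new("https://api.example.org")$get("status")
})
```

Archiving such cassettes alongside the code turns a volatile external dependency into a fixed, shareable research artifact.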

Practical Solutions and Recommendations

To counteract these barriers, the paper suggests alternatives such as data donations and industry-research collaborations to secure free access to data. It recommends open-source software in place of proprietary systems to ease the economic constraints on reproducibility, and it emphasizes archiving data outputs as well as educational initiatives that improve coding practices. Finally, it proposes embracing virtualization systems such as Docker to encapsulate complete computational environments as a comprehensive reproducibility strategy (Figure 2).

Figure 2: Recommended practices to achieve the maximum degree of computational reproducibility in different scenarios.

Conclusion

The paper argues that ensuring reproducibility in CSS requires a fundamental rethinking of data sharing and computational practices. By adopting the proposed tier system, researchers can communicate the reproducibility of their work more precisely and counteract open-washing. The paper further advocates incentivizing proactive reproducibility strategies, such as the standardized sharing of computational resources, as norms of research practice. Such structural changes are essential to fostering a reproducibility-centered scientific culture.
