- The paper demonstrates that sharing both code and data enhances reproducibility, with studies offering both showing an 86% success rate.
- It attempted to replicate 22 of 30 highly cited AI studies, reproducing half of them fully or partially, using a four-tier classification to evaluate reproducibility.
- The research underscores the pivotal role of open science in promoting transparency and calls for mandatory data and code sharing in AI.
The Unreasonable Effectiveness of Open Science in AI: A Replication Study
The paper "The Unreasonable Effectiveness of Open Science in AI: A Replication Study" seeks to address the reproducibility crisis often reported across various scientific domains, with a specific focus on AI. The authors, Gundersen, Cappelen, Mølna, and Nilsen, conducted a systematic replication study to evaluate the extent of reproducibility within AI research. They analyzed 30 highly cited AI studies, ultimately replicating a portion of these studies using provided resources, and reported significant findings regarding the availability of code and data.
Study Design and Key Findings
The study used a structured approach: the selected studies were drawn from the most cited empirical AI articles across several years. Reproducibility was classified into four types based on the resources made available: R1 Description, R2 Code, R3 Data, and R4 Experiment. The authors focused primarily on R3 and R4 studies, which denote the availability of data alone and of both data and code, respectively.
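The four tiers can be pictured as a simple lookup from what a study shares to the tier at which it can be replicated. This is an illustrative sketch only: the tier names come from the paper, but the mapping function and its interpretation of R2/R3 are our reading of the summary above.

```python
# Hypothetical sketch: map shared artifacts onto the paper's four
# reproducibility tiers. The tier names (R1-R4) are from the paper;
# the dict layout and helper function are illustrative assumptions.
REPRODUCIBILITY_TIERS = {
    "R1": "Description",  # only the paper's text is available
    "R2": "Code",         # code is shared
    "R3": "Data",         # data is shared
    "R4": "Experiment",   # both code and data are shared
}

def tier_for(shares_code: bool, shares_data: bool) -> str:
    """Return the highest tier a study's shared resources support."""
    if shares_code and shares_data:
        return "R4"
    if shares_data:
        return "R3"
    if shares_code:
        return "R2"
    return "R1"
```

Under this reading, a study releasing both its code and its datasets falls into R4, the tier the authors found most likely to be reproducible.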
Of the 30 articles assessed, eight were excluded due to practical limitations such as unavailable special-purpose hardware or proprietary data. Replication was attempted for the remaining 22 studies. Notably, 50% of these were reproduced either fully or partially. The analysis showed that sharing both code and data significantly enhances reproducibility: 86% of the studies sharing both were reproduced successfully, compared with only 33% of those sharing data alone.
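The arithmetic behind these percentages can be made explicit. Note the per-group counts below are hypothetical, chosen only to be consistent with the rounded rates reported above (86%, 33%, 50%); the paper's actual group sizes are not given in this summary.

```python
# Illustrative arithmetic only: the per-group counts are assumptions
# consistent with the reported percentages, not figures from the paper.
def reproduction_rate(reproduced: int, attempted: int) -> int:
    """Percentage of attempted replications that succeeded, rounded."""
    return round(100 * reproduced / attempted)

both_shared = reproduction_rate(6, 7)    # e.g. 6 of 7 code+data studies -> 86
data_only   = reproduction_rate(2, 6)    # e.g. 2 of 6 data-only studies -> 33
overall     = reproduction_rate(11, 22)  # half of the 22 attempted     -> 50
```

The point of the sketch is simply that the headline gap (86% vs. 33%) is a ratio of successes to attempts within each sharing category, computed over a fairly small sample.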
The study also identifies several obstacles to reproducibility, emphasizing the importance of comprehensive data documentation. A key observation was the strong correlation between well-documented data and successful replication, whereas the quality of code documentation showed no comparable effect: as long as the core code was available, documentation specifics had little impact on replication success.
Implications and Future Outlook
The authors highlight the critical role of open science practices—specifically, the sharing of code and data—in improving the reproducibility of AI studies. This aligns with a growing consensus in the scientific community advocating for transparency and accessibility in research materials. The study underscores a potential shift in publication standards where open data and code might become prerequisites for academic publishing.
This research suggests actionable insights for improving reproducibility in AI. It aligns with the objectives of major AI conferences which increasingly encourage or require the submission of both datasets and code. Mandating open science practices could facilitate verifiable research and foster innovation by enabling others to build upon existing work.
Furthermore, the study's results carry significant implications for the development of AI models, including LLMs, where transparency and access to training data are essential for verification and further advancements.
Conclusion
In summary, this paper provides a well-reasoned analysis of the reproducibility crisis in AI research, offering empirical evidence on the crucial impact of data and code availability. By emphasizing the importance of open science, the study advocates for improvements in research practices that could enhance the reliability and development of future AI technologies. The replication study serves as a testament to the potential benefits of adopting such methodologies, reinforcing the notion that effective documentation and open resources are paramount for scientific progress in AI.