Scientific citations in Wikipedia (0705.2106v1)

Published 15 May 2007 in cs.DL and cs.IR

Abstract: The Internet-based encyclopaedia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the "Wikipedia risks". The present work describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a comparison against journal statistics from Journal Citation Reports such as impact factors. The results show an increasing use of structured citation markup and good agreement with the citation pattern seen in the scientific literature though with a slight tendency to cite articles in high-impact journals such as Nature and Science. These results increase confidence in Wikipedia as a good information organizer for science in general.

Summary

  • The paper finds that Wikipedia citation counts per journal closely track the journals' total citation counts in the Journal Citation Reports, which increases confidence in Wikipedia as an organizer of scientific information.
  • Using Perl regular expressions to extract journal names from citation templates, the study finds a tendency to over-cite high-impact journals, along with field-specific biases.
  • The findings highlight the growing use of structured citation templates and suggest that free access to articles may raise a journal's citation frequency; both observations can inform future quality-assessment tools.

Overview

Finn Arup Nielsen's paper, "Scientific citations in Wikipedia," offers an empirical analysis of the quality and consistency of scientific citations in Wikipedia. The work is motivated by Wikipedia's growing prominence as a global information resource and by ongoing scrutiny of the reliability of its content. It takes a quantitative approach: outbound links from Wikipedia articles to scientific journal articles are counted per journal and compared against bibliometric statistics from the Journal Citation Reports (JCR).

Methodology

The study uses Perl regular-expression scripts to extract journal titles from the {{cite journal}} templates embedded in Wikipedia pages. The input was an XML dump of the English Wikipedia database obtained on April 2, 2007. Citation counts were compiled for individual journals and compared with several metrics from the 2005 JCR, including total citation counts, impact factors, and the number of articles.
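The extraction step can be sketched in Python; the original Perl scripts are not reproduced here. This is a minimal illustration, assuming a MediaWiki XML dump whose page wikitext sits in `<text>` elements and assuming the journal name is given in the `journal` parameter of `{{cite journal}}`. The regular expressions and the helper names `journals_in_wikitext`, `iter_page_texts` and `count_journals` are hypothetical and simpler than a full dump requires: for example, a nested template containing `}}` placed before the `journal` parameter would cut the match short.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Iterator, List

# Hypothetical pattern: the body of a {{cite journal ...}} template, matched
# lazily up to the first "}}" (nested templates can cut the body short).
CITE_JOURNAL = re.compile(r"\{\{\s*cite[ _]journal\b(.*?)\}\}", re.IGNORECASE | re.DOTALL)

# Value of the "journal" parameter: everything up to the next "|", except that
# a wikilink such as [[Nature (journal)|Nature]] is kept whole.
JOURNAL_PARAM = re.compile(r"\|\s*journal\s*=\s*((?:\[\[[^\]]*\]\]|[^|])*)", re.IGNORECASE)


def journals_in_wikitext(text: str) -> List[str]:
    """Return the raw 'journal' value of every {{cite journal}} template in text."""
    found = []
    for template in CITE_JOURNAL.finditer(text):
        param = JOURNAL_PARAM.search(template.group(1))
        if param and param.group(1).strip():
            found.append(param.group(1).strip())
    return found


def iter_page_texts(dump: IO[bytes]) -> Iterator[str]:
    """Stream the wikitext of each <text> element in a MediaWiki XML dump."""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        # Dump tags are namespaced, e.g. "{http://www.mediawiki.org/...}text".
        if elem.tag.rsplit("}", 1)[-1] == "text":
            yield elem.text or ""
        elem.clear()  # drop parsed content to keep memory use low


def count_journals(dump: IO[bytes]) -> Counter:
    """Count cite-journal citations per raw journal value across all pages."""
    counts: Counter = Counter()
    for text in iter_page_texts(dump):
        counts.update(journals_in_wikitext(text))
    return counts


# Tiny made-up dump for illustration only.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
<page><title>A</title><revision><text>{{cite journal | author=X | journal=Nature | year=2005}}
{{Cite journal|journal=[[Science (journal)|Science]]|volume=1}}</text></revision></page>
<page><title>B</title><revision><text>{{cite journal|title=T|journal = Nature }}</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    print(count_journals(io.BytesIO(SAMPLE_DUMP)).most_common())
```

Streaming with `iterparse` avoids loading the whole dump into memory, which matters for a full English Wikipedia snapshot.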

Results

The paper identified 30,368 outbound citations in {{cite journal}} templates. The most cited journals included Nature (787 citations), Science (669), and the New England Journal of Medicine (446). Among astronomy journals, the Astrophysical Journal (424) and Astronomy & Astrophysics (154) stood out, and medical journals such as The Lancet (268) and JAMA (217) also featured prominently.
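Before journals can be ranked, raw `journal` values have to be merged, since the same journal may be written as plain text, as a wikilink, or in italics. The sketch below assumes a few simple normalization rules (hypothetical; the summary does not say how the paper merged variant spellings) and then ranks journals by citation count. `normalize_journal` and `rank_journals` are illustrative names.

```python
import re
from collections import Counter
from typing import List, Mapping, Tuple

# [[Target|Label]] -> Label and [[Target]] -> Target.
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def normalize_journal(raw: str) -> str:
    """Reduce a raw 'journal' value to a display name (assumed rules)."""
    name = WIKILINK.sub(r"\1", raw)
    # Remove bold/italic wiki markup; bold (''') first so it is not split.
    name = name.replace("'''", "").replace("''", "")
    # Trim and collapse internal whitespace.
    return " ".join(name.split())


def rank_journals(raw_counts: Mapping[str, int], top: int = 10) -> List[Tuple[str, int]]:
    """Merge variant spellings and return the `top` most cited journals."""
    merged: Counter = Counter()
    for raw, n in raw_counts.items():
        merged[normalize_journal(raw)] += n
    return merged.most_common(top)
```

For example, `rank_journals({"Nature": 2, "[[Nature (journal)|Nature]]": 1, "''Science''": 1, "[[Science]]": 3}, top=2)` returns `[("Science", 4), ("Nature", 3)]`.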

The correlation analysis showed high agreement between Wikipedia citation counts and the JCR total citation counts per journal. The correlation was weaker for JCR impact factors and for the number of articles per journal. The strongest correlation was obtained with the product of total citations and impact factor; because this product gives extra weight to high-impact journals, the result suggests that Wikipedia authors cite high-impact journals more often than their share of the overall scientific literature would predict.
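The comparison with JCR metrics can be sketched as a set of correlations over the journals present in both sources. This is an illustrative sketch, not the paper's procedure: it assumes Pearson correlation on log-transformed values (a common choice for heavy-tailed citation counts), and `JcrRecord` with its field names is a hypothetical container for the JCR 2005 figures. The `cites_x_if` entry is the product of total citations and impact factor.

```python
import math
from dataclasses import dataclass
from typing import Dict, Mapping, Sequence


@dataclass
class JcrRecord:
    """Hypothetical per-journal JCR figures."""
    total_cites: int
    impact_factor: float
    articles: int


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_jcr(wiki_counts: Mapping[str, float],
                       jcr: Mapping[str, JcrRecord]) -> Dict[str, float]:
    """Correlate log Wikipedia counts with log JCR metrics on shared journals."""
    shared = sorted(set(wiki_counts) & set(jcr))  # journals in both sources
    wiki = [math.log1p(wiki_counts[j]) for j in shared]
    metrics = {
        "total_cites": [jcr[j].total_cites for j in shared],
        "impact_factor": [jcr[j].impact_factor for j in shared],
        "articles": [jcr[j].articles for j in shared],
        "cites_x_if": [jcr[j].total_cites * jcr[j].impact_factor for j in shared],
    }
    return {name: pearson(wiki, [math.log1p(v) for v in values])
            for name, values in metrics.items()}
```

Journals that appear in only one source are ignored, so the size of the overlap between Wikipedia-cited journals and JCR-listed journals should be reported alongside the coefficients.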

Discussion

The research highlights several implications:

  1. Reliability of Citations: The strong correlation between Wikipedia citations and total citations in the JCR suggests that Wikipedia can be a credible information organizer, especially for science-related content.
  2. Field Biases: Astronomy journals received disproportionately many citations, and a concentrated editing effort on Australian botany shows up in citations to journals such as Nuytsia (101 citations). Conversely, Internet-related journals received few citations, which argues against the notion that Wikipedia would be strongly biased toward topics favored by an "Internet-savvy" demographic.
  3. Access to Free Articles: Freely accessible journals like the BMJ appeared to gain more citations on Wikipedia, likely due to the open-access nature of their articles.
  4. Quality Assessment: Because scientific citations increasingly use structured markup (citation templates), outbound citation patterns can be extracted automatically and used to assess the quality of Wikipedia articles.
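Field biases such as the astronomy surplus can be made explicit with a representation ratio: a journal's (or field's) share of Wikipedia citations divided by its share of JCR total citations, where values above 1 indicate more citations than the scientific literature alone would predict. The sketch below is an assumed formulation for illustration, not a statistic reported in the paper; `representation_ratios` and its optional `field_of` mapping from journal to field are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Mapping, Optional


def representation_ratios(wiki_counts: Mapping[str, float],
                          jcr_total_cites: Mapping[str, float],
                          field_of: Optional[Mapping[str, str]] = None) -> Dict[str, float]:
    """Share of Wikipedia citations divided by share of JCR citations.

    Only journals present in both sources (and in field_of, if given) count.
    With field_of, journals are pooled per field before taking shares.
    """
    wiki_sum: Dict[str, float] = defaultdict(float)
    jcr_sum: Dict[str, float] = defaultdict(float)
    for journal, n in wiki_counts.items():
        if journal not in jcr_total_cites:
            continue
        if field_of is not None and journal not in field_of:
            continue
        group = journal if field_of is None else field_of[journal]
        wiki_sum[group] += n
        jcr_sum[group] += jcr_total_cites[journal]
    wiki_total, jcr_total = sum(wiki_sum.values()), sum(jcr_sum.values())
    if wiki_total == 0 or jcr_total == 0:
        raise ValueError("no overlapping citations to compare")
    return {g: (wiki_sum[g] / wiki_total) / (jcr_sum[g] / jcr_total)
            for g in wiki_sum if jcr_sum[g] > 0}
```

Working with shares rather than raw counts keeps the ratio independent of the overall size of each corpus, since Wikipedia contributes far fewer citations than the journal literature.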

Future Directions

The paper indicates several avenues for further research and development:

  1. Enhanced Citation Tools: Reference management tools such as Zotero already support Wikipedia citation handling, so the number of structured scientific citations in Wikipedia is likely to keep growing. Future research could explore how such tools affect citation accuracy and the effort needed to add references.
  2. Automated Quality Metrics: Developing algorithms that more accurately correlate Wikipedia citation data with traditional bibliometric measures could enhance the ability to assess and ensure the quality of Wikipedia articles.
  3. Field-Specific Studies: More granular studies on specific scientific fields and their citation patterns within Wikipedia could provide deeper insights into how certain disciplines are represented and referenced.
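One simple starting point for such automated metrics is an article-level citation profile. The sketch below is hypothetical, since the paper does not define an article score: for one article's list of cited journals it reports the number of journal citations, the fraction that match a JCR-listed journal, and the mean impact factor of the matched journals. `CitationProfile`, `citation_profile` and the `impact_factor` lookup table (assumed to be built from JCR data) are illustrative names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class CitationProfile:
    n_citations: int                     # journal citations in the article
    jcr_coverage: float                  # fraction citing a JCR-listed journal
    mean_impact_factor: Optional[float]  # None if nothing matched


def citation_profile(cited_journals: Sequence[str],
                     impact_factor: Mapping[str, float]) -> CitationProfile:
    """Summarize one article's outbound journal citations against JCR data."""
    matched = [impact_factor[j] for j in cited_journals if j in impact_factor]
    n = len(cited_journals)
    return CitationProfile(
        n_citations=n,
        jcr_coverage=len(matched) / n if n else 0.0,
        mean_impact_factor=sum(matched) / len(matched) if matched else None,
    )
```

Such profiles are only raw features; whether they track article quality would itself have to be validated against an independent quality judgment.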

Conclusion

Nielsen's investigation of scientific citations in Wikipedia provides quantitative evidence that increases confidence in Wikipedia as an information resource for science. While it reveals certain field biases and a tendency to over-cite high-impact journals, the overall alignment with established scientific citation patterns supports using Wikipedia for background reading and information organization. The paper also underscores the value of structured citation practices, which make references easier to verify and analyze for future readers and researchers.