Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? (1511.05078v2)

Published 16 Nov 2015 in cs.DL

Abstract: In 1965, Derek de Solla Price foresaw the day when a citation-based taxonomy of science and technology would be delineated and correspondingly used for science policy. A taxonomy needs to be comprehensive and accurate if it is to be useful for policy making, especially now that policy makers are utilizing citation-based indicators to evaluate people, institutions and laboratories. Determining the accuracy of a taxonomy, however, remains a challenge. Previous work on the accuracy of partition solutions is sparse, and the results of those studies, while useful, have not been definitive. In this study we compare the accuracies of topic-level taxonomies based on the clustering of documents using direct citation, bibliographic coupling, and co-citation. Using a set of new gold standards - articles with at least 100 references - we find that direct citation is better at concentrating references than either bibliographic coupling or co-citation. Using the assumption that higher concentrations of references denote more accurate clusters, direct citation thus provides a more accurate representation of the taxonomy of scientific and technical knowledge than either bibliographic coupling or co-citation. We also find that discipline-level taxonomies based on journal schema are highly inaccurate compared to topic-level taxonomies, and recommend against their use.

Citations (283)

Summary

  • The paper finds that direct citation produces the most accurate clusters of scientific documents, which the authors attribute to its capture of historical context and its broad taxonomy coverage.
  • It uses a gold standard of synthesis papers with at least 100 references to rigorously compare direct citation, bibliographic coupling, and co-citation.
  • The findings suggest that replacing journal-based taxonomies with document-level methods could improve research evaluations and better inform funding decisions.

Citation Analysis for Accurate Knowledge Taxonomy in Science and Technology

The paper by Klavans and Boyack rigorously evaluates citation analysis methods to determine which one generates the most accurate taxonomy of scientific and technical knowledge. It assesses three citation-based methods: direct citation (DC), bibliographic coupling (BC), and co-citation (CC). It compares the accuracy of the topic-level taxonomies built from each method, and also compares topic-level taxonomies with discipline-level taxonomies based on journal classification schemes.
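To make the three relatedness signals concrete, here is a minimal Python sketch over a hypothetical toy corpus. It builds simple unweighted link counts for each method; the study's actual similarity normalization and clustering algorithm are not reproduced, and the document IDs and function names are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each indexed paper maps to the set of documents it cites.
# "X" and "Y" are cited but not indexed, so they have no reference lists of their own.
refs = {
    "A": {"C", "X"},
    "B": {"C", "X", "Y"},
    "C": {"Y"},
}


def direct_citation(refs):
    """Link each citing paper to each document it cites (direction ignored)."""
    links = Counter()
    for citing, cited in refs.items():
        for doc in cited:
            links[frozenset((citing, doc))] += 1
    return links


def bibliographic_coupling(refs):
    """Link two citing papers, weighted by the number of references they share."""
    links = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            links[frozenset((a, b))] = shared
    return links


def co_citation(refs):
    """Link two cited documents, weighted by the number of papers citing both."""
    links = Counter()
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            links[frozenset((a, b))] += 1
    return links


def covered(links):
    """Documents that appear in at least one link and can therefore be clustered."""
    return set().union(*links)


for name, build in [("DC", direct_citation), ("BC", bibliographic_coupling), ("CC", co_citation)]:
    print(name, sorted(covered(build(refs))))
```

On this toy input, direct citation links all five documents, bibliographic coupling reaches only the three papers that have reference lists, and co-citation reaches only the three documents that are cited, which illustrates why direct citation can offer broader coverage.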

Key Findings and Methodology

The researchers introduced a new gold standard for assessing accuracy: articles with at least 100 references, on the premise that such documents are comprehensive syntheses of their topics. They assume that the more concentrated a gold-standard paper's references are within a few clusters, the more accurate the clustering.
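The concentration assumption can be made concrete with the Herfindahl index, which the authors use for this purpose. Assuming the standard definition, if a gold-standard paper's n references fall into k clusters, with n_i references in cluster i:

```latex
H = \sum_{i=1}^{k} s_i^{2}, \qquad s_i = \frac{n_i}{n}, \qquad \frac{1}{k} \le H \le 1
```

For example, 100 references split 60/30/10 across three clusters give H = 0.36 + 0.09 + 0.01 = 0.46, whereas the same references spread evenly over ten clusters give H = 10 × 0.01 = 0.10. Under the study's assumption, the first clustering would be judged more accurate for that paper.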

  • The paper finds direct citation to be the most accurate method for clustering scientific documents, which the authors attribute to its more comprehensive capture of historical references than BC or CC. It also offers the broadest taxonomy coverage, since it incorporates both indexed documents and cited non-indexed documents.
  • Bibliographic coupling was more accurate than co-citation but less accurate than direct citation. It draws on more contemporary reference pools, which can obscure historical linkages.
  • Co-citation was the least accurate, largely because it relies on indirect associations and yields a less stable picture of scientific taxonomies over time.

The authors compared these citation methods at scale, applying a Herfindahl index to measure reference concentration for 37,207 synthesis papers used as gold standards. The results clearly favored direct citation for accurate representation at the topic level.
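The comparison step can be sketched as follows, assuming the standard Herfindahl definition (sum of squared cluster shares) and simply skipping references that a clustering solution leaves unassigned. That skip rule, the function names, and the toy data are hypothetical, not the authors' procedure.

```python
from collections import Counter
from statistics import mean


def herfindahl(labels):
    """Sum of squared cluster shares for one gold standard's references."""
    counts = Counter(labels)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def mean_concentration(gold_standards, cluster_of):
    """Average Herfindahl index over all gold standards for one clustering solution.

    gold_standards: paper id -> list of referenced document ids
                    (the study uses papers with at least 100 references).
    cluster_of:     document id -> cluster label for one solution (e.g. DC, BC or CC).
    Assumption: references with no cluster label are skipped.
    """
    scores = []
    for refs in gold_standards.values():
        labels = [cluster_of[r] for r in refs if r in cluster_of]
        if labels:  # avoid dividing by zero when nothing is assigned
            scores.append(herfindahl(labels))
    return mean(scores) if scores else 0.0


# Hypothetical toy data: two gold standards with four references each.
gold = {"G1": ["d1", "d2", "d3", "d4"], "G2": ["d1", "d5", "d6", "d7"]}
solution_a = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 0, "d6": 0, "d7": 0}
solution_b = {"d1": 0, "d2": 1, "d3": 2, "d4": 3, "d5": 1, "d6": 2, "d7": 3}

print(mean_concentration(gold, solution_a))  # 0.8125
print(mean_concentration(gold, solution_b))  # 0.25
```

Under the study's assumption, the solution with the higher mean concentration (here `solution_a`) would be judged the more accurate taxonomy.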

Implications and Recommendations

The research critically examines the widespread use of journal-based taxonomies, finds them highly inaccurate compared with topic-level taxonomies, and proposes a transition to document-level taxonomies. In particular, it questions whether journals are suitable proxies for knowledge areas, and implies that research assessments and funding decisions, which are often influenced by journal metrics, may benefit from adopting direct citation-based taxonomies.

A significant implication of this research concerns bibliometric evaluation. The evidence favors taxonomies that reflect the historical and socio-cognitive structure of research, which could reduce distortions in the allocation of research funds. The authors suggest that this approach could also help identify emerging scientific areas, giving a more dynamic view of the research landscape.

Future Developments

Klavans and Boyack suggest that future bibliometric work build on these findings to improve methods for detecting and fostering innovation. Given the stability and comprehensive coverage of direct citation, further research could refine taxonomies so that they accurately reflect the evolution of scientific domains, and extend them to other forms of scholarly output, including patents and web-based information, potentially encompassing over 100 million documents.

This paper sets a precedent for adopting more nuanced approaches in constructing taxonomies of scientific knowledge. It encourages scholars and policymakers to reconsider traditional strategies and embrace methodologies that accurately reflect the intricate structures of academic and technical domains, ultimately guiding more informed decision-making processes in the scientific community.