- The paper introduces a novel methodology that classifies individual publications using direct citation relationships to form hierarchical research areas.
- The approach produces three classification levels from broad disciplines to specific subfields, enabling detailed analysis of multidisciplinary trends.
- The study highlights limitations of relying solely on direct citations and proposes integrating additional relational data to enhance accuracy.
Publication-Level Classification System for Science
Waltman and van Eck present a methodology for constructing a publication-level classification system of science. It addresses limitations of the prevalent journal-level systems, such as those used by Web of Science and Scopus. These systems assign whole journals to research areas, which offers limited detail and handles multidisciplinary journals poorly. The new approach instead classifies individual publications based on citation relationships, enabling greater granularity and flexibility.
Methodology Overview
The methodology involves three primary steps:
- Determining Relatedness: Relatedness between publications is determined from direct citations, which yields a binary matrix of citation relationships. Using direct citations alone keeps computational demands low, but it leaves out co-citation and bibliographic coupling relations.
- Cluster Formation: Using hierarchical clustering, publications are organized into research areas. Each cluster forms a research area at a specific level of granularity, from broad disciplines down to specific subfields. Parameters such as the resolution at each level and the minimum number of publications per area guide this process.
- Labeling Research Areas: Labels are generated from terms extracted from publication titles and abstracts. These terms help characterize each research area, though refinement is needed, particularly at higher aggregation levels.
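The first two steps can be illustrated with a minimal Python sketch on toy data. It is not the authors' implementation: the function names `direct_citation_relations` and `group_connected` and the publication ids are hypothetical, and connected components are used only as a simple stand-in for the clustering step, which in the paper is governed by resolution and minimum-size parameters. The sketch also shows why a publication without citation links ends up unclassified.

```python
from collections import defaultdict

def direct_citation_relations(citations):
    """Symmetric binary relatedness: i and j are related if either cites the other."""
    related = defaultdict(set)
    for citing, cited in citations:
        if citing != cited:  # ignore self-citations
            related[citing].add(cited)
            related[cited].add(citing)
    return related

def group_connected(publications, related):
    """Stand-in for clustering (an assumption): connected components of the citation graph."""
    seen, groups = set(), []
    for start in publications:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in related.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(component)
    return groups

# Toy data (hypothetical): (citing, cited) pairs and the full publication list.
citations = [("A", "B"), ("B", "C"), ("D", "E")]
publications = ["A", "B", "C", "D", "E", "F"]

related = direct_citation_relations(citations)
groups = group_connected(publications, related)

# Publications with no citation links cannot be assigned to any area.
areas = [g for g in groups if len(g) > 1]
unclassified = {p for g in groups if len(g) == 1 for p in g}

print(sorted(sorted(a) for a in areas))
print(sorted(unclassified))
```

On real data, connected components would merge almost everything into one giant group, which is why an actual system needs a clustering procedure with a resolution setting rather than this stand-in.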
Application and Results
The methodology was applied to a dataset of ten million publications from 2001 to 2010. The resulting classification has three levels: broad disciplines, fields, and subfields. At the highest level, 20 areas were identified, and these overlap substantially with traditional scientific disciplines. Some areas, however, have no clear counterpart among established disciplines, reflecting the evolving nature of scientific inquiry.
The second level contains 672 research areas, which were visualized using bibliometric mapping to uncover relationships and potential hot spots, such as graphene research in physics. The third level offers a finer resolution with over 22,000 areas, though accuracy occasionally suffers because the classification relies solely on direct citation data.
Limitations and Future Directions
While the method is transparent and has modest resource requirements, its exclusive reliance on direct citations is a limitation. Many publications remain unclassified because they have too few citation connections. Future work could incorporate additional relational data, such as shared bibliographic references or semantic similarity, to improve coverage and accuracy.
Assessing publication relatedness through bibliographic coupling or content analysis could mitigate issues of misclassification, particularly for multidisciplinary or sparsely connected publications. Moreover, better labeling techniques, perhaps integrating expert judgment or journal title terms, could enhance clarity and usability.
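As an illustration of one such additional signal, the sketch below computes bibliographic coupling strength, meaning the number of references two publications share. This is a minimal example under assumed inputs, not part of the paper's method: the function name `bibliographic_coupling`, the `reference_lists` mapping, and the ids are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def bibliographic_coupling(reference_lists):
    """Count shared references for every pair of publications.

    reference_lists maps a publication id to the ids of the references it cites.
    Returns {(pub_a, pub_b): number of shared references}, with pub_a < pub_b.
    """
    # Invert the mapping: for each reference, which publications cite it?
    citing_by_reference = defaultdict(set)
    for publication, references in reference_lists.items():
        for reference in set(references):  # de-duplicate repeated references
            citing_by_reference[reference].add(publication)

    # Every pair of publications citing the same reference gains one unit of strength.
    strength = defaultdict(int)
    for citers in citing_by_reference.values():
        for a, b in combinations(sorted(citers), 2):
            strength[(a, b)] += 1
    return dict(strength)

# Toy data (hypothetical): P1 and P2 never cite each other but share two references.
reference_lists = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R2", "R3"],
    "P3": ["R4"],
}
coupling = bibliographic_coupling(reference_lists)
print(coupling)
```

Here P1 and P2 would be unrelated under direct citations alone, yet they are coupled through their shared references. P3 shares no references with the others and stays unconnected, so coupling improves coverage without guaranteeing it.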
The approach holds promise not only for publication-level classification but also for informing more refined journal-level systems. As scientific disciplines continue to evolve, adaptable methodologies of this kind help capture the increasingly interwoven structure of modern research.
Overall, the methodology moves beyond the limitations of journal-level systems and offers a scalable, detailed alternative. Future research, particularly with more comprehensive measures of relatedness, could further improve its applicability and accuracy.