The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits (1211.0381v1)

Published 2 Nov 2012 in cs.DL and stat.AP

Abstract: Percentiles have been established in bibliometrics as an important alternative to mean-based indicators for obtaining a normalized citation impact of publications. Percentiles have a number of advantages over standard bibliometric indicators used frequently: for example, their calculation is not based on the arithmetic mean which should not be used for skewed bibliometric data. This study describes the opportunities and limits and the advantages and disadvantages of using percentiles in bibliometrics. We also address problems in the calculation of percentiles and percentile rank classes for which there is not (yet) a satisfactory solution. It will be hard to compare the results of different percentile-based studies with each other unless it is clear that the studies were done with the same choices for percentile calculation and rank assignment.

Summary

The paper proposes using percentiles and percentile rank classes (PRs) as robust alternatives to mean-based indicators for analyzing skewed bibliometric citation data.
Percentiles are calculated using methods such as Hazen's or Gringorten's and grouped into PR schemes (e.g., PR(2, 10)) to make citation impact easier to interpret.
Adopting percentile-based analysis can lead to more equitable research assessments, but assigning publications with tied citation counts to rank classes remains an open problem, and calculation methods need further standardization.

The Use of Percentiles and Percentile Rank Classes in the Analysis of Bibliometric Data: Opportunities and Limits

This paper explores the application of percentiles and percentile rank classes (PRs) to bibliometric data, with a focus on normalizing citation impact as an alternative to mean-based indicators. Because citation distributions are skewed, percentiles have an advantage over arithmetic means: they are far less sensitive to a few very highly cited publications.

Reference Sets and Normalization

The authors highlight the utility of reference sets in bibliometric analysis, which allow for normalization across various factors influencing citation impact, including subject categories, publication year, and document type. It is noted that traditional arithmetic mean-based indicators like the Mean Observed Citation Rate (MOCR) and the Mean Expected Citation Rate (MECR) can be skewed by highly cited publications. The paper critiques these methods and suggests that nonparametric statistics such as percentiles could overcome the bias inherent in mean calculations.
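
To make the sensitivity of the mean concrete, the following is a minimal sketch using invented citation counts for a ten-paper reference set; the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers in one reference set
# (illustrative values only, not data from the paper).
citations = [0, 1, 1, 2, 3, 3, 4, 5, 6, 250]

# The single paper with 250 citations dominates the arithmetic mean.
mean_rate = mean(citations)

# The median depends only on the middle of the distribution.
median_rate = median(citations)

print(f"mean = {mean_rate}, median = {median_rate}")
```

Nine of the ten papers have fewer than ten citations, yet the mean is 27.5, while the median is 3. A mean-based expected citation rate built from such a set would describe almost none of its papers.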

Calculation and Application of Percentiles

Percentiles are proposed as a robust measure because they are less affected by extreme values and do not assume normally distributed data. The paper discusses several methods for calculating percentiles, including Hazen's compromise and Gringorten's method. These calculations are the basis for sorting publications into meaningful citation impact classes.
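
As a rough illustration of the two named methods, the sketch below uses the standard plotting-position formulas, 100 * (i - 0.5) / n for Hazen and 100 * (i - 0.44) / (n + 0.12) for Gringorten, where i is a paper's ascending rank in a reference set of n papers. The function name `percentile_ranks` and the choice to give tied papers the average of their ranks are assumptions made for this example, not necessarily the paper's procedure.

```python
def percentile_ranks(citations, method="hazen"):
    """Return a percentile (0-100) for each paper in one reference set."""
    n = len(citations)
    # Indices of papers sorted by ascending citation count.
    order = sorted(range(n), key=lambda k: citations[k])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend 'end' to cover the whole group of tied citation counts.
        end = start
        while end + 1 < n and citations[order[end + 1]] == citations[order[start]]:
            end += 1
        # Assumption: tied papers share the average of their 1-based ranks.
        avg_rank = (start + 1 + end + 1) / 2
        for k in order[start:end + 1]:
            ranks[k] = avg_rank
        start = end + 1
    if method == "hazen":
        return [100 * (i - 0.5) / n for i in ranks]
    if method == "gringorten":
        return [100 * (i - 0.44) / (n + 0.12) for i in ranks]
    raise ValueError(f"unknown method: {method}")


# Hypothetical reference set (illustrative values only).
citations = [0, 1, 1, 2, 3, 3, 4, 5, 6, 250]
hazen = percentile_ranks(citations, "hazen")
gringorten = percentile_ranks(citations, "gringorten")
```

Both methods depend only on rank, so the most cited paper receives the same percentile whether it has 250 citations or 25. The two formulas give slightly different values for the same rank, which is one reason results from studies using different methods are hard to compare.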

The paper illustrates percentile application through visual representations, such as violin and box plots, which demonstrate citation impact across different universities. These plots underscore disparities in institutional citation performance, highlighting how percentile-based rankings reveal more nuanced distinctions than mean-based metrics.

Assigning Percentiles to Rank Classes

The assignment of percentiles into percentile rank classes is explored as an approach to enhance interpretability of citation metrics. The paper outlines common PR schemes, such as PR(2, 10) and PR(6), which categorize publications based on citation percentiles. Each scheme serves as a convention for evaluating citation impact differences across institutions or researchers.
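
A minimal sketch of how such schemes can be applied follows. It assumes the usual reading of the two schemes: PR(2, 10) separates the top 10% from the rest, and PR(6) uses cut points at the 50th, 75th, 90th, 95th, and 99th percentiles. The names `PR_SCHEMES` and `rank_class`, and the convention that a percentile exactly on a cut point falls into the higher class, are assumptions made for illustration.

```python
from bisect import bisect_right

# Cut points (in percentiles) between adjacent classes.
# Assumption: these follow the common reading of each scheme.
PR_SCHEMES = {
    "PR(2, 10)": [90],
    "PR(6)": [50, 75, 90, 95, 99],
}


def rank_class(percentile, scheme):
    """Return the 1-based rank class for a percentile under a scheme.

    Assumption: a percentile equal to a cut point belongs to the
    higher class (e.g. exactly 90 counts as top 10%).
    """
    return bisect_right(PR_SCHEMES[scheme], percentile) + 1


# Hypothetical percentiles for five papers.
percentiles = [20.0, 50.0, 89.9, 90.0, 99.5]
classes_pr6 = [rank_class(p, "PR(6)") for p in percentiles]
classes_pr2 = [rank_class(p, "PR(2, 10)") for p in percentiles]
```

The boundary convention matters in practice: treating a cut point as belonging to the lower class instead would move some papers between classes, so a study should state which convention it uses.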

However, the paper points out challenges, such as the uncertainty in assigning publications with tied citation counts to discrete classes. Several proposed solutions, such as fractional assignment, are analyzed for their practicality and their implications for data interpretation.
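
The idea behind fractional assignment can be sketched as follows. This is one possible scheme, written under stated assumptions, and is not presented as the exact procedure analyzed in the paper: the top class is given a fixed number of positions (10% of the reference set), and when a group of tied papers straddles the boundary, each paper in the group is counted as belonging to the top class with the same fraction. The function name `top_share_fractions` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def top_share_fractions(citations, top_share=Fraction(1, 10)):
    """Map each citation count to the fraction of its papers counted as 'top'."""
    n = len(citations)
    slots = top_share * n        # size of the top class; may be fractional
    counts = Counter(citations)  # number of papers per citation count
    result = {}
    filled = Fraction(0)         # positions consumed by more-cited papers
    for value in sorted(counts, reverse=True):
        size = counts[value]
        # Positions in the top class still available to this tie group.
        take = max(Fraction(0), min(Fraction(size), slots - filled))
        result[value] = take / size
        filled += size
    return result


# Hypothetical reference set: three papers tie for first place,
# but the top 10% of ten papers has room for only one.
citations = [10, 10, 10, 5, 5, 3, 2, 1, 1, 0]
shares = top_share_fractions(citations)
total_top = sum(shares[c] for c in citations)
```

In this example each of the three tied papers counts one third toward the top class, so the class holds exactly one paper's worth of weight, that is, 10% of the set. Assigning all three papers, or none of them, to the top class would instead make the class 30% or 0% of the set.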

Implications and Future Directions

The application of percentiles has significant implications for bibliometric research, offering a more precise tool for evaluating scholarly impact comparably across fields, publication years, and document types. Theoretical implications include a shift away from mean-based measures towards more distribution-sensitive analyses. Practically, this can refine institutional rankings and research assessments, potentially influencing funding and resource allocation.

Despite this promise, the paper acknowledges unresolved issues with percentile calculations and PR assignments, particularly for publications with tied citation counts. Further research is encouraged to establish standard methodologies and address these issues, which would improve the comparability of percentile-based studies.

In conclusion, this paper demonstrates the potential of percentiles and PRs as robust alternative bibliometric indicators. Their adoption could lead to more equitable and accurate assessments of research impact, provided that standardized calculation methods and assignment criteria are established through future investigations.