An Evaluation of the h-index and Its Inconsistencies
The paper by Ludo Waltman and Nees Jan van Eck presents a critical examination of the h-index, a widely used bibliometric indicator for evaluating scientific impact. Introduced by Hirsch in 2005, the h-index of a scientist is the largest number h such that the scientist has h publications with at least h citations each, and it has enjoyed significant popularity for its simplicity and conceptual appeal. However, Waltman and van Eck argue that the h-index does not consistently reflect the scientific impact it purports to measure, calling into question its validity as a comprehensive ranking tool.
Critique of the h-index
The authors argue that the h-index aggregates publication and citation metrics into a single number in a way that results in inconsistent rankings. By illustrating the behavior of the h-index in various hypothetical scenarios, Waltman and van Eck demonstrate how the indicator can lead to counterintuitive outcomes. The three examples presented in the paper effectively illustrate this inconsistency:
- Rank reversal under equal relative improvement: when two scientists achieve the same relative increase in the citation counts of their publications, the h-index can nevertheless reverse their relative ranking, because its implicit thresholds respond differently to the two citation profiles.
- Rank reversal under equal absolute improvement: likewise, adding identical new publications and citations to two scientists' records can reverse their ranking (a numerical sketch follows this list).
- Inconsistency across aggregation levels: the h-index can rank two individual scientists one way and the research groups formed from them the other way, as the paper's third example demonstrates.
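To make the second case concrete, the following Python sketch uses hypothetical citation counts chosen purely for illustration (they are not the paper's figures). It computes the h-index for two scientists before and after both receive the identical set of new publications, and the ranking flips:

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation records chosen for illustration (not taken from the paper).
scientist_a = [9, 9, 9, 9]            # h = 4
scientist_b = [5, 5, 5, 5, 5, 5]      # h = 5  -> B initially ranks above A

# Both scientists receive the identical improvement: three new papers, 9 citations each.
improvement = [9, 9, 9]

print(h_index(scientist_a), h_index(scientist_b))                              # 4 5
print(h_index(scientist_a + improvement), h_index(scientist_b + improvement))  # 7 5 -> ranking reversed
```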
These findings highlight the inadequacy of the h-index in delivering consistent assessments, particularly when measuring the overall impact of a set of publications.
Theoretical Insights and Alternative Indicators
Keen to address the problems associated with the h-index, the authors engage with recent theoretical advances. Central to this discussion is the notion of scoring rules, outlined by Marchant and further developed by the authors. A scoring rule assigns a score to each publication based on its citation count alone and then aggregates the scores, typically by summation. Because each publication contributes independently of the rest of the record, identical changes to two scientists' records shift their totals identically, so scoring rules, unlike the h-index, cannot produce the rank reversals described above.
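As a minimal sketch of the idea (the function names and the particular score functions below are illustrative choices, not the paper's notation), a scoring rule is just a per-publication score plus a sum:

```python
def scoring_rule(citations, score):
    """Generic scoring rule: score each publication from its citation count alone,
    then aggregate the scores by summation."""
    return sum(score(c) for c in citations)

# Illustrative score functions: linear scoring recovers the total citation count,
# a constant score of 1 recovers the publication count.
total_citations = scoring_rule([9, 9, 9, 9], score=lambda c: c)    # 36
publication_count = scoring_rule([9, 9, 9, 9], score=lambda c: 1)  # 4
```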
Among the alternatives discussed, the highly cited publications indicator stands out. It is comparable to the h-index in its robustness against extreme values, yet it avoids the inconsistency pitfalls by counting only the publications whose citation counts reach a chosen threshold. The authors acknowledge that the threshold choice involves some arbitrariness, but argue that it is no more arbitrary than the parameters implicit in the h-index.
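In scoring-rule form, the indicator simply scores a publication 1 if its citation count reaches the threshold and 0 otherwise; the threshold of 10 below is an arbitrary illustration, not a value prescribed by the paper:

```python
def highly_cited_publications(citations, threshold=10):
    """Highly cited publications indicator: the number of publications whose
    citation count reaches the threshold (a scoring rule with a 0/1 score)."""
    return sum(1 for c in citations if c >= threshold)

# Like the h-index, the indicator is insensitive to extreme values: a publication
# with 500 citations contributes no more than one with exactly 10.
print(highly_cited_publications([500, 12, 9, 3]))   # 2
```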
Future Directions and Implications
The paper's findings carry significant implications for bibliometrics and beyond. Evaluating the impact of a scientist or research group calls for indicators that produce internally consistent measurements, and moving to scoring-rule-based metrics gives decision-makers more reliable evaluations and comparisons. The discussion also encourages the exploration of concave scoring functions, which give diminishing marginal weight to additional citations and thereby offer more nuanced control over the relative weight of high and low citation counts.
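One way to read that suggestion, sketched below with square-root and logarithmic score functions that are illustrative choices rather than forms prescribed by the paper, is as a scoring rule whose per-publication score grows ever more slowly with the citation count:

```python
import math

def concave_scoring_rule(citations, score=math.sqrt):
    """Scoring rule with a concave per-publication score (diminishing returns)."""
    return sum(score(c) for c in citations)

# With sqrt, going from 0 to 100 citations adds 10 to a paper's score, but going
# from 100 to 200 adds only about 4.1; log1p dampens extreme counts even more.
print(concave_scoring_rule([100, 25, 4]))                    # 10 + 5 + 2 = 17
print(concave_scoring_rule([100, 25, 4], score=math.log1p))  # ~9.48
```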
For future research, the field would benefit from empirical studies that compare various scoring rules in different contexts, extending beyond theoretical examination. Such comparisons would strengthen practical guidance on selecting the most appropriate metric for a given evaluation scenario.
In summary, Waltman and van Eck's insightful analysis presents a compelling case for reevaluating the use of the h-index in bibliometrics. Their argument frames the move towards theoretically sound and empirically justified alternatives as both necessary and consequential for the advancement of scientific assessment methodologies.