Citation Preference Ratio (CPR)
- Citation Preference Ratio (CPR) is a metric that quantifies whether a group's papers receive more or fewer citations than expected from the group's share of the literature.
- It employs a uniform random model to compute expected citations, thereby normalizing for group size and mitigating limitations of raw citation counts.
- The CPR and its pairwise variant (CSI) provide robust insights into citation dynamics by highlighting preferential attachment and insularity in academic publishing.
The Citation Preference Ratio (CPR) is a citation-based metric that quantifies the extent to which papers belonging to a particular group (such as a funding class, research field, or institution type) are cited more or less frequently than expected given their share of the literature. CPR is the ratio of the observed citation count to a group to the expected count under a uniform random model, which corrects for baseline paper volume. It has been applied to measure preferential citation patterns, insularity, and influence dynamics among funding classes in AI research. In pairwise journal comparison, under the name "Citation Success Index" (CSI), it expresses citation capacity as a probability. The metric separates citation preference and isolation from mere magnitude and refines the interpretation of citation metrics beyond traditional impact factors.
1. Formal Definition
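The CPR for a group quantifies whether papers in that class receive more citations than the null expectation derived from their proportion of the total paper population. Mathematically, given the set of all classes $\mathcal{G}$, let $C_{i \to j}$ denote the number of citations from class $i$ to class $j$, and $N_j$ the number of papers in class $j$. The CPR of class $j$ is then:

$$\mathrm{CPR}_j = \frac{O_j}{E_j}$$

where

$$O_j = \sum_{i \in \mathcal{G}} C_{i \to j}$$

is the observed number of citations to $j$ from all classes (including self-citation), and

$$E_j = C_{\mathrm{tot}} \cdot \frac{N_j}{\sum_{k \in \mathcal{G}} N_k}$$

with

$$C_{\mathrm{tot}} = \sum_{i \in \mathcal{G}} \sum_{k \in \mathcal{G}} C_{i \to k}$$

the total number of citations among all AI papers. CPR thus represents the over- or under-citation tendency relative to the volume baseline, as detailed in (Gnewuch et al., 5 Dec 2025).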
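The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the nested-dictionary layout, the class labels, and all numbers are hypothetical. It assumes every class contains at least one paper and that at least one citation exists.

```python
def citation_preference_ratios(citations, paper_counts):
    """Return the CPR of every class.

    citations[src][dst] is the number of citations from class src to class dst
    (self-citations included); paper_counts[g] is the number of papers in g.
    """
    total_papers = sum(paper_counts.values())
    total_citations = sum(sum(row.values()) for row in citations.values())
    ratios = {}
    for group, n_papers in paper_counts.items():
        # Observed: citations received by this class from every class.
        observed = sum(row.get(group, 0) for row in citations.values())
        # Expected under the uniform random model: total citations times paper share.
        expected = total_citations * n_papers / total_papers
        ratios[group] = observed / expected
    return ratios


# Hypothetical toy data (not taken from the cited study).
paper_counts = {"industry": 10, "non_industry": 60, "non_funded": 30}
citations = {
    "industry": {"industry": 30, "non_industry": 50, "non_funded": 20},
    "non_industry": {"industry": 60, "non_industry": 400, "non_funded": 140},
    "non_funded": {"industry": 30, "non_industry": 170, "non_funded": 100},
}

ratios = citation_preference_ratios(citations, paper_counts)
for group, value in ratios.items():
    print(f"{group}: {value:.3f}")
```

In this toy data the industry class holds 10% of the papers but receives 120 of the 1,000 citations, so its CPR is 1.2, while the non-funded class receives fewer citations than its 30% paper share would predict.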
2. Conceptual and Historical Rationale
CPR was introduced to address specific limitations of pure count-based and impact-factor-based metrics. In citation analysis, absolute metrics such as mean citation rates, median citations, or group-indexed h-index values can be strongly confounded by group size, outlier effects, and heavy-tailed citation distributions. CPR refines this by normalizing expected citations to group size and is robust against the dominance of large aggregates or "blockbuster" papers. Its pairwise variant, known as the Citation Success Index (CSI), further enables probabilistic comparison between two groups or journals, giving the probability that a random paper from group $A$ out-cites one from group $B$ (Milojević et al., 2016).
3. Analytical Framework and Computation
Calculation proceeds in two core modes, depending on context:
- Group-Level Application (e.g., Funding Classes): Aggregate all observed citations to and from each group, tabulate the population counts, and evaluate CPR using the above formula.
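- Pairwise Journal Comparison (CSI): Let $c_A$ and $c_B$ be citation counts for random papers from journals $A$ and $B$ with empirical distributions $p_A$ and $p_B$. The CSI between $A$ and $B$ is:

$$S_{AB} = P(c_A > c_B) + \tfrac{1}{2}\, P(c_A = c_B)$$

or equivalently,

$$S_{AB} = \sum_{m} p_A(m) \left[ \sum_{n < m} p_B(n) + \tfrac{1}{2}\, p_B(m) \right]$$

where $p_A(m)$ and $p_B(m)$ are the fractions of papers in $A$ and $B$ with exactly $m$ citations. This is numerically equivalent to the Mann–Whitney $U$-statistic normalized to the total number of paper pairs, with ties as half-wins (Milojević et al., 2016).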
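The pairwise definition can be computed directly from two lists of per-paper citation counts. The sketch below is an assumption-laden illustration: the function name and the sample counts are hypothetical, and it uses sorting plus binary search rather than any method prescribed by the source.

```python
from bisect import bisect_left, bisect_right


def citation_success_index(cites_a, cites_b):
    """Probability that a random paper from A out-cites a random paper from B.

    Ties count as half-wins, which matches the Mann-Whitney U statistic
    divided by the number of paper pairs.
    """
    if not cites_a or not cites_b:
        raise ValueError("both journals need at least one paper")
    sorted_b = sorted(cites_b)
    score = 0.0
    for count in cites_a:
        below = bisect_left(sorted_b, count)           # B papers with fewer citations
        ties = bisect_right(sorted_b, count) - below   # B papers with equal citations
        score += below + 0.5 * ties
    return score / (len(cites_a) * len(cites_b))


# Hypothetical citation counts for two small journals.
journal_a = [0, 1, 3, 5, 8]
journal_b = [0, 0, 1, 2, 4]

csi_ab = citation_success_index(journal_a, journal_b)
csi_ba = citation_success_index(journal_b, journal_a)
print(csi_ab, csi_ba)
```

Because ties are split evenly, the two directions are complementary: here the index is 0.7 for A against B and 0.3 for B against A.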
For rapid estimation, CPR/CSI can be approximated using group or journal impact factors (IFs):
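- For a pair of journals, the inputs are the ratio of their impact factors and the fraction of zero-citation papers. The approximate form is logistic in these quantities and applies under the conditions stated in the source; the explicit expression and its general form are described in (Milojević et al., 2016).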
4. Interpretation and Empirical Results
CPR provides a direct answer to the distributional preference question: is a class or journal cited preferentially relative to its size? In funding-class analysis within AI literature (Gnewuch et al., 5 Dec 2025), CPR values above 1 indicate citation preference or "insularity," a value of 1 indicates neutrality, and values below 1 indicate relative neglect. Key results (2018–2022):
| Year | Industry-funded CPR | Non-industry-funded CPR | Non-funded CPR |
|---|---|---|---|
| 2018 | ~0.9 | >1 | <1 |
| 2020 | ≈1 | ≈1 | ≈1 |
| 2021 | ~1.1 | <1 | <1 |
| 2022 | ~1.05 | — | ~0.8 |
The analysis indicates that, from 2021 onward, industry-funded AI papers were cited more than warranted by their ~10% share of papers, evidence of a growing citation preference and sectoral insularity (Gnewuch et al., 5 Dec 2025).
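As a rough worked example of how to read these values: because the expected count is the total citation count multiplied by the group's paper share, CPR equals the group's share of citations divided by its share of papers. Taking the approximate 2021 industry value of 1.1 from the table and the ~10% paper share, both rounded, the implied citation share is:

$$\mathrm{CPR} = \frac{\text{citation share}}{\text{paper share}} \quad\Longrightarrow\quad \text{citation share} \approx 1.1 \times 0.10 = 0.11$$

That is, roughly 11% of citations went to about 10% of the papers. This figure is an illustration derived from the rounded values above, not a number reported by the source.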
For journal comparison (CSI), a twofold IF advantage corresponds to a CSI of ~0.70. An overwhelming advantage in out-citation probability only arises for IF ratios exceeding 5–6, which shows that CSI is less sensitive to sporadic high-impact papers than arithmetic impact factors (Milojević et al., 2016).
5. Robustness to Citation Outliers and Distributional Biases
CPR/CSI is designed to address the skewness and outlier effects that pervade citation distributions. Metrics such as mean citation rates or the impact factor are susceptible to sporadic blockbuster papers, whereas CPR/CSI aggregates over the entire underlying citation histogram. A single extreme-tail paper therefore has minimal effect on the final value, because CPR/CSI operates on the shape of the empirical distribution and not on its average (Milojević et al., 2016).
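The contrast with mean-based measures can be illustrated on the pairwise index with a small experiment. This is a sketch with hypothetical data and a hypothetical helper function: one paper in journal B is replaced by a blockbuster, and the mean and the pairwise index are compared before and after.

```python
from statistics import mean


def citation_success_index(cites_a, cites_b):
    """Fraction of (A, B) paper pairs won by A, counting ties as half-wins."""
    score = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in cites_a
        for b in cites_b
    )
    return score / (len(cites_a) * len(cites_b))


# Hypothetical per-paper citation counts.
journal_a = [0, 1, 2, 2, 3, 4, 5, 6, 8, 10]
journal_b = [0, 0, 1, 1, 2, 3, 3, 4, 5, 7]
# Replace B's most-cited paper with a hypothetical 1000-citation blockbuster.
journal_b_outlier = journal_b[:-1] + [1000]

mean_b_before = mean(journal_b)
mean_b_after = mean(journal_b_outlier)
csi_before = citation_success_index(journal_a, journal_b)
csi_after = citation_success_index(journal_a, journal_b_outlier)

print(f"mean of B: {mean_b_before} -> {mean_b_after}")
print(f"CSI(A, B): {csi_before:.2f} -> {csi_after:.2f}")
```

In this toy case the mean of B jumps from 2.6 to 101.9, which would reverse a mean-based ranking, while the index for A against B moves only from 0.64 to 0.62. More generally, changing one paper in B can shift the index by at most one over the number of papers in B, since that paper takes part in only that fraction of all pairs.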
In cross-group citation studies, CPR does not account for within-group topic popularity or recency bias, focusing strictly on proportional citation allocation arising from baseline group volume.
6. Assumptions, Limitations, and Contextual Considerations
Several caveats underpin CPR analysis:
- Funding-class CPR relies on precise metadata. Undercounting occurs when authors omit funding sources, leading to potential misclassification (Gnewuch et al., 5 Dec 2025).
- Citation counts are used as proxies for influence, but citation context (e.g., perfunctory versus foundational) is not distinguished.
- Paper count normalization overlooks non-volume correlates of citation accrual, such as topical salience and recency.
- In CSI applications, referencing only IFs may ignore notable distributional anomalies, though empirical evidence suggests IF ratio suffices for accurate estimation given similar distribution shapes across journals (Milojević et al., 2016).
7. Comparative Role Among Citation Metrics
CPR complements established metrics (absolute citation counts, h-index) by introducing a normalization for group or class size and challenging interpretations based solely on magnitude. Metrics such as mean/median citation or the h-index measure overall impact, often confounded by size and tail effects. CPR interrogates relative citation preference, crucial for analysis of preferential attachment, field insularity, or funding impact.
In journal comparison, CPR/CSI provides a transparent, probability-based alternative to the impact factor, extending the assessment beyond blunt magnitude-based ranking to distributional citation probability (Gnewuch et al., 5 Dec 2025, Milojević et al., 2016). This refinement supports nuanced analysis of ecosystem-level citation dynamics and policy evaluation in academic publishing and research funding.