
Citation Preference Ratio (CPR)

Updated 12 December 2025
  • Citation Preference Ratio (CPR) is a metric that quantifies whether a group's papers receive more or fewer citations than expected from their share of the literature.
  • It employs a uniform random model to compute expected citations, thereby normalizing for group size and mitigating limitations of raw citation counts.
  • The CPR and its pairwise variant (CSI) provide robust insights into citation dynamics by highlighting preferential attachment and insularity in academic publishing.

The Citation Preference Ratio (CPR) is a citation-based metric designed to quantify the extent to which papers belonging to a particular group (such as a funding class, research field, or institution type) are cited more or less frequently than expected given their share of the literature. CPR is formulated as the ratio between the observed citation count to a group and the expected count calculated under a uniform random model, which corrects for baseline paper volume. CPR has been applied to measure preferential citation patterns, insularity, and influence dynamics among funding classes in AI research and, under the name "Citation Success Index" (CSI), as a pairwise journal comparison tool that expresses citation capacity as a probability. The metric separates citation preference and isolation from mere magnitude and refines the interpretation of citation data beyond traditional impact factors.

1. Formal Definition

The CPR for a group $f$ quantifies whether papers in that class receive more citations than the null expectation derived from their proportion $N_f/N$ in the total paper population. Mathematically, given the set of all classes $F$, let $C^{f_i \to f_j}$ denote citations from $f_i$ to $f_j$, and $N_f$ the number of papers in $f$. The CPR is then:

$$\mathrm{CPR}(f) = \frac{C(f)}{E(f)}$$

where

$$C(f) = \sum_{f_i \in F} C^{f_i \to f}$$

is the observed number of citations to $f$ from all classes (including self-citations from $f$ itself), and

$$E(f) = T \times \frac{N_f}{N}$$

with

$$T = \sum_{f_i \in F} \sum_{f_j \in F} C^{f_i \to f_j}$$

the total number of citations among all AI papers. CPR thus represents the over- or under-citation tendency relative to the volume baseline, as detailed in (Gnewuch et al., 5 Dec 2025).
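The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the function name `citation_preference_ratio`, the dictionary layout, and the toy counts are all assumptions made for the example.

```python
from typing import Dict, Tuple


def citation_preference_ratio(
    citations: Dict[Tuple[str, str], int],
    paper_counts: Dict[str, int],
) -> Dict[str, float]:
    """Return CPR(f) = C(f) / E(f) for every group f.

    citations[(citing, cited)] is the number of citations from one group
    to another (self-citations included); paper_counts[f] is N_f.
    """
    total_citations = sum(citations.values())  # T
    total_papers = sum(paper_counts.values())  # N
    if total_citations == 0 or total_papers == 0:
        raise ValueError("need at least one citation and one paper")

    ratios = {}
    for group, n_papers in paper_counts.items():
        # C(f): citations received by the group from every group.
        observed = sum(n for (_, cited), n in citations.items() if cited == group)
        # E(f): the group's share of papers applied to all citations.
        expected = total_citations * n_papers / total_papers
        ratios[group] = observed / expected if expected > 0 else float("nan")
    return ratios


# Hypothetical toy corpus (numbers invented for illustration only).
toy_papers = {"industry": 100, "non_industry": 500, "non_funded": 400}
toy_citations = {
    ("industry", "industry"): 60,
    ("industry", "non_industry"): 30,
    ("industry", "non_funded"): 10,
    ("non_industry", "industry"): 40,
    ("non_industry", "non_industry"): 400,
    ("non_industry", "non_funded"): 110,
    ("non_funded", "industry"): 20,
    ("non_funded", "non_industry"): 130,
    ("non_funded", "non_funded"): 200,
}
toy_cpr = citation_preference_ratio(toy_citations, toy_papers)
print(toy_cpr)
```

In this toy corpus the hypothetical industry group holds 10% of the papers but receives 120 of 1000 citations, so its CPR is about 1.2, while the non-funded group (40% of papers, 320 citations) comes out near 0.8.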

2. Conceptual and Historical Rationale

CPR was introduced to address specific limitations of pure count-based and impact-factor-based metrics. In citation analysis, absolute metrics such as mean citation rates, median citations, or group-level h-index values can be strongly confounded by group size, outlier effects, and heavy-tailed citation distributions. CPR addresses this by normalizing expected citations to group size, which makes it robust against the dominance of large aggregates or "blockbuster" papers. Its pairwise variant, the Citation Success Index (CSI), further enables probabilistic comparison between two groups or journals: it gives the probability that a random paper from group $A$ out-cites one from group $B$ (Milojević et al., 2016).

3. Analytical Framework and Computation

Calculation proceeds in two core modes, depending on context:

  • Group-Level Application (e.g., Funding Classes): Aggregate all observed citations to and from each group, tabulate the population counts, and evaluate CPR using the above formula.
  • Pairwise Journal Comparison (CSI): Let $X_A$ and $X_B$ be citation counts for random papers from journals $A$ and $B$ with empirical distributions $p_A(c)$ and $p_B(c)$. The CSI between $A$ and $B$ is:

    $$\mathrm{CSI}(A,B) = P[X_A > X_B] + \frac{1}{2}\, P[X_A = X_B]$$

    or equivalently,

    $$\mathrm{CSI}(A,B) = \sum_{c=0}^{\infty} \left[ P_A(>c) + \frac{1}{2}\, p_A(c) \right] p_B(c)$$

    where $P_A(>c) = \sum_{m>c} p_A(m)$. This is numerically equivalent to the Mann–Whitney $U$-statistic normalized by the total number of paper pairs, with ties counted as half-wins (Milojević et al., 2016).
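The two equivalent forms of the CSI can be checked against each other with a short Python sketch. The helper names `csi_pairwise` and `csi_from_distributions` and the citation lists are hypothetical, chosen only to illustrate the formulas; the first function counts wins and half-ties over all paper pairs, and the second evaluates the sum over the empirical distributions.

```python
from collections import Counter
from typing import Sequence


def csi_pairwise(a: Sequence[int], b: Sequence[int]) -> float:
    """P[X_A > X_B] + 0.5 * P[X_A = X_B], by enumerating all paper pairs."""
    wins = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return (wins + 0.5 * ties) / (len(a) * len(b))


def csi_from_distributions(a: Sequence[int], b: Sequence[int]) -> float:
    """Sum over c of [P_A(>c) + 0.5 * p_A(c)] * p_B(c)."""
    p_a = {c: n / len(a) for c, n in Counter(a).items()}
    p_b = {c: n / len(b) for c, n in Counter(b).items()}
    total = 0.0
    for c, prob_b in p_b.items():
        tail_a = sum(p for m, p in p_a.items() if m > c)  # P_A(>c)
        total += (tail_a + 0.5 * p_a.get(c, 0.0)) * prob_b
    return total


# Hypothetical per-paper citation counts for two journals.
journal_a = [0, 1, 3, 5, 10]
journal_b = [0, 0, 2, 3]

csi_ab_pairs = csi_pairwise(journal_a, journal_b)
csi_ab_dist = csi_from_distributions(journal_a, journal_b)
print(csi_ab_pairs, csi_ab_dist)
```

For these toy lists there are 13 wins and 3 ties among 20 pairs, so both functions give about 0.725. Because ties are split evenly, `csi_pairwise(journal_b, journal_a)` is the complement, about 0.275.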

For rapid estimation, CPR/CSI can be approximated using group or journal impact factors (IFs):

  • For journals $A$ and $B$ (IF ratio $x = \mathrm{IF}_A / \mathrm{IF}_B$), let $f_0$ be the fraction of zero-citation papers in $B$. The approximate logistic form is:

    $$\mathrm{CSI}(A,B) \approx \frac{1}{1 + x^{-1.23}}$$

    for $x \geq 1$ and $f_0 \ll 1$, with the general form described in (Milojević et al., 2016).
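The logistic approximation is simple enough to evaluate directly. The sketch below is only an illustration: the function name `csi_from_if_ratio` is hypothetical, and it covers just the regime stated above ($x \geq 1$, $f_0 \ll 1$), not the general form from the cited paper.

```python
def csi_from_if_ratio(if_a: float, if_b: float, exponent: float = 1.23) -> float:
    """Approximate CSI(A, B) as 1 / (1 + x**(-exponent)), with x = IF_A / IF_B.

    Only the x >= 1 regime quoted in the text is handled; the zero-citation
    fraction f_0 of journal B is assumed negligible.
    """
    if if_a <= 0 or if_b <= 0:
        raise ValueError("impact factors must be positive")
    x = if_a / if_b
    if x < 1:
        raise ValueError("approximation is stated for IF_A / IF_B >= 1")
    return 1.0 / (1.0 + x ** (-exponent))


equal_ifs = csi_from_if_ratio(3.0, 3.0)    # x = 1
twofold_ifs = csi_from_if_ratio(6.0, 3.0)  # x = 2
print(equal_ifs, twofold_ifs)
```

Equal impact factors give exactly 0.5, and a twofold ratio gives roughly 0.70.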

4. Interpretation and Empirical Results

CPR provides a direct answer to the distributional preference question: is a class or journal cited preferentially relative to its size? In funding-class analysis within AI literature (Gnewuch et al., 5 Dec 2025), CPR values above 1 indicate citation preference (or "insularity"), a value of 1 indicates neutrality, and values below 1 indicate relative neglect. Key results (2018–2022):

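| Year | Industry-funded CPR | Non-industry-funded CPR | Non-funded CPR |
|------|---------------------|-------------------------|----------------|
| 2018 | ~0.9 | >1 | <1 |
| 2020 | ≈1 | ≈1 | ≈1 |
| 2021 | ~1.1 | <1 | <1 |
| 2022 | ~1.05 | — | ~0.8 |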

The analysis indicates that, from 2021 onward, industry-funded AI papers were cited more than warranted by their ~10% share of papers, evidence of an increasing citation preference and sectoral insularity (Gnewuch et al., 5 Dec 2025).
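As a rough reading aid, the definition $\mathrm{CPR}(f) = C(f)/E(f)$ with $E(f) = T \times N_f/N$ can be rearranged to give the share of all citations a group receives. The figures plugged in below are the approximate values quoted above (CPR of about 1.1, paper share of about 10%), so the result is indicative only.

```latex
\frac{C(f)}{T} \;=\; \mathrm{CPR}(f) \times \frac{N_f}{N}
\;\approx\; 1.1 \times 0.10 \;=\; 0.11
```

Under those approximate inputs, a group with about a tenth of the papers would be receiving about 11% of all citations.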

For journal comparison (CSI), a twofold IF advantage corresponds to a CSI of ~0.70; an overwhelming advantage ($>90\%$ out-citation probability) only arises for IF ratios exceeding 5–6, showing CSI to be less sensitive to sporadic high-impact papers than arithmetic impact factors (Milojević et al., 2016).
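The IF ratio needed for a given out-citation probability follows from inverting the logistic approximation $\mathrm{CSI} \approx 1/(1 + x^{-1.23})$. The short derivation below assumes that approximation holds (that is, $x \geq 1$ and few zero-citation papers) and writes $p$ for the target CSI value.

```latex
\begin{align*}
p = \frac{1}{1 + x^{-1.23}}
\;&\Longrightarrow\;
x^{1.23} = \frac{p}{1 - p}
\;\Longrightarrow\;
x = \left( \frac{p}{1 - p} \right)^{1/1.23}, \\
p = 0.9 \;&:\quad x = 9^{1/1.23} \approx 5.97 .
\end{align*}
```

A 90% out-citation probability therefore requires an IF ratio of about 6 under this approximation, in line with the range quoted above.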

5. Robustness to Citation Outliers and Distributional Biases

CPR/CSI is designed to address skewness and outlier effects omnipresent in citation distributions. Unlike metrics such as mean citation rates or impact factor, which are susceptible to sporadic blockbuster papers, CPR/CSI aggregates over the entirety of the underlying citation histogram. This insensitivity ensures robustness: single extreme-tail papers minimally affect the final value, as CPR/CSI operates over the shape of the empirical distribution and not its average (Milojević et al., 2016).
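The outlier argument can be illustrated for the pairwise CSI with a small numerical experiment. Everything in this sketch is invented for illustration (the helper `csi_pairwise`, the two synthetic journals, and the single 1000-citation paper); it compares how the mean and the CSI react when one blockbuster paper is added to the weaker journal.

```python
from statistics import mean
from typing import Sequence


def csi_pairwise(a: Sequence[int], b: Sequence[int]) -> float:
    """P[X_A > X_B] + 0.5 * P[X_A = X_B] over all paper pairs."""
    wins = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return (wins + 0.5 * ties) / (len(a) * len(b))


# Two synthetic journals with 100 papers each; B is slightly better cited.
journal_a = [1, 2, 3, 4, 5] * 20
journal_b = [2, 3, 4, 5, 6] * 20

# Add a single hypothetical blockbuster paper to journal A.
journal_a_outlier = journal_a + [1000]

mean_before = mean(journal_a)
mean_after = mean(journal_a_outlier)
csi_before = csi_pairwise(journal_a, journal_b)
csi_after = csi_pairwise(journal_a_outlier, journal_b)

print(f"mean of A: {mean_before:.2f} -> {mean_after:.2f} (mean of B: {mean(journal_b):.2f})")
print(f"CSI(A, B): {csi_before:.3f} -> {csi_after:.3f}")
```

In this toy setup the one extra paper lifts journal A's mean from 3 to about 12.9, well above journal B's mean of 4, while CSI(A, B) moves only from 0.32 to roughly 0.327, so a typical paper in A is still less likely to out-cite a typical paper in B.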

In cross-group citation studies, CPR does not account for within-group topic popularity or recency bias, focusing strictly on proportional citation allocation arising from baseline group volume.

6. Assumptions, Limitations, and Contextual Considerations

Several caveats underpin CPR analysis:

  • Funding-class CPR relies on precise metadata. Undercounting occurs when authors omit funding sources, leading to potential misclassification (Gnewuch et al., 5 Dec 2025).
  • Citation counts are used as proxies for influence, but citation context (e.g., perfunctory versus foundational) is not distinguished.
  • Paper count normalization overlooks non-volume correlates of citation accrual, such as topical salience and recency.
  • In CSI applications, referencing only IFs may ignore notable distributional anomalies, though empirical evidence suggests IF ratio suffices for accurate estimation given similar distribution shapes across journals (Milojević et al., 2016).

7. Comparative Role Among Citation Metrics

CPR complements established metrics (absolute citation counts, h-index) by introducing a normalization for group or class size and challenging interpretations based solely on magnitude. Metrics such as mean/median citation or the h-index measure overall impact, often confounded by size and tail effects. CPR interrogates relative citation preference, crucial for analysis of preferential attachment, field insularity, or funding impact.

In journal comparison, CPR/CSI provides a transparent, probability-based alternative to the impact factor, extending the assessment beyond blunt magnitude-based ranking to distributional citation probability (Gnewuch et al., 5 Dec 2025, Milojević et al., 2016). This refinement is essential for nuanced, ecosystem-level citation dynamics and policy evaluation in academic publishing and research funding.
