Citation Preference Ratio (CPR)

Updated 12 December 2025

Citation Preference Ratio (CPR) is a metric that quantifies whether a group's papers are cited more or less often than expected given their share of the literature.
It employs a uniform random model to compute expected citations, thereby normalizing for group size and mitigating limitations of raw citation counts.
The CPR and its pairwise variant (CSI) provide robust insights into citation dynamics by highlighting preferential attachment and insularity in academic publishing.

The Citation Preference Ratio (CPR) is a citation-based metric that quantifies the extent to which papers belonging to a particular group (such as a funding class, research field, or institution type) are cited more or less frequently than expected given their share of the literature. CPR is the ratio of the observed citation count to a group to the count expected under a uniform random model, which corrects for differences in baseline paper volume. It has been applied to measure preferential citation patterns, insularity, and influence dynamics among funding classes in AI research. Its pairwise variant, the Citation Success Index (CSI), compares two journals by expressing citation capacity as a probability. Both separate citation preference and isolation from sheer magnitude and refine the interpretation of citation data beyond traditional impact factors.

1. Formal Definition

The CPR for a group (class) $f$ quantifies whether papers in that class receive more citations than the null expectation derived from their share $N_f/N$ of the total paper population. Given the set of all classes $F$, let $C^{f_i \to f_j}$ denote the number of citations from papers in $f_i$ to papers in $f_j$, $N_f$ the number of papers in $f$, and $N$ the total number of papers. The CPR is then:

$\mathrm{CPR}(f) = \frac{C(f)}{\,E(f)\,}$

where

$C(f) = \sum_{f_i \in F} C^{\,f_i \rightarrow f}$

is the observed number of citations to $f$ from all classes (including $f$ itself, so self-citations are counted), and

$E(f) = T \times \frac{N_f}{N}$

with

$T = \sum_{f_i \in F}\sum_{f_j \in F} C^{\,f_i \to f_j}$

the total number of citations among all papers in the corpus (AI papers in the source study). CPR thus represents the tendency toward over- or under-citation relative to the volume baseline, as detailed in (Gnewuch et al., 5 Dec 2025).
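
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the source study's code: the function name `citation_preference_ratio`, the dictionary layout, and the toy numbers are all assumptions made for the example.

```python
from typing import Dict, Tuple


def citation_preference_ratio(
    citations: Dict[Tuple[str, str], int],
    paper_counts: Dict[str, int],
) -> Dict[str, float]:
    """Return CPR(f) = C(f) / E(f) for every group f.

    citations[(fi, fj)] is the number of citations from group fi to group fj.
    paper_counts[f] is N_f, the number of papers in group f.
    """
    total_papers = sum(paper_counts.values())    # N
    total_citations = sum(citations.values())    # T
    cpr = {}
    for group, n_papers in paper_counts.items():
        # C(f): citations received from every group, the group itself included
        observed = sum(c for (_, target), c in citations.items() if target == group)
        # E(f) = T * N_f / N under the uniform random model
        expected = total_citations * n_papers / total_papers
        cpr[group] = observed / expected if expected > 0 else float("nan")
    return cpr


# Toy, made-up numbers (hypothetical, not taken from any study)
toy_counts = {"industry": 10, "other": 90}
toy_citations = {
    ("industry", "industry"): 30,
    ("industry", "other"): 20,
    ("other", "industry"): 20,
    ("other", "other"): 130,
}
toy_cpr = citation_preference_ratio(toy_citations, toy_counts)
print(toy_cpr)
```

In this toy case the "industry" group holds 10% of the papers but receives 50 of the 200 citations, so its CPR is 2.5, while "other" comes out below 1. When the groups partition the corpus, the CPR values weighted by each group's paper share $N_f/N$ sum to 1, which follows directly from the definitions of $C(f)$, $E(f)$, and $T$.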

2. Conceptual and Historical Rationale

CPR was introduced to address specific limitations of pure count-based and impact-factor-based metrics. In citation analysis, absolute metrics such as mean citation rates, median citations, or group-indexed h-index values can be strongly confounded by group size, outlier effects, and heavy-tailed citation distributions. CPR addresses this by normalizing expected citations to group size, which makes it robust against the dominance of large aggregates or "blockbuster" papers. Its pairwise variant, known as the Citation Success Index (CSI), further enables probabilistic comparison between two groups or journals: it gives the probability that a random paper from group $A$ has more citations than a random paper from group $B$ (Milojević et al., 2016).

3. Analytical Framework and Computation

Calculation proceeds in two core modes, depending on context:

Group-Level Application (e.g., Funding Classes): Aggregate all observed citations to and from each group, tabulate the population counts, and evaluate CPR using the above formula.
Pairwise Journal Comparison (CSI): Let $X_A$ and $X_B$ be citation counts for random papers from journals $A$ and $B$ with empirical distributions $p_A(c)$ and $p_B(c)$ . The CSI between $A$ and $B$ is:

$\mathrm{CSI}(A,B) = P[X_A > X_B] + \frac{1}{2} \, P[X_A = X_B]$

or equivalently,

$\mathrm{CSI}(A,B) = \sum_{c=0}^\infty [P_A(>c) + \frac{1}{2} p_A(c)] p_B(c)$

where $P_A(>c) = \sum_{m>c} p_A(m)$. The CSI is numerically equivalent to the Mann–Whitney $U$-statistic normalized by the total number of paper pairs, with ties counted as half-wins (Milojević et al., 2016).
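
A minimal sketch of the empirical CSI, assuming each journal is represented simply as a list of per-paper citation counts. The function name `citation_success_index` and the toy data are hypothetical; the code sorts one sample and uses binary search instead of enumerating every pair.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def citation_success_index(cites_a: Sequence[int], cites_b: Sequence[int]) -> float:
    """Estimate CSI(A, B) = P[X_A > X_B] + 0.5 * P[X_A = X_B] from samples."""
    if len(cites_a) == 0 or len(cites_b) == 0:
        raise ValueError("both journals need at least one paper")
    sorted_a = sorted(cites_a)
    n_a = len(sorted_a)
    score = 0.0
    for c in cites_b:
        upper = bisect_right(sorted_a, c)
        wins = n_a - upper                       # papers in A with more than c citations
        ties = upper - bisect_left(sorted_a, c)  # papers in A with exactly c citations
        score += wins + 0.5 * ties
    # Normalize by the total number of (A, B) paper pairs
    return score / (n_a * len(cites_b))


# Toy citation counts (hypothetical)
toy_a = [0, 2, 5, 9]
toy_b = [0, 1, 2, 3]
csi_ab = citation_success_index(toy_a, toy_b)
csi_ba = citation_success_index(toy_b, toy_a)
print(csi_ab, csi_ba)
```

Because ties are split evenly, the two directions are complementary: `csi_ab + csi_ba` equals 1, and two journals with identical citation distributions score 0.5 against each other.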

For rapid estimation, CPR/CSI can be approximated using group or journal impact factors (IFs):

For journals $A$ and $B$ (IF ratio $x = \mathrm{IF}_A / \mathrm{IF}_B$ ), let $f_0$ be the fraction of zero-citation papers in $B$ . The approximate logistic form is:

$\mathrm{CSI}(A,B) \approx \frac{1}{1 + x^{-1.23}}$

which holds for $x \geq 1$ and $f_0 \ll 1$; the general form is described in (Milojević et al., 2016).
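
The logistic approximation is easy to evaluate and to invert. The sketch below assumes only the simplified form quoted above (exponent 1.23, $x \geq 1$, negligible zero-citation fraction); the function names `csi_from_if_ratio` and `if_ratio_for_csi` are hypothetical, and the general form from the cited paper is not implemented.

```python
def csi_from_if_ratio(if_a: float, if_b: float, exponent: float = 1.23) -> float:
    """Approximate CSI(A, B) from the impact-factor ratio x = IF_A / IF_B."""
    if if_a <= 0 or if_b <= 0:
        raise ValueError("impact factors must be positive")
    x = if_a / if_b
    if x < 1:
        # The simplified form is stated for x >= 1 only
        raise ValueError("pass the higher-IF journal as A so that x >= 1")
    return 1.0 / (1.0 + x ** (-exponent))


def if_ratio_for_csi(target: float, exponent: float = 1.23) -> float:
    """Invert the approximation: x = (p / (1 - p)) ** (1 / exponent)."""
    if not 0.5 <= target < 1.0:
        raise ValueError("target CSI must lie in [0.5, 1)")
    return (target / (1.0 - target)) ** (1.0 / exponent)


csi_twofold = csi_from_if_ratio(2.0, 1.0)
ratio_for_90 = if_ratio_for_csi(0.9)
print(round(csi_twofold, 2), round(ratio_for_90, 2))
```

Under these assumptions a twofold IF ratio gives a CSI of about 0.70, and a CSI of 0.9 requires an IF ratio of just under 6.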

4. Interpretation and Empirical Results

CPR answers the distributional preference question directly: is a class or journal cited preferentially relative to its size? In the funding-class analysis of AI literature (Gnewuch et al., 5 Dec 2025), CPR values above 1 indicate citation preference or "insularity," a value of 1 indicates neutrality, and values below 1 indicate relative neglect. Key results (2018–2022):

Year	Industry-funded CPR	Non-industry-funded CPR	Non-funded CPR
2018	~0.9	>1	<1
2020	≈1	≈1	≈1
2021	~1.1	<1	<1
2022	~1.05	—	~0.8

The analysis indicates that, from 2021 onward, industry-funded AI papers were cited more than warranted by their ~10% share of papers, evidence of growing citation preference and sectoral insularity (Gnewuch et al., 5 Dec 2025).

For journal comparison (CSI), a twofold IF advantage corresponds to a CSI of ~0.70; an overwhelming advantage (out-citation probability above 90%) arises only for IF ratios exceeding 5–6, which shows CSI to be less sensitive to sporadic high-impact papers than arithmetic impact factors (Milojević et al., 2016).

5. Robustness to Citation Outliers and Distributional Biases

CPR/CSI is designed to address the skewness and outlier effects that pervade citation distributions. Unlike mean citation rates or the impact factor, which a few blockbuster papers can shift substantially, CPR/CSI aggregates over the entire citation histogram. A single extreme-tail paper therefore has little effect on the final value, because CPR/CSI depends on the shape of the empirical distribution rather than its average (Milojević et al., 2016).
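
For the pairwise CSI form, the outlier behavior can be checked on toy data. The sketch below uses hypothetical citation counts and a brute-force helper named `csi` (an assumption for illustration); it adds one extreme paper to journal B and compares the change in B's mean with the change in the CSI.

```python
from statistics import mean


def csi(cites_a, cites_b):
    """Pairwise CSI with ties counted as half-wins (brute force over all pairs)."""
    score = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in cites_a
        for b in cites_b
    )
    return score / (len(cites_a) * len(cites_b))


# Hypothetical per-paper citation counts
journal_a = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8]
journal_b = [0, 0, 1, 1, 2, 2, 3, 4, 4, 5]
journal_b_outlier = journal_b + [1000]   # one "blockbuster" paper

mean_before = mean(journal_b)
mean_after = mean(journal_b_outlier)
csi_before = csi(journal_a, journal_b)
csi_after = csi(journal_a, journal_b_outlier)

print(mean_before, mean_after)
print(csi_before, csi_after)
```

In this toy case the mean of journal B jumps from 2.2 to roughly 93, while the CSI moves from 0.625 to about 0.57. The added paper counts as a single item among B's eleven papers, so the shift is the same whether it has 1,000 or 1,000,000 citations and cannot exceed one paper's share of the sample.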

In cross-group citation studies, CPR does not account for within-group topic popularity or recency bias, focusing strictly on proportional citation allocation arising from baseline group volume.

6. Assumptions, Limitations, and Contextual Considerations

Several caveats underpin CPR analysis:

Funding-class CPR relies on precise metadata. Undercounting occurs when authors omit funding sources, leading to potential misclassification (Gnewuch et al., 5 Dec 2025).
Citation counts are used as proxies for influence, but citation context (e.g., perfunctory versus foundational) is not distinguished.
Paper count normalization overlooks non-volume correlates of citation accrual, such as topical salience and recency.
In CSI applications, referencing only IFs may ignore notable distributional anomalies, though empirical evidence suggests IF ratio suffices for accurate estimation given similar distribution shapes across journals (Milojević et al., 2016).

7. Comparative Role Among Citation Metrics

CPR complements established metrics (absolute citation counts, h-index) by introducing a normalization for group or class size and challenging interpretations based solely on magnitude. Metrics such as mean/median citation or the h-index measure overall impact, often confounded by size and tail effects. CPR interrogates relative citation preference, crucial for analysis of preferential attachment, field insularity, or funding impact.

In journal comparison, CPR/CSI provides a transparent, probability-based alternative to the impact factor, extending the assessment beyond blunt magnitude-based ranking to distributional citation probability (Gnewuch et al., 5 Dec 2025; Milojević et al., 2016). This refinement supports nuanced, ecosystem-level analysis of citation dynamics and policy evaluation in academic publishing and research funding.

References

Gnewuch et al. (2025). Big Tech-Funded AI Papers Have Higher Citation Impact, Greater Insularity, and Larger Recency Bias.

Milojević et al. (2016). Citation success index - An intuitive pair-wise journal comparison metric.
