Percentile-Based Rating Transformation
- Percentile-based rating transformation is a non-parametric method that maps raw scores to relative positions within a reference set, producing continuous or discrete ratings.
- It employs algorithms including sorting, tie-handling (such as fractional allocation and P100 methods), and threshold binning to achieve consistent, field-normalized indicators.
- The approach is widely used in bibliometrics and recommender systems to enable fair cross-field comparisons and improve performance through standardized rating scales.
A percentile-based rating transformation is a non-parametric method that ranks or maps raw scores (e.g. citations, ratings) into percentile scores or discrete rating classes based on their position in a reference distribution. In bibliometrics and recommendation systems, such transformations enable meaningful comparisons across individuals, items, or fields by normalizing heterogeneous, often heavily skewed, discrete data to standard scales. The approach is central to field-normalized citation indicators, top-k% classification, and user-level adjustment in rating systems.
1. Conceptual Foundations and Definitions
A percentile-based rating transformation maps observed values to their relative position (percentile) within a defined reference set. Let $X = \{x_1, \dots, x_N\}$ be the multiset of observed scores. The empirical distribution function (EDF) is $F(x) = \frac{1}{N}\,\lvert\{i : x_i \le x\}\rvert$, and the $p$-quantile is $Q(p) = \inf\{x : F(x) \ge p\}$ (Rousseau, 2011). For discrete data, percentiles divide the ranked reference set into intervals of specified width, and each interval or class receives a (possibly ordinal) score. The transformation thus produces either a continuous percentile (0–100 scale) or a discrete rating based on defined thresholds.
The key challenge lies in handling ties (identical values), which frequently occur in real datasets. If untreated, ties at class boundaries introduce ambiguity, leading to unstable or inconsistent aggregates (Schreiber, 2013, Waltman et al., 2012). Several tie-handling and fractional allocation schemes have been developed to ensure well-defined percentile scores and field-normalized indicators.
2. Formal Algorithms and Binning Schemes
2.1 Continuous Percentile Assignment
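For an observed value $x$ in a reference set $\{x_1, \dots, x_N\}$ of size $N$, the percentile is computed as
$$P(x) = 100 \cdot \frac{\lvert\{i : x_i \le x\}\rvert}{N}.$$
Ties are handled by assigning all tied values the midpoint of their cumulative interval $[n_<,\, n_< + t]$, i.e.,
$$P(x) = 100 \cdot \frac{n_< + t/2}{N},$$
where $n_<$ is the number of values strictly below $x$ and $t$ is the size of the tie block (Leydesdorff, 2012).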
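The mid-rank rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `midrank_percentiles` and the sample citation counts are made up for the example, and higher scores are assumed to map to higher percentiles.

```python
from collections import Counter

def midrank_percentiles(scores):
    """Return the mid-rank percentile (0-100) of every score in `scores`."""
    n = len(scores)
    counts = Counter(scores)          # size of each tie block
    below = {}                        # number of scores strictly below each value
    running = 0
    for value in sorted(counts):
        below[value] = running
        running += counts[value]
    # Every member of a tie block gets the midpoint of the block's rank interval.
    return [100.0 * (below[s] + counts[s] / 2) / n for s in scores]

citations = [0, 0, 1, 3, 3, 3, 7, 12]   # hypothetical reference set
print(midrank_percentiles(citations))
# [12.5, 12.5, 31.25, 56.25, 56.25, 56.25, 81.25, 93.75]
```

A convenient property of the midpoint convention is that the mean percentile of the reference set is always 50, whatever the tie structure.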
2.2 Discrete Classes (Percentile Rank Classes)
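Discrete rating classes are defined by cut-points $0 = c_0 < c_1 < \dots < c_K = 100$, with mapping
$$r(x) = k \quad \text{if } c_{k-1} \le P(x) < c_k, \qquad k = 1, \dots, K,$$
where $P(x)$ is the percentile of $x$ and the top class also includes $P(x) = 100$.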
Nonlinear binning schemes are standard in bibliometrics:
- PR(6) uses six classes with thresholds at 0, 50, 75, 90, 95, 99, 100 (corresponding to the bottom 50% and the top 50%, 25%, 10%, 5%, and 1%) (Schreiber, 2013, Bornmann, 2012).
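As a rough sketch, the PR(6) binning can be expressed with a sorted list of interior cut-points and a binary search. The name `percentile_class` and the convention that a percentile exactly on a cut-point belongs to the upper class are assumptions made for this example; the source does not fix the boundary convention.

```python
from bisect import bisect_right

# Interior PR(6) cut-points on the 0-100 scale (0 and 100 are implicit).
PR6_CUTS = [50, 75, 90, 95, 99]

def percentile_class(percentile, cuts=PR6_CUTS):
    """Map a percentile to a class in 1..len(cuts)+1.

    Assumed convention: lower class edges are inclusive, so a percentile
    of exactly 50 lands in class 2.
    """
    return bisect_right(cuts, percentile) + 1

for p in (12.5, 50, 93.75, 99.5):
    print(p, "->", percentile_class(p))
```

This integer assignment places a whole tie block in a single class, which is exactly the situation that fractional scoring is designed to correct.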
Fractional scoring further refines the assignment when a tie block crosses a class boundary: the block is proportionally split across the overlapping intervals, ensuring the total number in each class matches the targeted proportions (Schreiber, 2013, Leydesdorff, 2012).
3. Key Methods in Bibliometrics
3.1 Integer-Based Versus Fractional Allocation
Traditional integer-based approaches use “count-lower,” “count-lower-or-equal,” or mid-rank assignment for ties, often causing departures from the theoretical class proportions and arbitrary cut points (Schreiber, 2013, Waltman et al., 2012).
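The departure from the theoretical class proportions is easy to reproduce on a toy example. The sketch below compares the "count-lower" and "count-lower-or-equal" rules; the function names and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right

def integer_percentiles(scores, rule="count_lower"):
    """Percentiles under two integer-based tie rules."""
    ordered = sorted(scores)
    n = len(ordered)
    if rule == "count_lower":
        rank_of = lambda s: bisect_left(ordered, s)    # scores strictly below s
    elif rule == "count_lower_or_equal":
        rank_of = lambda s: bisect_right(ordered, s)   # scores at or below s
    else:
        raise ValueError(f"unknown rule: {rule}")
    return [100.0 * rank_of(s) / n for s in scores]

def share_at_or_above(percentiles, cut):
    """Fraction of items whose percentile reaches the cut-point."""
    return sum(p >= cut for p in percentiles) / len(percentiles)

citations = [0, 0, 1, 3, 3, 3, 7, 12]   # hypothetical reference set
for rule in ("count_lower", "count_lower_or_equal"):
    pct = integer_percentiles(citations, rule)
    print(rule, share_at_or_above(pct, 50))
```

With this data, the tie block of three papers with 3 citations sits across the median. "Count-lower" puts 25% of the papers in the upper half and "count-lower-or-equal" puts 62.5% there, whereas the target is 50%.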
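Fractional scoring, as formalized by Schreiber, partitions tie blocks exactly at class boundaries. For each boundary, if a tied group of size $t$ straddles the boundary with $m$ of its $t$ rank positions below it ($m$ need not be an integer), fraction $m/t$ is allocated to the lower class and $(t - m)/t$ to the upper, assigning each tied paper the average score $\bar{s} = \frac{m}{t}\,s_{\mathrm{lower}} + \frac{t - m}{t}\,s_{\mathrm{upper}}$ (Schreiber, 2013). This guarantees invariance of the global mean and class proportions:
$$\sum_{i=1}^{N} w_{ik} = N\,p_k, \qquad \frac{1}{N}\sum_{i=1}^{N} \bar{s}_i = \sum_{k} p_k\,s_k,$$
where $N$ is the size of the reference set, $w_{ik}$ is the fraction of paper $i$ allocated to class $k$, $p_k$ is the target proportion of class $k$, and $s_k$ is the score of class $k$.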
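One way to implement the fractional split is to treat each tie block as an interval on the 0–100 scale and measure its overlap with every class interval. The following is a sketch under that reading, not Schreiber's published code; `fractional_class_weights` and the example data are illustrative names and values.

```python
from collections import Counter

def fractional_class_weights(scores, cuts):
    """For each score, return its weight in every class defined by `cuts`.

    `cuts` lists all cut-points on the 0-100 scale including both ends,
    e.g. [0, 50, 100] for two classes. Each paper's weights sum to 1.
    """
    n = len(scores)
    counts = Counter(scores)
    weights = {}
    start = 0  # number of papers ranked strictly below the current tie block
    for value in sorted(counts):
        t = counts[value]
        lo, hi = 100.0 * start / n, 100.0 * (start + t) / n
        # Overlap of the block's interval [lo, hi] with each class interval,
        # expressed as a share of the block.
        weights[value] = [
            max(0.0, min(hi, c_hi) - max(lo, c_lo)) / (hi - lo)
            for c_lo, c_hi in zip(cuts, cuts[1:])
        ]
        start += t
    return [weights[s] for s in scores]

citations = [0, 0, 1, 3, 3, 3, 7, 12]      # hypothetical reference set
w = fractional_class_weights(citations, [0, 50, 100])
totals = [sum(row[k] for row in w) for k in range(2)]
print(totals)   # each class holds 4 of the 8 papers, up to rounding
```

In this example the three papers with 3 citations are each counted one third in the lower half and two thirds in the upper half, so both halves contain exactly four papers. A tied paper's averaged score is then the weighted sum of the class scores with these weights.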
3.2 Special Indicators
- I3 Indicator: The sum of class scores (not normalized) yields an absolute-performance indicator, strictly congruous under set augmentation (Rousseau, 2011).
- Top-x% Proportion: For a top-$x$% threshold, papers in the highest interval receive value 1, others 0; fractional allocation ensures that the values sum to precisely $x$% of the papers at the reference set level (Waltman et al., 2012).
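The top-x% case can be sketched as a single boundary in rank space. In the example below, `top_fraction_scores` is a hypothetical helper and the data are invented: twenty papers, four of which are tied at the top, evaluated against a top-10% threshold.

```python
from collections import Counter

def top_fraction_scores(scores, top_percent=10.0):
    """Give each paper the share of its tie block that lies in the top x%."""
    n = len(scores)
    boundary = n * (100.0 - top_percent) / 100.0   # rank position of the cut
    counts = Counter(scores)
    share = {}
    start = 0  # papers ranked strictly below the current tie block
    for value in sorted(counts):
        t = counts[value]
        # Rank positions of this block that lie above the boundary.
        above = max(0.0, min(float(t), start + t - boundary))
        share[value] = above / t
        start += t
    return [share[s] for s in scores]

scores = [1] * 16 + [5] * 4                 # hypothetical reference set
values = top_fraction_scores(scores, top_percent=10.0)
print(values[-1], sum(values))              # 0.5 per tied top paper, 2.0 in total
```

Only two of the twenty rank positions belong to the top 10%, so each of the four tied papers is counted as half a top paper and the total is exactly 10% of the set.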
3.3 Citation-Rank Approaches: P100, P100′, and P100″
P100 assigns percentiles by unique value ranks (ignores tie counts), ensuring 0 (minimum) and 100 (maximum) are always achieved. P100′ incorporates the cumulative number of papers with lower scores, yielding outcomes nearly identical to inverted-InCites percentiles, except for peculiarities with multiple maximum-value ties. P100″ provides a hybrid, always remaining between the lower and upper percentile limits, linearly interpolated between the two (Schreiber, 2014, Bornmann et al., 2013).
| Method | Tie Treatment | 0–100 Ends | Remarks |
|---|---|---|---|
| P100 | Unique value only | Yes | May collapse large groups at ties |
| P100′ | Cumulative count | Yes | Nearly identical to inverted-InCites in most cases |
| P100″ | Weighted hybrid | Yes | Never exceeds “uncertainty interval” |
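The first two variants can be illustrated as follows. This sketch follows one reading of the descriptions in this section: P100 ranks the distinct values and rescales to 0–100, while P100′ uses the number of papers with lower scores, rescaled so that the maximum value receives 100. The function names are hypothetical, and the exact published definitions should be checked against Schreiber (2014) and Bornmann et al. (2013). P100″ is omitted because its interpolation weights are not specified here.

```python
from bisect import bisect_left

def p100(scores):
    """P100 (as read here): rank distinct values 0..i_max, rescale to 0-100."""
    distinct = sorted(set(scores))
    i_max = len(distinct) - 1
    if i_max < 1:
        raise ValueError("need at least two distinct values")
    rank = {value: i for i, value in enumerate(distinct)}
    return [100.0 * rank[s] / i_max for s in scores]

def p100_prime(scores):
    """P100' (as read here): papers with lower scores, rescaled to 0-100."""
    ordered = sorted(scores)
    below = {value: bisect_left(ordered, value) for value in set(scores)}
    n_max = below[ordered[-1]]   # papers below the maximum value
    if n_max == 0:
        raise ValueError("need at least two distinct values")
    return [100.0 * below[s] / n_max for s in scores]

citations = [0, 0, 1, 3, 3, 3, 7, 12]   # hypothetical reference set
print(p100(citations))        # [0.0, 0.0, 25.0, 50.0, 50.0, 50.0, 75.0, 100.0]
print(p100_prime(citations))
```

Both variants reach 0 and 100 at the ends, but P100 spaces the five distinct values evenly regardless of how many papers share each value, whereas P100′ reflects the tie counts.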
4. Applications and Empirical Evaluations
4.1 Bibliometrics
Percentile-based transformation is the basis for field- and year-normalized citation metrics:
- Allows fair comparison across fields.
- Identifies top-k% performers, e.g., top 10% most-cited (Bornmann, 2012, Schreiber, 2013).
- Supports aggregate indicators with quantifiable uncertainty due to ties (Leydesdorff, 2012).
- Used in institutional comparisons, research evaluation (double-rank analysis), and context-sensitive ranking (Brito et al., 2017).
Empirical studies validate that integer-based methods deviate from ideal properties, with deviations up to ±5% of the expected average in large datasets at principal boundaries (e.g., median or 90th percentile), while fractional scoring exactly recovers the theoretical means and proportions (Schreiber, 2013).
4.2 Recommender Systems
Percentile-based transformations for rating normalization have been adopted to mitigate user bias, skew, and idiosyncratic scale usage. The transformation maps each user’s ratings to percentiles relative to their own historical distribution, producing “flat” rating distributions which:
- Increase entropy and effective information content.
- Improve recommendation performance (substantive NDCG gains of 30–270%) over standard z-score normalization (Mansoury et al., 2019).
- Permit smoothed percentile transformations (adding pseudocounts) to avoid degenerate profiles where a user rates all items identically (Mansoury et al., 2019).
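A per-user percentile transformation with optional smoothing might look like the sketch below. The mid-rank convention, the function name `user_percentiles`, the 1 to 5 rating scale, and in particular the smoothing scheme (adding `pseudo` artificial ratings at every scale value) are assumptions made for illustration and are not taken from Mansoury et al. (2019).

```python
from collections import Counter, defaultdict

def user_percentiles(ratings, scale=(1, 2, 3, 4, 5), pseudo=0):
    """Map each rating to its percentile within the user's own profile.

    ratings: iterable of (user, item, rating) triples.
    pseudo:  number of artificial ratings added at every scale value
             (assumed smoothing scheme; 0 disables smoothing).
    Returns a dict {(user, item): percentile on the 0-100 scale}.
    """
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((item, rating))

    result = {}
    for user, pairs in by_user.items():
        counts = Counter(rating for _, rating in pairs)
        for value in scale:
            counts[value] += pseudo
        n = sum(counts.values())
        below, running = {}, 0
        for value in sorted(counts):
            below[value] = running
            running += counts[value]
        for item, rating in pairs:
            # Mid-rank percentile inside the (possibly smoothed) profile.
            result[(user, item)] = 100.0 * (below[rating] + counts[rating] / 2) / n
    return result

data = [("a", "i1", 5), ("a", "i2", 5), ("b", "i1", 1), ("b", "i2", 3), ("b", "i3", 5)]
print(user_percentiles(data)[("a", "i1")])             # 50.0
print(user_percentiles(data, pseudo=1)[("a", "i1")])   # about 78.6
```

User "a" rates everything 5, so without smoothing both ratings collapse to the 50th percentile and the fact that they are top-of-scale ratings is lost. With one pseudo-rating per scale value, the same ratings map to roughly the 79th percentile.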
Advanced models such as CMTRF generalize these ideas by fitting per-user or per-cluster monotone transforms, with percentile-based mappings forming a special case for the isotonic step (Hiranandani et al., 2018).
5. Statistical Properties, Field-Invariance, and Congruousness
Percentile-based transformations are strictly congruous indicators of relative performance: the group orderings are preserved under simultaneous addition or removal of records, provided that the group sizes remain equal and the reference set boundaries do not change (Rousseau, 2011).
The approach, especially with fractional attribution, is field-unbiased. That is, for any chosen set of cut-points, exactly the prescribed fraction of the entire field falls in each class, regardless of the specific citation distribution (Waltman et al., 2012). This is essential for it to serve as a normalization procedure in cross-field comparisons.
6. Implementation Considerations and Best Practices
Efficient algorithms for percentile-based rating transformations involve sorting the data (complexity $O(N \log N)$), identifying tie-blocks, computing percentile intervals, and, for class binning, calculating overlaps and class membership or fractional splits (complexity $O(N + K)$, $K$ = class count) (Leydesdorff, 2012). Reporting protocols should always specify the tie-handling rule used (average, mid-rank, fractional, P100, etc.), and for small reference sets, use fractional methods to minimize threshold ambiguity (Schreiber, 2013, Schreiber, 2014).
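The sort-then-scan step can be sketched with `itertools.groupby`, which groups equal neighbours in the sorted data into tie-blocks. The helper name `tie_block_intervals` and the sample data are illustrative; the returned lower and upper limits are the percentile interval that each tie-block occupies.

```python
from itertools import groupby

def tie_block_intervals(scores):
    """Return (value, lower_percentile, upper_percentile) for each tie-block."""
    n = len(scores)
    blocks = []
    start = 0  # number of scores strictly below the current block
    for value, group in groupby(sorted(scores)):   # sort: O(N log N)
        t = sum(1 for _ in group)                  # block size
        blocks.append((value, 100.0 * start / n, 100.0 * (start + t) / n))
        start += t
    return blocks

citations = [0, 0, 1, 3, 3, 3, 7, 12]   # hypothetical reference set
for block in tie_block_intervals(citations):
    print(block)
# (0, 0.0, 25.0)
# (1, 25.0, 37.5)
# (3, 37.5, 75.0)
# (7, 75.0, 87.5)
# (12, 87.5, 100.0)
```

Every tie-handling rule discussed in this article can be derived from these intervals: the lower limit, the upper limit, the midpoint, or a fractional split against class cut-points.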
In recommendation pipelines, percentile transformation is a model-agnostic preprocessing step; it standardizes the input without altering model structures, promoting cross-user comparability and enhancing downstream learning (Mansoury et al., 2019).
7. Limitations and Recommendations
Despite solving key ambiguities, percentile-based transformations are sensitive to the definition of the reference set and the statistical nature of the data:
- P100 and related rank-based indicators may behave unstably when the number of unique values shifts, especially with extreme ties (Schreiber, 2014).
- Results can become unreliable with small reference sets (small $N$) unless fractional or continuous quantile methods are used (Schreiber, 2013, Leydesdorff, 2012).
- For evaluation tasks requiring the exact proportion in a given class, fractional scoring is necessary; for all-purpose 0–100 scoring, inverted-InCites or P100′ (with reporting of “uncertainty intervals”) are recommended (Schreiber, 2014, Bornmann et al., 2013).
- In early predictive assessments (e.g., citation windows <5 years), all percentile methods have modest predictive power; incorporating additional covariates (e.g., journal prestige) stabilizes the rankings (Bornmann et al., 2013).
Whenever rating transformations are employed, the chosen methodology, tie-handling rule, and class boundaries should be explicitly documented, particularly in cross-group or cross-field comparisons. For recommender system scenarios, maintain “flatness” in user-level distributions using percentiles with smoothing for robust, entropy-maximizing transformations (Mansoury et al., 2019).
Key references: (Schreiber, 2013, Leydesdorff, 2012, Waltman et al., 2012, Rousseau, 2011, Schreiber, 2014, Bornmann et al., 2013, Bornmann, 2012, Mansoury et al., 2019, Hiranandani et al., 2018, Brito et al., 2017).