Consistency Ratio (CR) in AHP
- Consistency Ratio (CR) is a normalized metric that quantifies the deviation from perfect consistency in pairwise comparison matrices of the Analytic Hierarchy Process (AHP).
- It establishes clear acceptance thresholds—CR < 0.05 is very good, 0.05 ≤ CR < 0.10 is acceptable, and CR ≥ 0.10 signals unacceptable inconsistency—guiding decision revisions.
- Recent evaluations reveal CR’s limitations, such as false negatives and order bias, prompting the development of optimization models and alternative triadic measures.
The Consistency Ratio (CR) is the most widely adopted quantitative index for measuring the deviation from perfect consistency in pairwise comparison matrices (PCMs) of the Analytic Hierarchy Process (AHP). Introduced by Saaty, CR provides a normalized metric for inconsistency by benchmarking a matrix’s principal eigenvalue against expected values derived from random matrices, supporting acceptance/rejection thresholds in multi-criteria decision analysis. The theoretical foundation, practical mechanics, and empirical performance of CR, as well as critical limitations and recent alternatives, are central to the evaluation and improvement of decision support systems utilizing PCMs.
1. Formal Definition and Mathematical Properties
Given an $n \times n$ positive reciprocal PCM $A = (a_{ij})$, with $a_{ij} > 0$ and $a_{ji} = 1/a_{ij}$, the principal right eigenvalue $\lambda_{\max}$ is computed. Two key indices are defined:
- Consistency Index ($CI$): $CI = \dfrac{\lambda_{\max} - n}{n - 1}$
- Consistency Ratio ($CR$): $CR = \dfrac{CI}{RI_n}$
$RI_n$ is the Random Index: the mean $CI$ over large samples of random reciprocal matrices of order $n$, using entries drawn from the canonical AHP scale (typically $\{1/9, 1/8, \dots, 1/2, 1, 2, \dots, 9\}$).
For all $n$, a perfectly consistent matrix satisfies $\lambda_{\max} = n$, so $CR = 0$.
Typical Random Index (RI) Values
| n | RI |
|---|---|
| 1 | 0.00 |
| 2 | 0.00 |
| 3 | 0.58 |
| 4 | 0.90 |
| 5 | 1.12 |
| 6 | 1.24 |
| 7 | 1.32 |
| 8 | 1.41 |
| 9 | 1.45 |
| 10 | 1.49 |
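Using the RI values tabulated above, $CI$ and $CR$ follow directly from the principal eigenvalue. A minimal NumPy sketch (the judgment matrix is a hypothetical example, not data from the source):

```python
import numpy as np

# Random Index values for n = 3..10, mirroring the table above.
RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(A):
    """Return (lambda_max, CI, CR) for a positive reciprocal matrix A."""
    n = A.shape[0]
    lam = max(np.linalg.eigvals(A).real)   # principal (Perron) eigenvalue
    ci = (lam - n) / (n - 1)               # Consistency Index
    return lam, ci, ci / RI[n]             # Consistency Ratio

# Hypothetical judgments on the 1-9 scale (reciprocal below the diagonal).
A = np.array([[1.0, 3.0, 5.0, 7.0],
              [1/3, 1.0, 3.0, 5.0],
              [1/5, 1/3, 1.0, 3.0],
              [1/7, 1/5, 1/3, 1.0]])
lam, ci, cr = consistency_ratio(A)
```

For this matrix $\lambda_{\max}$ is only slightly above $n = 4$, so the CR lands well under the 0.10 threshold.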
2. Interpretation Benchmarks and Acceptance Thresholds
Standard practice, justified by Saaty’s original simulations and subsequent refinements, establishes specific CR acceptance thresholds:
- $CR < 0.05$: “very good” consistency
- $0.05 \le CR < 0.10$: “acceptable”
- $CR \ge 0.10$: “unacceptably inconsistent”; revision of judgments is recommended
The de facto threshold for practical AHP implementations is $CR < 0.10$, known as the “ten-percent rule” (Bose, 7 May 2025, 1311.0748).
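The threshold scheme can be encoded as a small helper; a sketch (the function name and labels are illustrative, not from any AHP library):

```python
def classify_cr(cr):
    """Map a Consistency Ratio to the standard AHP acceptance bands."""
    if cr < 0.05:
        return "very good"
    if cr < 0.10:
        return "acceptable"           # the "ten-percent rule" still passes
    return "unacceptably inconsistent"
```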
3. Relationships to Other Inconsistency Indices
CR is a global index, reflecting overall transitivity violation through the lens of the Perron eigenvalue. Two prominent local triad-based measures offer alternative perspectives:
- Koczkodaj–Duszak’s CM (maximum triad inconsistency): $CM = \max_{i<j<k} \min\left\{\left|1 - \frac{a_{ik}}{a_{ij}a_{jk}}\right|,\ \left|1 - \frac{a_{ij}a_{jk}}{a_{ik}}\right|\right\}$
CM reflects the single worst triad violation.
- Peláez–Lamata’s CI (average determinant of triads): $CI = \frac{1}{N_T}\sum_{i<j<k}\left(\frac{a_{ik}}{a_{ij}a_{jk}} + \frac{a_{ij}a_{jk}}{a_{ik}} - 2\right)$
where $N_T = \binom{n}{3}$ is the number of triads in the matrix. This measure vanishes precisely on consistent triads.
Both $\lambda_{\max}$ (the basis of CR) and the Peláez–Lamata CI are convex in the log-space representation $x_{ij} = \log a_{ij}$, whereas CM is quasi-convex but can be brought to convex form by a monotone transformation. A plausible implication is that CR’s global nature may overlook severe local violations, motivating further triad-based analyses (1311.0748).
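Both triad-based measures can be computed by a direct sweep over all $\binom{n}{3}$ triads. A minimal sketch assuming the standard formulas above (the function name is illustrative):

```python
import numpy as np
from itertools import combinations

def triad_indices(A):
    """Return (CM, CI_PL): Koczkodaj's max-triad inconsistency and the
    Pelaez-Lamata average triad determinant, for a reciprocal matrix A."""
    n = A.shape[0]
    cm_terms, det_terms = [], []
    for i, j, k in combinations(range(n), 3):
        r = (A[i, j] * A[j, k]) / A[i, k]   # equals 1 for a consistent triad
        cm_terms.append(min(abs(1 - r), abs(1 - 1 / r)))
        det_terms.append(r + 1 / r - 2)     # triad determinant, always >= 0
    return max(cm_terms), sum(det_terms) / len(det_terms)
```

On a perfectly consistent matrix both indices are zero; a single violated triad raises both, with CM registering only the worst offender.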
4. Reducing Inconsistency via Optimization
Bozóki, Fülöp, and Poesz (1311.0748) introduce two mixed-integer nonlinear programming (MINLP) models to support real-time reduction of CR in PCMs:
Model A (Minimal Correction):
- Objective: minimize the number of modified entries needed to achieve $CR \le 0.10$
- Constraints:
- Skew-symmetry (reciprocity in log variables): $x_{ji} = -x_{ij}$
- Bound on modifications: controlled by binary variables $y_{ij} \in \{0, 1\}$ marking which entries are changed
- Perron root constraint via variational characterization: $CR \le 0.10$ is equivalent to $\lambda_{\max} \le \bar{\lambda} := n + 0.1\,(n-1)\,RI_n$, enforced by requiring a $w > 0$ with $\sum_j a_{ij} w_j \le \bar{\lambda}\, w_i$ for all $i$
- Only the smallest necessary set of entries is flagged for revision, rather than re-eliciting all judgments.
Model B (Budgeted Correction):
- Objective: minimize $\lambda_{\max}$ (equivalently $CR$) given a ceiling on the number of entry modifications
- Similar set of variables and constraints; enforces a budget $\sum_{i<j} y_{ij} \le K$ on the number of modified entries
Solution is enabled by convexity in log-space, off-the-shelf mixed-integer convex solvers (CPLEX, Gurobi), and practical tractability up to matrix orders around $n = 15$.
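The MINLP models require a mixed-integer convex solver, but the spirit of Model A (find the fewest entry changes that push CR below 0.10) can be illustrated with a toy brute-force search on a small matrix. This sketch substitutes eigenvector-implied values $w_i/w_j$ rather than optimizing the corrections, so it is an illustration of the idea, not the method of 1311.0748:

```python
import numpy as np
from itertools import combinations

RI = {3: 0.58, 4: 0.90, 5: 1.12}

def cr(A):
    """Consistency Ratio of a positive reciprocal matrix A."""
    n = A.shape[0]
    lam = max(np.linalg.eigvals(A).real)
    return (lam - n) / ((n - 1) * RI[n])

def minimal_correction(A, threshold=0.10):
    """Smallest set of upper-triangular entries whose replacement by the
    eigenvector-implied ratios w_i/w_j brings CR below the threshold.
    Returns None if no subset suffices (unlikely in practice)."""
    n = A.shape[0]
    vals, vecs = np.linalg.eig(A)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)   # Perron eigenvector
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for size in range(len(pairs) + 1):               # fewest changes first
        for subset in combinations(pairs, size):
            B = A.copy()
            for i, j in subset:
                B[i, j] = w[i] / w[j]
                B[j, i] = w[j] / w[i]
            if cr(B) < threshold:
                return subset

# Hypothetical, badly inconsistent 3x3 matrix (its one triad has
# a12 * a23 / a13 = 16, far from the consistent value 1).
A = np.array([[1.0, 2.0, 0.25],
              [0.5, 1.0, 2.0],
              [4.0, 0.5, 1.0]])
fixes = minimal_correction(A)
```

Here no single-entry change suffices, but revising two of the three judgments does, which is exactly the kind of targeted re-elicitation the optimization models formalize.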
5. Empirical Performance and Limitations of CR
Recent systematic evaluations (Bose, 7 May 2025) highlight notable deficiencies of CR when applied to human-elicited (“logical”) PCMs:
- Low accuracy: on simulated “logical” PCMs of order 4–12, CR achieves only about 50% correct classification (consistent/inconsistent).
- False negatives: 5.5% of actually inconsistent matrices are classified as consistent.
- Severe order bias: for $n = 8$, fewer than 5% of logical PCMs pass the CR test, despite unsupervised clustering suggesting that roughly 57% should qualify as consistent.
- Extremely low specificity: at roughly 10%, specificity is so low that CR systematically under-identifies consistent matrices, erring toward over-rejection.
Benchmarking against the triadic preference reversal (PR) method yields:
| Metric | CR method | PR method |
|---|---|---|
| Accuracy | 50% | 97% |
| Sensitivity | ~100% | 96% |
| Specificity | ~10% | 97% |
| False Neg. | 5.5% | 2.6% |
Pass rates (share of logical PCMs classified consistent) by matrix order:

| Order | Ab-initio | CR method | PR method |
|---|---|---|---|
| 4 | 72.9% | 28.1% | 74.8% |
| 6 | 59.0% | 12.6% | 59.6% |
| 8 | 56.9% | 4.1% | 54.0% |
| Overall | 58.2% | 8.5% | 58.4% |
This suggests the CR framework’s reliance on random matrix benchmarks and global eigenvalue aggregation systematically over-penalizes genuine but non-ideal human input.
6. Decision Support Implications and Contemporary Critique
The rigidity of CR thresholds and over-rejection of moderate inconsistencies have several operational consequences (Bose, 7 May 2025, 1311.0748):
- Workflow disruption: insisting on $CR < 0.10$ leads to unnecessary re-elicitation, eroding user trust and distorting the priority structure toward “artificially smooth” outcomes.
- Order dependence: the probability of passing the CR test decreases sharply with matrix order $n$, rendering the metric problematic in complex decision settings.
- Practical resolution: Optimization-based frameworks (e.g., Model A/B above) can target only the most influential judgments, offering pragmatic corrections and minimizing user burden.
- Future alternatives: Methods based on triadic preference reversals and clustering (e.g., PR method) achieve higher empirical fidelity to “ground truth” consistency, aligning with human intuitive classifications.
A critical consensus is emerging that, while CR’s mathematical construction is elegant and directly interpretable via the eigenvector weighting method, it generates significant Type I and II errors in practice. Alternatives designed to match local, interpretable preference conflicts yield substantially improved reliability and are central to current methodological innovation in consistency assessment (Bose, 7 May 2025).