- The paper introduces GRASP, a framework that grades predictive tools using evidence from development through real-world impact.
- It details a structured methodology that integrates evaluation phases, evidence levels, and directional evidence to overcome limitations of traditional appraisal schemes.
- The framework’s application to five clinical tools demonstrates its ability to differentiate tool effectiveness and guide clinical decision support.
Introduction
The manuscript "Developing an Evidence-Based Framework for Grading and Assessment of Predictive Tools for Clinical Decision Support" (1907.03706) addresses the escalating challenge of evaluating and selecting predictive tools in clinical practice. With the proliferation of such tools—ranging from clinical decision rules to machine learning algorithms—there is a pressing need for a robust, standardized, and evidence-based framework to support their critical appraisal, selection, and recommendation. The proposed GRASP framework (Grading and Assessment of Predictive Tools) is designed to meet these needs by assimilating published evidence across development, validation, usability, and real-world clinical impact.
Motivation and Limitations of Existing Appraisal Schemes
Current approaches, such as TRIPOD and CHARMS, focus predominantly on the reporting and appraisal of pre-implementation predictive performance, neglecting usability and post-implementation outcomes. Systems like GRADE, while rigorous for guideline development, do not translate directly to comparative evaluation or grading of predictive algorithms. Furthermore, most existing schemes lack explicit mechanisms for combining evidence across heterogeneous sources and study designs, or for accounting for conflicting findings in varying clinical contexts.
Reliable benchmarking and grading are impeded by several factors:
- Overabundance of unimplemented or internally validated tools with limited evidence for real-world impact.
- Scarcity of comparative or post-implementation studies.
- Variability in study quality, validation strategies, patient populations, and clinical workflows.
- Context-dependent acceptability thresholds for performance and utility metrics, making direct metric aggregation inadvisable.
GRASP Framework: Structure and Methodology
The GRASP framework synthesizes evaluation along three axes:
- Phase of Evaluation: Spanning pre-implementation (predictive performance), peri-implementation (usability and potential effect), and post-implementation (real-world clinical, system, or economic impact).
- Level of Evidence: Nested grading (C1–C3, B1–B2, A1–A3) derived from quantity and rigor of validation (e.g., multiple vs. single external datasets, experimental vs. observational impact studies).
- Direction of Evidence: Protocolized appraisal of the polarity (positive/mixed/negative) of the body of evidence, with explicit handling of mixed findings based on context and study quality.
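Assuming that the letter of each grade tracks the highest evaluation phase with supporting evidence (C for pre-implementation, B for peri-implementation, A for post-implementation, with the digit reflecting rigor within that phase), the three axes can be sketched as a small Python data model; all class and field names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    """Evaluation phase along the tool's life cycle."""
    PRE_IMPLEMENTATION = "predictive performance"
    PERI_IMPLEMENTATION = "usability and potential effect"
    POST_IMPLEMENTATION = "real-world clinical, system, or economic impact"

class Direction(Enum):
    """Overall polarity of the body of evidence."""
    POSITIVE = "positive"
    MIXED = "mixed"
    NEGATIVE = "negative"

# Assumed mapping of grade letters to the highest phase with evidence:
# C grades rest on pre-implementation evidence, B on peri-implementation,
# A on post-implementation impact studies.
GRADE_PHASE = {
    "A1": Phase.POST_IMPLEMENTATION, "A2": Phase.POST_IMPLEMENTATION,
    "A3": Phase.POST_IMPLEMENTATION,
    "B1": Phase.PERI_IMPLEMENTATION, "B2": Phase.PERI_IMPLEMENTATION,
    "C1": Phase.PRE_IMPLEMENTATION, "C2": Phase.PRE_IMPLEMENTATION,
    "C3": Phase.PRE_IMPLEMENTATION,
}

@dataclass
class Appraisal:
    """One appraised predictive tool (illustrative record type)."""
    tool: str
    grade: str          # one of "A1".."A3", "B1".."B2", "C1".."C3"
    direction: Direction

    @property
    def phase(self) -> Phase:
        return GRADE_PHASE[self.grade]
```

Representing the grade as a single string while deriving the phase keeps the record close to how the paper reports results (a compact grade plus a direction of evidence).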
The design process entailed a focused literature review to identify evaluation criteria, application of the framework to five well-characterized predictive tools, and structured peer review by domain experts, resulting in iterative refinement.
Application to Five Clinical Tools
The framework was systematically applied to five tools: the LACE Index for Readmission, the Centor Score for Streptococcal Pharyngitis, Wells' Criteria for Pulmonary Embolism, the Modified Early Warning Score (MEWS), and the Ottawa Knee Rule. The grading assignments illustrate the diverse evidentiary landscapes and differential maturity of these tools:
- Ottawa Knee Rule (Grade A1): Backed by strong post-implementation evidence, including controlled studies showing reduced imaging use with no adverse patient impact and significant cost savings.
- MEWS (Grade A2): Multiple external validations, plus observational studies supporting reduced adverse events, improved documentation, and mortality benefits post-implementation.
- Wells’ Criteria (Grade A2): Robust evidence for both external validity and longitudinal impact on imaging efficiency in clinical workflows.
- Centor Score (Grade B1): Validated for predictive discrimination and usability but demonstrates inconsistent post-implementation impact on targeted outcomes, e.g., antibiotic over-prescription.
- LACE Index (Grade C1): Validated externally for its original population, but lacking evidence for effectiveness, usability, or impact in real-world settings, with poor performance in certain subpopulations.
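The grades above can be encoded as data and ranked, assuming (consistent with GRASP's nesting) that the letter orders the highest phase reached (A strongest) and the digit orders evidence strength within a phase (lower digit is stronger); the helper below is an illustrative sketch, not part of the framework:

```python
# Grades assigned in the paper's worked examples (tool -> GRASP grade).
grades = {
    "Ottawa Knee Rule": "A1",
    "MEWS": "A2",
    "Wells' Criteria": "A2",
    "Centor Score": "B1",
    "LACE Index": "C1",
}

def grade_rank(grade: str) -> tuple[int, int]:
    """Sort key: (phase letter, digit), smaller means stronger evidence."""
    return ("ABC".index(grade[0]), int(grade[1]))

# Order tools from strongest to weakest overall evidence.
ranked = sorted(grades, key=lambda tool: grade_rank(grades[tool]))
print(ranked[0])  # -> Ottawa Knee Rule
```

A tuple key like this lets ties within a phase (here MEWS and Wells' Criteria, both A2) surface naturally, which mirrors how end users might shortlist tools by grade before reading the underlying evidence.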
These examples underscore the framework's capacity to differentiate tools not only by underlying model quality but also by clinical integration and verified endpoint benefits.
Theoretical and Practical Implications
The GRASP framework operationalizes an evidence-based, multidimensional, and context-sensitive strategy, counterbalancing the limitations of one-dimensional metric aggregation. The protocol for mixed evidence is especially notable, as it systematically privileges high-quality, context-matched studies when overall conclusions are ambiguous. This is critical in clinical domains where heterogeneity is inherent and context matters.
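As a rough sketch of how a protocol that privileges high-quality, context-matched studies might weigh conflicting findings (the weighting scheme and field names below are assumptions for illustration, not the paper's actual procedure):

```python
def resolve_direction(studies: list[dict]) -> str:
    """Combine per-study findings into an overall direction of evidence.

    Each study dict: {'direction': +1 or -1,   # favorable vs. unfavorable
                      'quality': 0.0..1.0,      # methodological rigor
                      'context_match': 0.0..1.0}  # fit to the target setting
    Weights are illustrative: higher-quality, better-matched studies
    dominate the overall call, as the mixed-evidence protocol intends.
    """
    score = sum(s["direction"] * s["quality"] * s["context_match"]
                for s in studies)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "mixed"
```

Under this toy scheme, a rigorous context-matched positive study outweighs a weaker negative one, while genuinely balanced evidence resolves to "mixed".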
Practically, the framework supports two main user bases:
- Expert users (evaluators, researchers): Perform critical appraisals and assign structured grades based on comprehensive review.
- End users (clinicians, guideline panels): Benchmark and compare tools using synthesized grades, with access to underlying evidence and contextual caveats.
The introduction of GRASP as an online platform, paired with potential semi-automated update mechanisms, could enable dynamic, community-vetted tool registries, supporting evidence currency as new studies emerge.
Limitations and Directions for Future Work
The initial validation is restricted in scope (five tools, limited expert feedback), and the GRASP framework should itself be subjected to systematic validation, including user-centered studies assessing decision quality and downstream clinical impact. Expansion to a broader array of tools—including complex machine learning models and more granular cost-effectiveness analyses—is a necessary extension.
Handling publication bias, evidence gaps (particularly for usability and implementation studies), and the integration of local-context variables remain open research problems. Sustainable, scalable evidence ingestion (e.g., leveraging NLP or living systematic review platforms) is essential for operational viability.
Conclusion
The GRASP framework constitutes a methodologically robust, evidence-driven system for grading and assessing predictive tools in clinical decision support. By formally integrating evaluation across development, usability, and real-world impact phases, and establishing a structured protocol for synthesizing diverse and conflicting evidence, it provides an actionable solution for tool selection, benchmarking, and clinical guideline integration. Its utility will be maximized through community adoption, infrastructure support for continuous updating, and concerted validation efforts in real-world deployment contexts (1907.03706).