Uncertainty Quantification on Clinical Trial Outcome Prediction (2401.03482v3)
Abstract: Uncertainty quantification is increasingly recognized as important across machine learning. Accurately assessing the uncertainty of a model's predictions gives researchers and practitioners a deeper understanding of those predictions and greater confidence in them. This is especially critical in medical diagnosis and drug discovery, where reliable predictions directly affect research quality and patient health. In this paper, we propose incorporating uncertainty quantification into clinical trial outcome prediction. Our main goal is to enhance the model's ability to discern nuanced differences, thereby improving its overall performance. To this end, we adopt a selective classification approach and integrate it with the Hierarchical Interaction Network (HINT), a state-of-the-art model for clinical trial outcome prediction. Selective classification, which encompasses a range of uncertainty quantification methods, allows the model to abstain on samples that are ambiguous or on which it has low confidence, thereby increasing accuracy on the instances it does choose to classify. Comprehensive experiments show that incorporating selective classification into clinical trial outcome prediction markedly improves performance on key metrics, namely PR-AUC, F1, ROC-AUC, and overall accuracy. Specifically, the proposed method achieves relative improvements of 32.37%, 21.43%, and 13.27% in PR-AUC over the base model (HINT) for phase I, II, and III trial outcome prediction, respectively. For phase III, our method reaches a PR-AUC of 0.9022. These findings illustrate the robustness and potential utility of this strategy for clinical trial outcome prediction, and it may set a new benchmark in the field.
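To make the abstain-on-low-confidence idea concrete, here is a minimal sketch of selective classification for a binary outcome. It uses a simple confidence threshold on the predicted probability, which is an assumption for illustration and not necessarily the selection mechanism the paper attaches to HINT. The function names, the threshold value, and the toy probabilities and labels are all hypothetical.

```python
from typing import List, Optional, Tuple


def selective_predict(probs: List[float], threshold: float = 0.8) -> List[Optional[int]]:
    """Return 1 or 0 when the model is confident, None (abstain) otherwise.

    probs holds predicted success probabilities for a binary outcome.
    Confidence is max(p, 1 - p); this thresholding rule is an illustrative
    assumption, not the paper's exact method.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            decisions.append(1 if p >= 0.5 else 0)
        else:
            decisions.append(None)  # withhold a decision on this sample
    return decisions


def coverage_and_selective_accuracy(
    decisions: List[Optional[int]], labels: List[int]
) -> Tuple[float, float]:
    """Coverage is the fraction of samples classified; accuracy is
    computed only over the samples the model chose to classify."""
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")
    correct = sum(1 for d, y in accepted if d == y)
    return coverage, correct / len(accepted)


# Toy data (hypothetical): predicted success probabilities and true outcomes.
probs = [0.95, 0.55, 0.10, 0.48, 0.85, 0.30]
labels = [1, 0, 0, 1, 1, 1]

decisions = selective_predict(probs, threshold=0.8)
coverage, selective_acc = coverage_and_selective_accuracy(decisions, labels)

# Baseline: a threshold of 0.5 never abstains, so every sample is classified.
full_decisions = selective_predict(probs, threshold=0.5)
full_coverage, full_acc = coverage_and_selective_accuracy(full_decisions, labels)

print(decisions)
print(coverage, selective_acc)
print(full_coverage, full_acc)
```

On this toy data the model abstains on the three samples near the decision boundary, trading coverage for accuracy on the rest. That trade-off is why selective methods are usually reported with coverage alongside metrics such as PR-AUC or accuracy.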
- Tianyi Chen
- Nan Hao
- Yingzhou Lu
- Capucine Van Rechem
- Yuanyuan Zhang
- Jintai Chen
- Tianfan Fu