Fitting Prediction Rule Ensembles to Psychological Research Data: An Introduction and Tutorial (1907.05302v5)
Abstract: Prediction rule ensembles (PREs) are a relatively new statistical learning method that aims to strike a balance between predictive accuracy and interpretability. Starting from a decision tree ensemble, such as a boosted tree ensemble or a random forest, PREs retain only a small subset of the tree nodes in the final predictive model. Each retained node can be written as a simple rule of the form if [condition] then [prediction]. As a result, PREs are often much less complex than full decision tree ensembles, yet they have been found to provide similar predictive accuracy in many situations. The current paper introduces the methodology and shows, through several real-data examples from psychological research, how PREs can be fitted with the R package pre. The examples also illustrate a number of features of package pre that may be particularly useful for applications in psychology: support for categorical, multivariate and count responses, application of (non-)negativity constraints, inclusion of confirmatory rules, and standardized variable importance measures.
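The core idea above (tree nodes become if-then rules, and only a few of them enter the final model) can be illustrated with a minimal, dependency-free Python sketch. It is not the implementation used by package pre: the `Node` class, the helper functions, the toy variables `age` and `stress`, and the coefficient values are all hypothetical, and a single tree stands in for a full boosted or random-forest ensemble. The sketch assumes the rule coefficients have already been estimated; in the RuleFit approach of Friedman and Popescu (2008), on which PREs build, this is typically done with a lasso penalty that shrinks most coefficients to exactly zero. The importance function follows their rule importance |a_k| * sqrt(s_k(1 - s_k)), where s_k is the proportion of observations to which rule k applies.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Node:
    """Hypothetical toy tree node; a node with feature=None is a leaf."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # observations with feature <= threshold
    right: Optional["Node"] = None  # observations with feature > threshold


def extract_rules(node, path=()):
    """Turn every non-root node into a rule: the conjunction of the splits on its path."""
    rules = []
    if node.feature is None:  # leaves have no children, so no further rules
        return rules
    for op, child in (("<=", node.left), (">", node.right)):
        rule = path + ((node.feature, op, node.threshold),)
        rules.append(rule)
        rules.extend(extract_rules(child, rule))
    return rules


def rule_fires(rule, x):
    """Return 1 if observation x (a dict) meets every condition of the rule, else 0."""
    for feature, op, threshold in rule:
        satisfied = x[feature] <= threshold if op == "<=" else x[feature] > threshold
        if not satisfied:
            return 0
    return 1


def format_rule(rule):
    """Render a rule's condition, e.g. 'age <= 30 & stress > 5'."""
    return " & ".join(f"{feature} {op} {threshold:g}" for feature, op, threshold in rule)


def predict(intercept, coefs, rules, x):
    """Rule-ensemble prediction: intercept plus the coefficients of the rules that fire."""
    # Rules with a zero coefficient are not part of the final (sparse) ensemble.
    return intercept + sum(c * rule_fires(r, x) for c, r in zip(coefs, rules) if c != 0.0)


def rule_importance(coef, rule, data):
    """Rule importance |a_k| * sqrt(s_k * (1 - s_k)), with s_k the rule's support in data."""
    support = sum(rule_fires(rule, x) for x in data) / len(data)
    return abs(coef) * sqrt(support * (1 - support))


# Hypothetical toy tree: split on age, then on stress for the younger group.
tree = Node("age", 30.0, left=Node("stress", 5.0, left=Node(), right=Node()), right=Node())
rules = extract_rules(tree)
for rule in rules:
    print(format_rule(rule))

# Hypothetical coefficients, as if a lasso fit had kept only two of the four rules.
intercept = 2.0
coefs = [0.0, 0.0, 1.5, -0.5]
print(predict(intercept, coefs, rules, {"age": 25, "stress": 7}))  # 3.5
```

Each non-root node yields one rule (the conjunction of the split conditions on its path), so a large tree ensemble produces many candidate rules; it is the sparsity of the estimated coefficients that keeps the final model small enough to read as a short list of if-then statements.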