Comparative Analysis of Data Preprocessing Methods, Feature Selection Techniques and Machine Learning Models for Improved Classification and Regression Performance on Imbalanced Genetic Data
Abstract: Rapid advances in genome sequencing have led to the collection of vast amounts of genomic data. Researchers may wish to train machine learning models on such data to predict the pathogenicity or clinical significance of a genetic mutation. However, many genetic datasets contain imbalanced target variables that pose challenges to machine learning models: observations are skewed in regression tasks or class-imbalanced in classification tasks. Genetic datasets also often contain high-cardinality categorical features and skewed predictor variables, which pose further challenges. We investigated the effects of data preprocessing, feature selection techniques, and model selection on the performance of models trained on these datasets. We measured performance with 5-fold cross-validation and compared averaged r-squared and accuracy metrics across different combinations of techniques. We found that outliers and skew in predictor or target variables did not pose a challenge to regression models, and that class-imbalanced target variables and skewed predictors had little to no impact on classification performance. Random forest was the best model for imbalanced regression tasks. While our study uses a genetic dataset as an example of a real-world application, our findings generalize to similar datasets.
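The evaluation protocol described above (5-fold cross-validation, averaging r-squared for regression and accuracy for classification, with random forest as a candidate model) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the synthetic data below stands in for the genetic dataset, with an exponentiated target to mimic a skewed regression outcome and a 90/10 class split to mimic class imbalance.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

SEED = 0

# Synthetic regression data with a skewed target (assumption: stands in
# for a continuous clinical-significance score).
X_reg, y_reg = make_regression(
    n_samples=300, n_features=10, noise=10.0, random_state=SEED
)
y_reg = np.exp(3 * y_reg / np.abs(y_reg).max())  # induce right skew

# Synthetic classification data with a 90/10 class imbalance
# (assumption: stands in for pathogenic vs. benign labels).
X_clf, y_clf = make_classification(
    n_samples=300, n_features=10, weights=[0.9, 0.1], random_state=SEED
)

# Averaged 5-fold cross-validation metrics, as in the study design.
r2 = cross_val_score(
    RandomForestRegressor(random_state=SEED), X_reg, y_reg,
    cv=5, scoring="r2",
).mean()
acc = cross_val_score(
    RandomForestClassifier(random_state=SEED), X_clf, y_clf,
    cv=5, scoring="accuracy",
).mean()

print(f"mean 5-fold r-squared: {r2:.3f}")
print(f"mean 5-fold accuracy:  {acc:.3f}")
```

Comparing different preprocessing, feature-selection, and model combinations then amounts to repeating this loop with each combination and comparing the averaged scores.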
- Arvai, K.: Genetic Variant Classifications. Kaggle (2020). https://doi.org/10.34740/KAGGLE/DSV/1030915 . https://www.kaggle.com/dsv/1030915 Bobak et al. [2018] Bobak, C.A., Barr, P.J., O’Malley, A.J.: Estimation of an inter-rater intra-class correlation coefficient that overcomes common assumption violations in the assessment of health measurement scales. BMC medical research methodology 18(1), 1–11 (2018) Bates et al. [2014] Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014) Breiman [2001] Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. 
http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Bobak, C.A., Barr, P.J., O’Malley, A.J.: Estimation of an inter-rater intra-class correlation coefficient that overcomes common assumption violations in the assessment of health measurement scales. BMC medical research methodology 18(1), 1–11 (2018) Bates et al. [2014] Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014) Breiman [2001] Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. 
arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 
2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014) Breiman [2001] Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). 
https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. 
In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? 
Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Accessed: Feb 5, 2024
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al.
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. 
http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. 
Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. 
In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). 
https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. 
In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. 
In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
[2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. 
Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. 
Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. 
In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. 
[2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. 
Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 
2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014) Breiman [2001] Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. 
Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001) Branco and Torgo [2019] Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. 
Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. 
http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. 
http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. 
Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. 
In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). 
IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Branco, P., Torgo, L.: A study on the impact of data characteristics in imbalanced regression tasks. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 193–202 (2019). IEEE Chittineni and Bhogapathi [2012] Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. 
[2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). 
IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Chittineni, S., Bhogapathi, R.B.: A study on the behavior of a neural network for grouping the data. arXiv preprint arXiv:1203.3838 (2012) Charilaou and Battat [2022] Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. 
Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Charilaou, P., Battat, R.: Machine learning models and over-fitting considerations. World Journal of Gastroenterology 28(5), 605 (2022) Curran-Everett [2018] Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. 
In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). 
IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. 
Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80(4), 727–739 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci.
286, 228–246 (2014)
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 
2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? 
Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 
4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Curran-Everett, D.: Explorations in statistics: the log transformation. Advancxes in physiology education 42(2), 343–347 (2018) Changyong et al. [2014] Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 
2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? 
Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 
4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. 
Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Changyong, F., Hongyue, W., Naiji, L., Tian, C., Hua, H., Ying, L., et al.: Log-transformation and its implications for data analysis. Shanghai archives of psychiatry 26(2), 105 (2014) Derpanis [2010] Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? 
Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 
4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. 
[2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. 
Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. 
[2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155–163 (2016)
Koller [2016] Koller, M.: robustlmm: An R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences
286, 228–246 (2014)
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. 
Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. 
Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. 
[2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. 
In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024].
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Derpanis, K.G.: Overview of the ransac algorithm. Image Rochester NY 4(1), 2–3 (2010) Eddy [2004] Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. 
http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. 
American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004) Ensembl [2014] Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 
4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#::̃text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. 
[2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. 
Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo–Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic Acids Research 40(W1), 452–457 (2012)
- Derpanis, K.G.: Overview of the RANSAC algorithm. Image Rochester NY 4(1), 2–3 (2010)
- Eddy, S.R.: Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22(8), 1035–1036 (2004)
- Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#:~:text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014)
- Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632
- Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in Medicine 32(2), 230–239 (2013)
- Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871
- Keene, O.N.: The log transformation is special. Statistics in Medicine 14(8), 811–819 (1995)
- Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010)
- Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155–163 (2016)
- Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
- Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
- Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
- Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
- Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
- Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
- Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
- Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
- Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
- Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
- Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Accessed: Feb 5, 2024
- Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
- Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
- Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
- Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
- Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
- Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
- Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
- Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo–Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles.
Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Eddy, S.R.: Where did the blosum62 alignment score matrix come from? Nature biotechnology 22(8), 1035–1036 (2004)
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 
916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). 
https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. 
In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Kryukov et al. [2007] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. 
Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Ensembl: Pathogenicity Predictions. http://useast.ensembl.org/info/genome/variation/prediction/protein_function.html#:~:text=The%20PolyPhen%20score%20represents%20the,used%20to%20make%20the%20predictions. Accessed: [Insert Access Date] (2014) Esteves [2020] Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. In: 2010 IEEE International Conference on Information Reuse & Integration, pp. 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research.
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Esteves, V.M.S.: Techniques to deal with imbalanced data in multi-class problems: A review of existing methods. (2020). https://api.semanticscholar.org/CorpusID:226157632 Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in medicine 32(2), 230–239 (2013) Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: An R package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80(4), 727–739 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota.
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871 Keene [1995] Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. 
[2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. 
Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. 
In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. 
In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917. IEEE (2019)
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515. IEEE (2016)
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
- Feng et al. [2013] Feng, C., Wang, H., Lu, N., Tu, X.M.: Log transformation: application and interpretation in biomedical research. Statistics in Medicine 32(2), 230–239 (2013)
- Guo et al. [2008] Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871
- Keene [1995] Keene, O.N.: The log transformation is special. Statistics in Medicine 14(8), 811–819 (1995)
- Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. In: 2010 IEEE International Conference on Information Reuse & Integration, pp. 80–85 (2010)
- Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155–163 (2016)
- Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
- Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
- Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80(4), 727–739 (2007)
- Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
- Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
- Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
- Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
- Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
- Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
- Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
- Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
- Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
- Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
- Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
- Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
- Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
- Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
- Weisberg [2001] Weisberg, S.: Yeo–Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
- Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
- Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. 
[2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? 
(2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). 
https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). 
Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 
2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 
2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
- Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: 2008 Fourth International Conference on Natural Computation, vol. 4, pp. 192–201 (2008). https://doi.org/10.1109/ICNC.2008.871
Keene [1995] Keene, O.N.: The log transformation is special. Statistics in Medicine 14(8), 811–819 (1995)
Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. In: 2010 IEEE International Conference on Information Reuse & Integration, pp. 80–85 (2010)
Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155–163 (2016)
Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry.
Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry.
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). 
https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 
286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). 
IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. 
Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Keene, O.N.: The log transformation is special. Statistics in medicine 14(8), 811–819 (1995) Khoshgoftaar et al. [2010] Khoshgoftaar, T.M., Gao, K., Hulse, J.V.: A novel feature selection technique for highly imbalanced data. 2010 IEEE International Conference on Information Reuse & Integration, 80–85 (2010) Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine 15(2), 155–163 (2016) Koller [2016] Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016) Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. 
[2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction.
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov et al.
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Koo and Li [2016] Koo, T.K., Li, M.Y.: A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155–163 (2016)
Koller [2016] Koller, M.: robustlmm: an R package for robust estimation of linear mixed-effects models. Journal of Statistical Software 75, 1–24 (2016)
Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007) Kryukov et al. 
[2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007) Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864 Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. 
[2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019)
Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. 
Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Koller, M.: robustlmm: an r package for robust estimation of linear mixed-effects models. Journal of statistical software 75, 1–24 (2016)
Kryukov et al. [2007a] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kryukov et al. [2007b] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80 4, 727–39 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020)
Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012)
Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001)
Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Kryukov et al. [2007] Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics 80(4), 727–739 (2007)
Kaur and Sarmadi [2024] Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
Luo et al. [2019] Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019)
IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? 
PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Kryukov, G.V., Pennacchio, L.A., Sunyaev, S.R.: Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. American journal of human genetics 80(4), 727–739 (2007)
- Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach (2024). https://api.semanticscholar.org/CorpusID:267364864
- Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
- Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
- Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
- Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
- Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013)
- Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), e1006481 (2019)
- Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
- Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
- Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020)
- Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012)
- Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
- Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001)
- Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
- Weisberg, S.: Yeo–Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
- Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
- Zuliani, M.: RANSAC for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). 
Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 
37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. 
[2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454
- Kaur, A., Sarmadi, M.: Predicting loss-of-function impact of genetic mutations: a machine learning approach. (2024). https://api.semanticscholar.org/CorpusID:267364864
- Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE
- Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
- Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022)
- Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014)
- Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013)
- Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019)
- Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020).
https://api.semanticscholar.org/CorpusID:215753090
- Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry.
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer, ??? (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. 
https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). 
https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). 
Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Luo, H., Pan, X., Wang, Q., Ye, S., Qian, Y.: Logistic regression and random forest for effective imbalanced classification. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 916–917 (2019). IEEE Mirza et al. [2016] Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE Montesinos López et al. [2022] Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022) Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf. Sci. 286, 228–246 (2014) Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Syst. Appl. 37(12), 8303–8312 (2010) https://doi.org/10.1016/j.eswa.2010.05.054 Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins.
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. 
[2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. 
Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. 
Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
Weisberg, S.: Yeo–Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Mirza, B., Kok, S., Lin, Z., Yeo, Y.K., Lai, X., Cao, J., Sepulveda, J.: Efficient representation learning for high-dimensional imbalance data. In: 2016 IEEE International Conference on Digital Signal Processing (DSP), pp. 511–515 (2016). IEEE
Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Silva et al.
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). 
https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Montesinos López, O.A., Montesinos López, A., Crossa, J.: Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction, pp. 109–139. Springer (2022)
Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. 
Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. 
Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. 
[2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Maldonado et al. [2014] Maldonado, S., Weber, R., Famili, F.: Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences 286, 228–246 (2014)
- Nguwi and Cho [2010] Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni [2012] Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
- Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), e1006481 (2019)
- Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
- Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
- Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
- Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
- Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
- Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
- Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
- Weisberg [2001] Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota (2001). Retrieved June 1, 2003
- Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. 
[2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Nguwi, Y.-Y., Cho, S.-Y.: An unsupervised self-organizing learning with support vector ranking for imbalanced datasets. Expert Systems with Applications 37(12), 8303–8312 (2010). https://doi.org/10.1016/j.eswa.2010.05.054
- Ni, W.: A review and comparative study on univariate feature selection techniques (2012)
- Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2), 133–142 (2013)
- Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019)
- Nyongesa, D.B.: Variable selection using random forests in SAS® (2020). https://api.semanticscholar.org/CorpusID:215753090
- Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity
- Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022)
- Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
- Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Research 40(W1), 452–457 (2012)
- Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve Bayes classifier. In: 2015 International Conference on Computer and Information Engineering (ICCIE), pp. 150–153 (2015)
- Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human Molecular Genetics 10(6), 591–597 (2001)
- Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21. Springer (2022)
- Weisberg, S.: Yeo-Johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
- Yousefi, J., Hamilton-Wright, A.: Classification confusion within NEFCLASS caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler, A., König, I.R.: Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
- Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). 
Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Ni, W.: A review and comparative study on univariate feature selection techniques (2012) Nakagawa and Schielzeth [2013] Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies.
Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). 
Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. 
Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Nakagawa, S., Schielzeth, H.: A general and simple method for obtaining r2 from generalized linear mixed-effects models. Methods in ecology and evolution 4(2), 133–142 (2013) Niroula and Vihinen [2019] Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS computational biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in sas®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: [Feb 5, 2024]. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets.
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Niroula, A., Vihinen, M.: How good are pathogenicity predictors in detecting benign variants? PLoS Computational Biology 15(2), 1006481 (2019) Nyongesa [2020] Nyongesa, D.B.: Variable selection using random forests in SAS®. (2020). https://api.semanticscholar.org/CorpusID:215753090 Palmeri [2016] Palmeri, M.: Chapter 18: Testing the Assumptions of Multilevel Models. Accessed: Feb 5, 2024. https://ademos.people.uic.edu/Chapter18.html#61_assumption_1_-_linearity Pargent et al. [2022] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. 
In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. 
Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Pargent, F., Pfisterer, F., Thomas, J., Bischl, B.: Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics 37(5), 2671–2692 (2022) Pant and Srivastava [2015] Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. 
Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets. (2015). https://api.semanticscholar.org/CorpusID:212490984 Ribeiro and Moniz [2020] Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. 
[2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020) Schielzeth et al. [2020] Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. [2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in ecology and evolution 11(9), 1141–1152 (2020) Sim et al. 
[2012] Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. 
Nucleic acids research 40(W1), 452–457 (2012) Shahadat and Pal [2015] Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015) Sunyaev et al. [2001] Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. 
Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454 Yeo and Johnson [2000] Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000) Ziegler and König [2014] Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001) Silva et al. [2022] Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer Weisberg [2001] Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001) Yousefi and Hamilton-Wright [2016] Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). 
https://api.semanticscholar.org/CorpusID:7673454
- Pant, H.R., Srivastava, D.R.: A survey on feature selection methods for imbalanced datasets (2015). https://api.semanticscholar.org/CorpusID:212490984
- Ribeiro, R.P., Moniz, N.: Imbalanced regression and extreme value prediction. Machine Learning 109, 1803–1835 (2020)
- Schielzeth, H., Dingemanse, N.J., Nakagawa, S., Westneat, D.F., Allegue, H., Teplitsky, C., Réale, D., Dochtermann, N.A., Garamszegi, L.Z., Araya-Ajoy, Y.G.: Robustness of linear mixed-effects models to violations of distributional assumptions. Methods in Ecology and Evolution 11(9), 1141–1152 (2020)
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014) Zuliani [2009] Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009) Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)
- Sim, N.-L., Kumar, P., Hu, J., Henikoff, S., Schneider, G., Ng, P.C.: Sift web server: predicting effects of amino acid substitutions on proteins. Nucleic acids research 40(W1), 452–457 (2012)
- Shahadat, N., Pal, B.: An empirical analysis of attribute skewness over class imbalance on probabilistic neural network and naïve bayes classifier. 2015 International Conference on Computer and Information Engineering (ICCIE), 150–153 (2015)
- Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., Bork, P.: Prediction of deleterious human alleles. Human molecular genetics 10(6), 591–597 (2001)
- Silva, A., Ribeiro, R.P., Moniz, N.: Model optimization in imbalanced regression. In: International Conference on Discovery Science, pp. 3–21 (2022). Springer
- Weisberg, S.: Yeo-johnson power transformations. Department of Applied Statistics, University of Minnesota. Retrieved June 1, 2003 (2001)
- Yousefi, J., Hamilton-Wright, A.: Classification confusion within nefclass caused by feature value skewness in multi-dimensional datasets. In: International Joint Conference on Computational Intelligence (2016). https://api.semanticscholar.org/CorpusID:7673454
- Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
- Ziegler, A., König, I.R.: Mining data with random forests: current options for real‐world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (2014)
- Zuliani, M.: Ransac for dummies. Vision Research Lab, University of California, Santa Barbara (2009)