Explainable AI Integrated Feature Selection for Landslide Susceptibility Mapping using TreeSHAP (2201.03225v2)
Abstract: Landslides are a recurring and serious threat to human life and property in the era of anthropogenic global warming, and early, data-driven prediction of landslide susceptibility is urgently needed. In this study, we explore the features that best describe landslide susceptibility using state-of-the-art machine learning algorithms: XGBoost, LR, KNN, SVM, and AdaBoost. To find the best hyperparameters for each classifier, we use grid search with 10-fold cross-validation. The optimized XGBoost outperformed all other classifiers, with a cross-validation weighted F1 score of 94.62%. Following this empirical evidence, we examined the XGBoost classifier with TreeSHAP, a game-theory-based algorithm for explaining machine learning models, to identify features such as SLOPE, ELEVATION, and TWI that contribute most to the classifier's performance, and features such as LANDUSE, NDVI, and SPI that have less effect on it. Based on the TreeSHAP explanation, we selected the 9 most significant landslide causal factors out of 15. The optimized XGBoost with this 40% feature reduction outperformed all other classifiers on popular evaluation metrics, with a cross-validation weighted F1 score of 95.01% on the training data and an AUC score of 97%.
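The attribution-then-selection idea in the abstract can be illustrated with a small sketch. It is not the authors' pipeline and it is not TreeSHAP: it computes exact Shapley values by brute-force enumeration of feature coalitions, replacing absent features with a baseline value, which is a simplification of how TreeSHAP treats tree ensembles. The scoring function `toy_model`, its weights, the sample values, and the helper names `shapley_values` and `rank_features` are all hypothetical; only the feature names SLOPE, ELEVATION, and NDVI come from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition take their baseline value (an assumption
    of this sketch; TreeSHAP handles absent features differently).
    """
    n = len(x)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(model, samples, baseline, names):
    """Rank features by mean absolute Shapley value over the samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    means = [t / len(samples) for t in totals]
    order = sorted(range(len(names)), key=lambda i: means[i], reverse=True)
    return [(names[i], means[i]) for i in order]


def toy_model(z):
    # Hypothetical susceptibility score with one interaction term.
    slope, elevation, ndvi = z
    return 0.6 * slope + 0.3 * elevation + 0.1 * ndvi + 0.2 * slope * elevation


NAMES = ["SLOPE", "ELEVATION", "NDVI"]
SAMPLES = [[1.0, 1.0, 1.0], [0.5, 0.2, 0.9], [0.8, 0.6, 0.1]]
BASELINE = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    ranking = rank_features(toy_model, SAMPLES, BASELINE, NAMES)
    keep = [name for name, _ in ranking[:2]]  # keep the top-k features
    print(ranking)
    print("selected:", keep)
```

The enumeration visits every coalition, so its cost grows exponentially with the number of features; it is practical only for a toy case like this one. The selection step itself is the same in spirit as the abstract describes: rank features by their mean absolute attribution and keep the top k (9 of 15 in the study, 2 of 3 here).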