This paper examines and compares the influence of different sample sizes and sample ratios when machine learning (ML) models, namely support vector machine (SVM) and artificial neural network (ANN), are used to produce landslide susceptibility maps (LSMs) for Penang Island, Malaysia. Traditional statistical (TS) models are also used to produce LSMs in this comparative study. The receiver operating characteristic (ROC) curve and the recall metric are applied to evaluate the models' performance. Based on these evaluation criteria, the ML models outperform the TS models, and ML models trained on datasets with larger sample sizes perform better. ML models, especially SVM models, perform better when trained on balanced datasets and on datasets containing more landslide samples. The Kruskal-Wallis test and the Mann-Whitney U test are applied to assess the statistical significance of these differences. The results indicate that sample size and sample ratio are essential factors to consider when using ML models to produce LSMs. The LSMs produced in this research can provide valid and useful information to the local authorities for landslide mitigation and prediction.
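To make the terms sample size, sample ratio, recall and ROC concrete, the following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the procedure used in this study: the function names, the 1:3 ratio and the toy labels are all hypothetical, and the area under the ROC curve is computed through its rank interpretation (the probability that a randomly chosen landslide sample scores higher than a randomly chosen non-landslide sample).

```python
import random


def make_training_set(landslide, non_landslide, size, ratio, seed=0):
    """Draw `size` samples with landslide:non-landslide proportions given by `ratio`.

    `ratio` is a pair such as (1, 1) for a balanced set or (1, 3) for an
    imbalanced one. These values are illustrative assumptions only.
    """
    pos_share, neg_share = ratio
    n_pos = round(size * pos_share / (pos_share + neg_share))
    n_neg = size - n_pos
    if n_pos > len(landslide) or n_neg > len(non_landslide):
        raise ValueError("not enough samples for the requested size and ratio")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(landslide, n_pos), rng.sample(non_landslide, n_neg)


def recall(y_true, y_pred):
    """Share of true landslide samples (label 1) that were predicted as landslide."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy identifiers standing in for landslide and non-landslide grid cells.
pos_ids = list(range(100))
neg_ids = list(range(100, 400))
train_pos, train_neg = make_training_set(pos_ids, neg_ids, size=80, ratio=(1, 3))
```

In a comparison of this kind, such a sampler would be called once per combination of sample size and ratio, with each resulting model scored by `recall` and `roc_auc` on a held-out set.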
Dr. Fam Pei Shan is a Senior Lecturer at the School of Mathematical Sciences, Universiti Sains Malaysia, where she is currently the Program Chairperson of the Bachelor of Applied Science (Applied Statistics). She obtained her Bachelor's, Master's and Ph.D. degrees in Statistics from the University of Malaya, Malaysia. Her research interests include categorical data analysis, regression analysis and reliability analysis. She is an Editorial Board Member of the Journal of Statistical Modelling & Analytics (JOSMA). She is active in landslide prediction research and has authored several journal articles on slope failure analysis.