Assessment of Flood Occurrence Potential using Data Mining Models of Support Vector Machine, Chaid and Random Forest (Case study: Frizi watershed)

Zarei, Mahdi; Zandi, Rahman; Naemitabar, Mahnaz

doi:10.52547/jwmr.13.25.133

Volume 13, Issue 25 (5-2022) jwmr 2022, 13(25): 133-144

Zarei M, Zandi R, Naemitabar M. (2022). Assessment of Flood Occurrence Potential using Data Mining Models of Support Vector Machine, Chaid and Random Forest (Case study: Frizi watershed). jwmr. 13(25), 133-144. doi:10.52547/jwmr.13.25.133
URL: http://jwmr.sanru.ac.ir/article-1-1140-en.html

Mahdi Zarei, Rahman Zandi, Mahnaz Naemitabar

Hakim Sabzevari University

Abstract:

Introduction and Objective: Floods, like other hydrological phenomena, are uncertain events that can occur at any time and place and are influenced by climatic factors, the physical characteristics of the basin, vegetation cover and land use, and human intervention. Determining the contribution of each of these parameters to flood occurrence is therefore important. With the development of GIS, remote sensing (RS), and machine learning (ML) methods, flood probability can now be modeled with high accuracy. Building such models, however, requires sound knowledge of the flood occurrence process, study of the parameters that drive flood formation, an understanding of how each parameter affects flood generation, and the selection, development, and evaluation of appropriate models. Given the importance of identifying flood-prone areas, especially in basins located in arid and semi-arid regions such as the study area, the present study assesses flood risk in the Frizi watershed using three data mining models: support vector machine (SVM), CHAID, and random forest.
Material and Methods: In the present study, the support vector machine, CHAID, and random forest data mining models were used to assess flood risk. The purpose of applying these models is to obtain a reasonable and accurate spatial prediction of flood occurrence, compare the efficiency of the models, and select the most appropriate method for preparing a flood susceptibility map. The input data were a topographic map at a scale of 1:50,000 (used to extract contour lines), a geological map at a scale of 1:100,000, a soil map prepared by the General Department of Natural Resources and Watershed Management of Khorasan Razavi province, a digital elevation model (DEM) with a spatial resolution of 12.5 m, Google Earth satellite imagery, and meteorological (rain gauge) data for a 20-year statistical period (78-98, i.e. 1378-1398 in the Iranian calendar) from the Andarkh, Olang Asadi, Kardeh Dam, Marshak, Bulgur, Bala Gosh, Al, Chenaran, Moghan, Chekneh Olya, Abqad Frizi, Talgur, Qadirabad, and Kabkan stations. Elevation, slope, aspect, drainage networks, main waterways, and surface curvature were extracted from the DEM and the contour lines. The land use map of the region was prepared from 2020 Google Earth satellite images using supervised classification. The vegetation map of the region was prepared from the NDVI index computed on 2018 Landsat 8 satellite images.
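As an illustration of the vegetation layer mentioned above, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red); for Landsat 8 OLI the near-infrared band is band 5 and the red band is band 4. The sketch below is not the authors' workflow: the function names and the 2x2 reflectance grids are hypothetical and serve only to show the arithmetic.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Returns 0.0 when both bands are zero, to avoid division by zero
    (an assumed convention for no-data pixels).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values (not data from the study).
nir_band = [[0.45, 0.30], [0.10, 0.0]]   # Landsat 8 band 5
red_band = [[0.05, 0.10], [0.10, 0.0]]   # Landsat 8 band 4

result = ndvi_grid(nir_band, red_band)
for row in result:
    print([round(v, 3) for v in row])
```

Values close to 1 indicate dense green vegetation, while values near zero or below indicate bare soil, rock, or water.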
Results: Elevation plays a key role in controlling the direction of flood movement and the depth of the water surface. At altitudes of 2000 m and above, flood potential in the study area increases with altitude. Among the land uses in the basin, irrigated farmland and orchards produce less runoff because of higher infiltration and are therefore less prone to flooding. On slopes of 60 degrees, the steep gradient shortens the lag time of the basin and reduces water infiltration into the soil, so flood volume and surface runoff increase. The 0.0074-0.0120 class has the greatest impact on flood occurrence in the basin. North-, northwest-, and west-facing slopes are prone to flooding because of heavy rainfall, long-term snow retention, and moisture. Rainfall of more than 250 mm has the greatest impact on flood occurrence. Because of the relatively low permeability of the soil, the study area produces more runoff and floods. Among the topographic wetness index classes, the 268.38-359.99 class had a great impact on flood occurrence. Concave areas also have a great impact on floods, because slope and curvature of the land surface are the most important factors in flood occurrence. The predictive performance of the flood susceptibility models was examined using the area under the ROC curve.
The results show that for the linear support vector machine model the best scenario was M3, with the highest correlation coefficient (R = 0.972) and the lowest error (MAE = 0.538); for the random forest model the best scenario was M10 (R = 0.961, MAE = 0.685); and for the CHAID decision tree model the best scenario was M8 (R = 0.954, MAE = 0.723).
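The two scores reported above can be sketched as follows. This is a minimal illustration assuming the standard definitions of the Pearson correlation coefficient and the mean absolute error; the function names and the toy observed/predicted lists are hypothetical, while the three (R, MAE) pairs in `reported` are the values read from the abstract.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average of the absolute differences |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy data (hypothetical) to exercise the two functions.
obs = [1, 2, 3, 4]
pred = [2, 4, 6, 8]
print(pearson_r(obs, pred), mean_absolute_error(obs, pred))

# Best scenario per model as reported in the abstract: (R, MAE).
reported = {
    "SVM (M3)": (0.972, 0.538),
    "Random forest (M10)": (0.961, 0.685),
    "CHAID (M8)": (0.954, 0.723),
}

# Rank by highest R, breaking ties by lowest MAE.
ranked = sorted(reported.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
best_model = ranked[0][0]
print(best_model)
```

Note that a high R alone does not imply a small error: in the toy data the predictions are perfectly correlated with the observations yet consistently twice as large, which is why the two measures are read together.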
Conclusion: According to the results of the present study, the classes with low and medium flood potential are located mostly in the eastern and southern parts of the basin; in the eastern part, the low slope and good permeability make the flood risk moderate. The highest flood potential was observed in the western and northwestern half of the basin, because of the poor rangeland land use there. The results also showed that the northern and western parts of the basin, whose surface formations consist of marl, clay, and silt with a very low permeability coefficient and sparse vegetation, have a high potential for flood occurrence. The models were evaluated using the correlation coefficient (R) and the mean absolute error (MAE). The support vector machine, CHAID, and random forest models, with scenarios M3, M8, and M10 respectively, had the highest correlation and the lowest mean error, and estimate flood risk in the study area with high accuracy. In addition, the area under the ROC curve was used to evaluate the proposed models. By this measure, the SVM and random forest models gave the more accurate results on both the training and the validation data, which indicates that both models are validated in terms of modeling accuracy.
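For readers unfamiliar with the ROC-based check, the area under the ROC curve equals the probability that a randomly chosen flooded location receives a higher predicted score than a randomly chosen non-flooded one, with ties counted as one half. The sketch below uses that pairwise formulation; it is an assumption about how such a score can be computed, not the authors' procedure, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for flooded locations, 0 for non-flooded locations.
    scores: predicted flood susceptibility, higher means more flood-prone.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # flooded point ranked above non-flooded point
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation points (not data from the study).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.4, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flooded and non-flooded points. The pairwise loop is quadratic in the number of samples, which is acceptable for a small inventory of flood points.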

Keywords: Frizi watershed, Flood, Data mining models, ROC curve, Zoning

Type of Study: Research | Subject: Natural disasters (flood, drought, and mass movements)
Received: 2021/02/24 | Revised: 2022/06/29 | Accepted: 2021/05/03 | Published: 2022/06/29