2.1 Study Area
For this particular study, a rectangular area of 390.5 sq. km. (approximately 28 km x 14 km) covering Kure City in south Hiroshima was demarcated (Fig. 1). The cartographical edges that limit the study area are as follow: Top 34.325104 dd; Right: 132.800008 dd; Bottom: 34.197426 dd; Left: 132.500202 dd.
A geographically small port town adjacent to the Seto Inland Sea, Kure started as a shipbuilding facility in the end of 19th century. Soon made into a major dockyard and military base, the port and the city around it grew quickly due to the Imperial Japanese Navy’s and its facilities’ rapid development until the end of World War 2. However, the area’s flat terrains are cramped and limited by mountains (a common scenario in Japan), which forced the town’s rapid expansion into and near adjacent hills. As a result, a lot of the town’s habitations are located in rugged terrain. In fact, about 14% of the buildings framed in the study area are located in slopes with steepness above 20°, a situation susceptible for landslides. This risky scenario was explicated in the heavy damages suffered by the city of Kure in the July 2018 heavy rain disasters.
Topographically, the area is mostly comprised of rugged terrain: from the xx km2 of land in the study area, 191 km2 (about 49%) is composed by slopes above 20° steepness. The altitude varies between sea level around the dockyard areas and 839 m at the peak of Mt. Noro. Other predominant peaks in the area include Mt. Enofuji (664 m), Mt Yasumi (497 m) and Mt. Ege (568 m).
Geological mapping performed by Yamada et al. (1986) in the area shows that the region’s lithographical setting is composed majorly by igneous rocks of the Late Cretaceous, predominantly biotite granite and hornblende-biotite granite of the Hiroshima Granitic Rocks group, commonly referred to as “Hiroshima Granite”. In the study area, these rocks are distributed in most of the western portion of the map (Mt. Eboshiwa, Mt. Ege, Mt. Myojin, Mt. Tengujo, Mt. Hachimaki, Mt. Mitsuishi, Mt. Yasumi, Mt. Mitsumine and Mt. Hisago), as well as in a peninsula in the far east of the area (Mt. Mizunoura).
The other Late Cretaceous igneous rocks that occur secondarily but still dominantly in the area, particularly Kure City, are volcanic rocks from the Takada Rhyolites and Hikimi Group, namely dacite to rhyolite welded tuff, tuffaceous sandstone, mudstone conglomerate, found in the central-west (Mt. Ozumi Mt. Enofuji, Mt. Age and Mt. Tsuchi) and the far northeastern part of the area (Mt. Kanashioku, Mt. Mosuke, Mt. Nada), as well as rhyolite welded tuff with non-welded pyroclastic rock, dacite welded tuff, found in the central east part of the area (Mt. Noro).
The region is occasionally carved by granite porphyry and granophyre dikes of the late cretaceous (though posterior to the Hiroshima Granitic Rocks), cutting the granitic rocks in NE-SW orientation, and the granodioritic and dacitic rocks in NW-SE orientation.
The Quaternary deposits in the area, found along river vales, plains and deltas, comprise of sand, mud and gravel from the Saijo Formation, as well as gravel, sand and mud alluviums from the Holocene. Much of the port and dockyard terrain (sometimes directly adjacent to the granitic mountains) comprise of artificial reclamation grounds.
Regarding the structural geology arrangement of the study area, the regional structures are majorly dominated by NE-SW lineaments (especially near Mt. Noro) which might be related to the Triassic Maizuru Belt (Takemura et al., 2009). This orientation governs many morphological aspects of the area, as exemplified by Mt. Yasumi’s peninsula or Kurose River and its valley. Although it is not so common in a regional aspect, the study area around Kure City also presents an NNW-SSE fault line along the northwestern granites of the area, near Mt. Ege and Mt. Myojin.
The most predominant bedrock lithology of the area, coarse biotite granite/granodiorite and hornblende-biotite granite/granodirite of the Hiroshima Granitic Rocks group, or simply “Hiroshima Granites,” are very easily weathered, changing into a soil commonly referred to as Masado, or “real sand,” in Japanese (Wang et al., 2015). Masado granitic soil is known to have good permeability and be very brittle when wet, which causes it to be prone to lose its structure and stability when infiltrated, and thus be a very susceptible soil for landslide occurrence in heavy rainfall events.
Additionally, according to Chigira (2001), granites in mountainous areas are microscopically sheeted around depths of 50 m for undergoing the stress of the ridge morphology. According to the author, these micro-sheet configurations stimulates formation of cracks in the granite due to stress release, temperature change and water content near the surface, thus facilitating landslides activity in events of heavy rainfall. The Seto Inland Sea has little rainfall compared to the surrounding oceanic coastal areas in Japan, like the Sea of Japan and the Pacific Ocean. Although a fairly good part of the oceanic precipitation clouds are blocked either by the Chugoku mountains northward or the Shikoku mountains southward and the region is considered relatively dry (Kamada & Nakagoshi, 1996), heavy rainfall is particularly concentrated in mountainous areas. In Kure, the average annual average precipitation ranges from 1,000 to 1,600 mm, characterizing a relatively mild rainy zone. Mountain areas around the Seto Inland Sea, however, reaches annual average precipitation of 2000 mm to 3000 mm. The period of the year with the heaviest rainfall occurs between June and July every year, when the average monthly precipitation reaches 227 mm (Japanese Meteorological Agency, 2020).
2.2 Landslide Conditioning Factors (LCFs)
In both the FR and RF attempts of this study, seven slope failure causing factors were used for landslide susceptibility assessment:
2.2.1 Geology
Most of the physical properties of a rock, as well as its susceptibility to be weathered into soil, are determined by the rock’s lithological setting (Guzzetti et al., 1996; Atkison & Masari, 1998; Chigira, 2001; Dai et al., 2002; Çevik & Topal, 2003; Peart & Zhang, 2005; Highland & Bobrowsky, 2008; Yilmaz, 2009; and others). Namely in this study’s particular research area, granites are a lithology particularly prone for weathering when with medium to coarse granulometry (Wang et al., 2015), as well as prone for micro-sheeting in higher depths (Chigira, 2001), what makes this particular lithology a susceptibility factor for landslide occurrence. Guzzetti et al. (1996) also points out a correlation of slope failure events and the relative position of sedimentary and tectonic discontinuities, due to the settings between hard and soft rocks as well as the attitude of permeable and impermeable layers. In the particular case of this study area, sediments and deposits are mostly located in river basins, valley plains and deltas, landscapes which are inherently attributed to morphological areas of low to neutral slope degrees and slope effects. Therefore, these lithologies present low correlation values to slope failure occurrence probability.
On a controlled specific context, however, detailed and spatially localized weathering grading of rocks is a very spatially heterogeneous data, and demands for field work and measurements, impractical for bigger area analysis. This makes this kind of attribute, although very important in detailed physical analysis of particular localized events, unsuitable for mapping of bigger areas such as the one in question. Therefore, weathering level must be considered accordingly to the climatological context of the specific location of the research and extrapolated correspondingly to the geological features of the whole area.
2.2.2. Altitude
Although not directly related to the physical properties that influence landslide occurrence, the elevation attribute is extensively used as a relevant factor in landslide risk assessment (Yilmaz, 2009; Dai & Lee, 2002 and others).
It is judged that this is caused by the fact that contextual geomorphology and geology is frequently controlled by topographical elevation aspects. A common (though not universal) scenario is that where very low relative elevations represent gentle terrains covered with residual and thick soils, and very high elevations are composed of levelled mountain summits or cliffs of materials weathered down to the bedrock with high shear strength (Dai & Lee, 2002). In these circumstances, landslide probability indexes show a concentration of risk in areas of relatively intermediate topographical elevations.
Since the effect of elevation in geomorphology and geology (and consequently slope failure occurrence) is usually relative to a given contextual area in terms of regional geology, it is necessary to consider and adjust the factors according to the research area situation, rather than adopt global characteristics and values.
2.2.3. Slope angle
Slope angle (steepness, degree) has great influence in landslide susceptibility. In the hypothetical case of a uniform slope with isotropic material, it is determined that the failure probability increases directly with the slope gradient (Atkison & Masari, 1998). Consequently, slopes with relatively low angles have low to insignificant landslide probability levels. However, since the actual material stability varies along slopes according to geology and weathering level, it is frequent that slopes with higher gradients are, on the contrary, more resistant to slope failure (Dai & Lee, 2002).
In the context of the study area, these high degree slopes (usually above 50°) are usually composed by outcropping igneous bedrock sections, not heavily affected by weathering and thus resistant to slope failure.
2.2.4. Slope aspect
Although the direct relation of slope aspect (i.e., diving direction) with landslide probability is not fully understood by researchers (Carrara et al. 1991), this factor is commonly used by some investigators in statistically calculating landslide failure, since tendency patterns are usually found between failure occurrence and the direction of slopes. The two most commonly accepted hypothesis for the reason of slope aspect effect on landslide occurrence are that (a) the factor is related to the general physiographic trend of the area and that, (b) that depending on slope direction in relation to the sun’s trajectory in the sky, humidity of slopes may be more or less prone to evaporation, and (c) slopes in a specific direction are more or less affected by rainfall depending on the main precipitation direction, meteorologically speaking (Gokceoglu, 2002).
2.2.5. Drainage density
Drainage systems greatly influence the morphological aspects and consequently slope settings of a landscape. In fact, drainage-dominated terrains are frequently controlled and associate to increased landslide activity (Peart & Zhang, 2005).
One other aspect that greatly narrows the relationship between drainage systems and landslide influence is the fact that slope stability models are heavily affected by the subterranean water contents and characteristics, such as pore water pressure capacity. Since groundwater systems are generally associated to the drainage systems in the surface level, this evidences the effect of such aspect on slope stability characteristics.
In landslide risk assessment studies, the drainage system factor is usually considered in the form of relative distance to drainage channels (Yilmaz, 2009; Dai & Lee, 2002), or in drainage density values (Çevik & Topal, 2003), which is the one used in this attempt.
2.2.6. Land use
Although generally a natural process common in the inherent shaping of landscapes and slope formation, landslides are sometimes galvanized by human influences in the affected area (Highland & Bobrowsky, 2008). As previously demonstrated, drainage systems are important factors for slope stability influence. Therefore, one of the most common human causes for landslide activity are disturbing or changing drainage patterns, either for habitation ends or for agricultural use. Other than that, destabilizing slopes either by overloading the terrain with constructions or poorly planned re-landscaping for habitation ends greatly reduces slope stability. Finally, deforestation also increases landslide probability, given the importance of vegetation cover for slope stability as explained in the previous item.
It is important to note that, as explained in the “study area” topic, the inherently narrow and cramped setting of plain areas in the context of Japan frequently forces the population into the hills onto rugged terrain, either for habitation, agriculture or industrial ends. This means that the factor of land use in the landslide probability is particularly significant in the present location when compared to more spacious geographical contexts.
2.2.7. Distance to lineaments
Simply put, geological structures are strain marks or “wounds” of deformation caused by tectonic forces suffered by the rocks along its evolutional history (Fossen, 2016). These wounds, thoroughly explored by the field of structural geology, are of great interest not only for the formation history of the rocks, but also for practical contemporary ends such as economic exploration and geotechnics. In map view, such structures are usually represented in the form of lineaments.
Structural defects such as faults, folds, foliations and joints frequently represent internal weaknesses in the rock’s material cohesion, which of course translate into vulnerabilities of slope stability. Therefore, geological structures, in particular lineaments, are a relevant factor for landslide occurrence probability, commonly used in failure risk assessment (Guzzetti et al., 1996; Atkison & Masari, 1998; Lee & Talib, 2005; Ambrosi & Crosta 2006; Highland & Bobrowsky, 2008; Yilmaz, 2009; and others). Generally, the factor is weighed in GIS analysis as distance to lineament factor.
2.2.8. Rainfall
Break of slope stability caused by pore water density and saturation increase led by unusually heavy rainfall is generally considered the major trigger for most of landslide disasters (Guidicini & Isawa, 1977; Caine, 1980; Chigira, 2001; Guzzetti et al., 2007; Dahal & Hasegawa, 2008; Highland & Bobrowsky, 2008; Dahal, 2012; Wang et al., 2015; and others). As exemplified by the July 2018 Southwestern Japan disasters, landslides occurrence is commonly accompanied by flooding. However, the association of the two events are not limited simply by precipitation and saturation of ground by water. In fact, flooding may cause landslide activity by the cutting of slopes’ basal section around the banks of stream and rivers. Additionally, since debris flows and mudflows usually occur in small drainage systems, these are commonly mistaken for floods, when in fact the two processes might be happening simultaneously. Conversely, landslide debris may fall into rivers and stream, causing sudden water level change by backwater flooding, or, in more severe cases, by the collapsing of dams (Highland & Bobrowsky, 2008).
2.3. XRAIN Radar-acquired Rainfall Data
One of the landslide-causing factors commonly used in the production of landslide susceptibility maps is precipitation volume. However, most studies use rainfall data acquired through rain gauge measurement stations, which are only representative of the measurement location, and not detailed enough for spatial modelling problems. However, new strategies in precipitation measurement have been developed in a few countries, such as Japan. XRAIN (eXtended Radar Information Network) data is operated by the Ministry of Land, Infrastructure, Transport and Tourism (MEXT) and made available through University of Tokyo’s Data Integration & Analysis System (DIAS) platform. Starting operation in 2014, XRAIN comprises of a real time rainfall measuring system based on Multi-factor (MP) radars, which allow for more spatially accurate measurements of rainfall volume. In contrast to more conventional point-based or extrapolated rain gauge measurement station data (commonly used in landslide susceptibility mapping methods), radar-acquired data allows for bidimensional sampling of rainfall over an area, which make up for much more detailed localization aspects of rainfall data.
In this research, the research was downloaded as .zip packed .csv files spaced in 5-minute, 30-minute or 1-hour intervals. The .csv files comprise of tables with cells spatially organized so that each cell represents a 287 x 230 m pixel in a north-oriented grid representing the designated area, and each cell’s value expresses the rainfall intensity in mm/h at the time of measurement. For the study area, each of the files comprised of a 97 x 67 grid with 4999 pixels. XRAIN data collection for the area of Chugoku (where Kure is located) started at 2016, so the range of collected data spans from 2016 to 2019, summing up to more than 40,000 .csv files.
2.3 Landslide Susceptibility Map Production Methods
In this study, two different methods for production of the LSM were experimented, one being a statistical approach (frequency ratio) and the other a machine learning technique (random forest), in order to find which is most efficient for the intended objectives. Both LSMs contain 432,258 20-meters pixels.
In order to make a just comparison between the two methods, an identical collection of landslide conditioning factors (LCFs) were used in shapefile form for both attempts: (a) geology; (b) land use (c) altitude; (d) slope angle; (e) slope aspect; (f) drainage density; (g) distance from lineaments; and (h) mean annual precipitation. These LCFs are presented in map view in Fig. 2. The Digital Elevation Model used for confection of the LCFs as well as other LCF shapefiles such as drainage, land use and drainage maps were provided by the Geospatial Information Authority of Japan (2018). The lineament data was extracted from the Kure Geological map by Higashimoto et al. (1985). The mean annual precipitation factor comprises of recorded rainfall with XRAIN technology. A summary for all the data used in the analysis is shown in Table 1.
Table 1
– Summary of data used in this research’s analysis.
Data | Format | Source |
Landslides from July 2018 disasters | Point-type shapefile | Geospatial Information Authority of Japan (GSI) (2018) |
Land use | Polygon-type shapefile | NLID, MLIT of Japan (2022) |
Geology | Polygon-type shapefile | Higashimoto et al. (1985): Kure 1:50,000 geological map |
Altitude | 20 m Raster | DEM by GSI (2022) |
Slope angle | 20 m Raster | (Extracted from) DEM by GSI (2022) |
Slope aspect | 20 m Raster | (Extracted from) DEM by GSI (2022) |
Drainage density | Polygon-type shapefile | (Extracted from) drainage line data by National Land Information Division (NLID), MLIT (Ministry of Land, Infrastructure, Transport and Tourism) of Japan (2022) |
Distance from lineaments | Polygon-type shapefile | NLID, MLIT of Japan (2022) |
Mean annual precipitation (2016–2019) | Polygon-type shapefile | XRAIN radar-acquired data, Ministry of Land, Infrastructure, Transport and Tourism (MEXT), University of Tokyo’s Data Integration & Analysis System (DIAS) (2018). |
The collection of landslides used as output data in both methods comprises of 1177 landslide points referent to the July 2018 disasters, which were mapped and provided by the Geospatial Information Authority of Japan (2018) with aerial photography. The landslide points were randomly separated into training and test sets, with a ratio of 30% for the training set (353 landslide points) and 70% for the test set (824 landslide points). The 1177 landslide points separated into training and test sets distributed in the study area are shown in Fig. 1.
2.4.1. The frequency ratio (FR) susceptibility assessment method
The frequency ratio method uses the assumption that landslide occurrence is determined by factors that are related to the event, and thus that new landslides will generally occur under the same conditions of past landslides (Lee & Talib, 2005; Yilmaz, 2009; Rasyid et al., 2016 and others). In the frequency ratio method, FR values represent the ratio between the landslide occurrence and the area of specified factor for a given factor class. In order to acquire values representative for frequency ratio analysis, the landslide occurrence and factor area values are normalized by calculating the percentage of the specific class to the total factor or landslide of the analysis (Yilmaz, 2009). For example, in a study area with 100 km2 and 20 landslides, a factor class of 10 km2 and 5 landslides represents 10% of the analyzed area and 25% of the landslides, resulting in a value of 2.5 for that class. In frequency ratio calculation, 1 is considered the medium value, and thus values larger than 1 represent a higher correlation of landslide occurrence with the given attribute and consequently a higher risk (Lee and Pradhan, 2006). In some FR method calculations, such as the one conducted by Yan et al. (2018), FR values are normalized according to the other classes FR values in that factor for the sake of smooth analysis process. After the frequency ratio is calculated for each range or type of factors, the values are summed in each pixel of the susceptibility map to calculate the landslide susceptibility index (LSI) for that specific pixel (Lee and Talib, 2005; Yilmaz, 2009):
Once the LSI values are established for each pixel of the map, a final landslide susceptibility map is produced, where higher LSI values represent higher risk of landslide occurrence in that location.
In this study, the platforms used for FR calculation were ArcGIS Pro (for data manipulation in map view) and Microsoft Excel (for calculations and data viewing in graphs). Each one of the landslide-related factors were reclassified in 10 classes (for the quantitative factors) or used in the original classification (for qualitative factors). The number of landslides for each class was found using the spatial join analysis tool in ArcGIS, and the class area was calculated using the calculate geometry tool. With the variables necessary for FR calculation attained, the FR value for each factor class was calculated in field calculator. Finally, the LSI value for each 20m pixel of the study area was obtained by summing the respective FR values for the specific blending of factor classes of that pixel, resulting in the FR-based landslide susceptibility map.
2.4.2. The machine learning random forest (RF) method
Machine learning (ML) is a form of artificial intelligence comprising of methods where a system “learns” based on a set of data, looking for patterns in it and how they affect a certain result relative to a problem. It was found that this technology is successful in completing varied specialized tasks, when set up with a sufficient dataset and adequate parameters. The past decade has seen a substantial rise of ML techniques in various areas of application, such as autonomous driving, health care, finance, manufacturing, energy (Carleo et al., 2019). The utilization of ML methods in the domain of natural hazards (including landslide susceptibility assessment) is supported by Goetz et al. (2015), Youssef & Pourghasemi (2021). Some of the advantages compassed by the use of MLs include adjusting its internal structure to the experimented data, the ability to extract knowledge automatically from a dataset and to build classification and regression to provide accurate models, as well as efficiency and practicability even in large areas (Felicísimo et al., 2013, Kavzoglu et al., 2019, Youssef & Pourghasemi, 2021).
ML includes various specific techniques with varied degrees of complexity. Although all these techniques employ the common concept in machine learning of identifying patterns in data and constructing a prediction model, the specific mechanisms and algorithms used differ. Bibliographical research suggests that the ML technique judged most effective for the specific case of landslide risk assessment mapping is the random forest (RF) technique (Youssef & Pourghasemi, 2021, Yilmaz et al. 2009).
The RF technique is actually an expansion of another ML method, the decision tree. The decision tree technique is a supervised machine learning method, that is, a set of inputs (samples) and outputs (labels) are given so that the algorithm may figure out a function (model) to find outputs for “unlabeled” inputs. For this specific method, the machine observes a provided dataset and creates a “tree” (e.g., prediction model) that best represents it. In the “tree,” every time the data splits according to a variance in a parameter, a new “branch” cleaves out of the tree, in what is called a decision node. The splitting goes along for every other parameter variety, creating new “branches” in decision nodes, until it finally ends in a “leaf,” representing a predicted outcome for that particular set of parameters in the data. Due to their simplicity and intelligibility, decision trees are a popular ML technique (Wu et al., 2008).
However, decision trees learners have the limitation of being prone to creating over-complex trees, which do not reflect an accurate representation of the data in a generalized way, which is known as overfitting. This was solved with the random forest ML technique, which, as the name suggests, is an ensemble model comprising of a “forest” with many decision trees. Each tree is a completely random and independent experiment, which prevents overfitting by outputting a result comprising of an ensemble averaged prediction for all the decision trees in the random forest, avoiding the result given by a singular (possibly unrepresentative) decision tree. Data used in the RF learning method has few necessities of transformation, as the algorithm deals with outliers and missing values (Youssef & Pourghasemi, 2021).
In this study, the RF algorithm utilized was the one provided in the scikit-learn ML library (sklearn.ensemble.RandomForestClassifier), based on the Jupyter Notebook platform with Python 3.9 programming language. In order to find the best parameters and provide maximum efficiency, the code was programmed so that, for a given parameter value, the algorithm would be executed 10 times using the provided training landslide points and factors dataset (respective to Kure City and the July 2018 landslides), automatically calculating the resulting landslide susceptibility map accuracy compared to testing set using ROC AUC analysis for each run. Then, the average AUC score for that parameter was saved, and the parameter value was incremented, repeating the process until a maximum parameter value was reached, and finally the parameter value which provided the best AUC score was adopted for the final map. Once the optimal parameter values were designated, a final prediction model was executed and the resulting predictions were then laid in map form using ArcGIS Pro, providing the RF-based landslide susceptibility map.
2.4 Performance assessment
In order to assess the performance of the produced LSMs, the receiver operating characteristic (ROC) analysis was employed in this study. Initially developed for radar accuracy tests, the ROC method is recommended for landslide zoning validation tasks due to its threshold-independent nature (Beguería, 2006, Corominas et al., 2014, Vakshoori & Zare, 2018), that is, it doesn’t require a fixed value to determine that either negates or requires a landslide activation, since LSI is a probability assessment, not a deterministic one. Thus, ROC analysis uses multiple thresholds with different interspaces, and the points in the ROC curve represent these possible cutoff thresholds given by the multiple cases in a model (i.e., LSM). Each threshold forms a confusion matrix with four types of outcomes for each pixel: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Each threshold point in the curve then corresponds to a pair of sensitivity and specificity values, where sensitivity is represented by the y axis and represents the true positive rate (TPR) and the specificity is shown in the x axis, representing the false positive rate (FPR). TPR and FPR are calculated as follows:
\(TPR=\frac{TP}{TP+FN}\) | (2) |
\(FPR=\frac{FP}{TN+FP}\) | (3) |
After the ROC curve is plotted, its area under curve (AUC) may be calculated derived from the constructed ROC curve as follows: |
$$AUC={\sum }_{i=1}^{n+1}\frac{1}{2}\sqrt{{({x}_{i}-{x}_{i+1})}^{2}}\times ({y}_{i}+{y}_{i+1})$$
4
The AUC is used as a metric to assess the quality of the LSM, where a larger area (ranging from 0.5 to 1) represents better prediction performance, that is, how well the model separates the validations landslides throughout the susceptibility zones of the LSM. For that reason, AUC value is used as the primary meter for LSM accuracy in this study (Beguería, 2006; Corominas et al., 2014; Vakshoori & Zare, 2018).
As an additional method for LSM validation, landslide density is checked for each one of the 5 LSI zones attributed in the LSM. It expected that in an efficient LSM, the landslide density distribution will follow a proportionally direct growth with each zone change, from very low to very high.