Extraction of terrain ridge lines and valley lines based on agglomeration analysis of terrain points: a cluster analysis method

Terrain feature extraction is one of the critical issues in geographic information science. As important terrain feature lines, ridge lines and valley lines play an important role in hydrological analysis, terrain reconstruction and automatic generalization of contour lines. However, the extraction of terrain feature lines is a complicated and time-consuming task. In this paper, a terrain feature line extraction method based on clustering is proposed. Terrain feature points are extracted automatically according to the agglomeration of terrain points, and similar points are identified automatically with the DBSCAN clustering algorithm. Points with high similarity are clustered along the direction of ridges or valleys, so that the whole terrain is divided into multiple sub-regions. Adjacent sub-regions are found by calculating the minimum distance between sub-regions, and they are connected in order through their center lines to obtain the terrain feature lines. Compared with other methods, the cluster analysis method proposed in this paper has a simple process and high efficiency.


Introduction
Terrain feature extraction (Hu 2017) is an important research topic in GIS (Geographical Information Science). Ridge lines and valley lines, as important terrain feature lines, play an important role in the simplification of topographic models, sample-based topographic generalization, topographic and geomorphic research, hydrological analysis, terrain reconstruction, automatic contour generalization, and so on. However, the extraction of terrain feature lines is a complicated and time-consuming task, and how to reduce the computational complexity while increasing accuracy remains a significant research problem.
There are three types of DEM-based (Digital Elevation Model) feature line extraction methods (Mukherjee et al. 2013): methods that detect curve structures with image processing techniques, algorithms based on simulating surface flow over the terrain, and algorithms based on surface geometry analysis. Methods based on image processing techniques (Peucker and Douglas 1975; Pang et al. 2013) apply curve-structure detection from images to DEM feature line detection, but they are sensitive to noise, and branch connections of ridges and valleys without obvious linear features are prone to fracture. A 'structuralist' approach extracts ridge and valley lines from digital images by drawing lines that move under logical constraints in the image, starting from previously selected points (Riazanoff et al. 1998). A mathematical framework based on grey-level morphological transformations has been developed to extract ridge and valley connectivity networks and to understand their spatial distribution (Sagar et al. 2003).
The algorithm based on physical simulation of surface water flow (O'Callaghan and Mark 1984) determines the flow direction and catchment amount at each point by simulating the surface water flow process, connects the feature points whose catchment amount is greater than a threshold to form valley lines, and takes the boundaries of the catchment areas as watershed boundaries. The feature lines extracted by this algorithm are close to reality, and it is currently a suitable method. However, the extracted ridge lines are closed curves, which is not consistent with reality, and it is difficult to determine the flow direction in flat areas. Feature line extraction based on surface flow analysis is more accurate than the other two methods. In geometric form analysis it is difficult to connect feature lines, and in the surface flow simulation method terrain points are often omitted when extracting feature points. Ai and Li (2010) developed a structured analytical approach that generalizes DEM data by identifying small valleys and filling in the corresponding depressions. Unimportant valley branches are detected according to their hydrological significance, and their covering areas are filled by raising the terrain, which makes the terrain surface smoother; this effectively preserves the main geographical features of the terrain. Rather than using valley cover area as the only decision-making factor, the approach also considers valley length, stream order and valley density, but the water system pattern and the distance between adjacent valleys are not considered. Extraction of topographic feature lines from point clouds based on SSV (signed surface variation) and HC-Laplacian smoothing has also been proposed (Zhou et al. 2016), in which potential feature points are segmented into different clusters by region growing based on the Euclidean distance and SSV.
Some automated algorithms and software for extracting ridges or ridge axes from DEMs are still not practical for applications. Most researchers tend to design specific lineament extraction algorithms rather than a generic solution. The main problem is that axis thinning and segment connection are still too complicated to admit a universal solution (Chang and Sinha 2007). Because of DEM resolution and production errors, many problems of automated feature recognition algorithms are mitigated by manual feature detection, which overrules local inconsistencies to preserve the global trend and avoid false truncations and fragmented lineaments. For extracting ridge and valley profiles of mountains from amorphous point cloud data, a projection-based spatial morphological extraction framework has been proposed for detecting mountain profiles (Maurya et al. 2018). Its authors consider the membership, neighborhood and cohesion between points to be fundamental problems that should be examined when extracting ridge and valley profiles from point cloud surface data.
Traditional methods based on image processing are easily affected by noise, and lines without obvious features become broken. Feature lines extracted by methods based on surface water flow are closed curves, which is not consistent with reality. Methods based on surface geometry analysis are easily affected by the window size and need several iterations for terrain with small undulations. To overcome these problems, this paper presents a feature line extraction method based on cluster analysis. The method builds on the spatial clustering characteristics of terrain points near ridges and valleys; it is simple, easy to operate and fast. Each terrain point of the original data is represented in three-dimensional space by its abscissa, ordinate and elevation, and the terrain points are then clustered. According to the clustering analysis and evaluation results, clusters close to each other are connected to form feature lines. Finally, the elevations of the clusters along the feature lines are compared to distinguish ridge lines from valley lines. The experimental results show that the accuracy of our method meets the requirements of terrain feature line extraction and that its computational efficiency is satisfactory.
The article is organized as follows. "The idea and principle of terrain feature line extraction" introduces the ideas and principles of terrain feature line extraction and the related clustering concepts. "The DBSCAN clustering algorithm" introduces the concept and process of the DBSCAN clustering algorithm. "Topographic feature line extraction process" and "Parameter determination of the DBSCAN algorithm" describe the process of extracting terrain feature lines and how the parameters of the DBSCAN algorithm are determined. The proposed method is then verified by experiments in "DBSCAN extracts feature lines" and compared with hydrological analysis in "Experimental comparison between hydrological analysis and clustering analysis". Finally, we summarize the article and give the conclusion.

The idea and principle of terrain feature line extraction
Common topographic feature lines (Tang et al. 2003) include valley lines, ridge lines and lines along gullies, and common topographic feature points are mountain peaks and saddle points. The ridge line and the valley line (Riazanoff et al. 1988) are two corresponding feature lines. The ridge line runs along the ridge, where contour lines appear as curves convex toward lower ground; it roughly follows the watershed and separates water to some degree. The valley line runs along a narrow, low depression between two mountains; it connects the lowest points of the valley and has a certain catchment property. The line along a gully (Zhang et al. 2012) is an important geomorphological feature of the loess hilly area: it is the characteristic loess landform line where the slope changes obviously. Mountain peaks and saddle points are terrain control points with a constraint relationship (Wu et al. 2018). A peak is the elevation maximum in a local area: within a certain range around it, the elevation of the surrounding terrain points decreases gradually toward the edge. A saddle point (Fisher et al. 2004) is a local minimum of elevation, and the elevation difference between it and the adjacent peak is often used as an important indicator of the relief of positive terrain. Physical properties of terrain such as friction and stiffness (Dong et al. 2020) are estimated in advance for mobile robots in motion control, dynamics parameter adjustment, trajectory planning, etc. Unlike such physical properties, a terrain feature line is a geometric shape on the terrain surface that describes how the surface changes.
Topographic ridge lines and valley lines (Chang et al. 1998; Masavi et al. 1999) are curves connecting a group of continuous ridge points or valley points. They represent topographic boundaries where slopes intersect, describe the fluctuation and distribution pattern of the terrain, and are of great importance for analyzing topographic form and trend. On the other hand, the terrain points on the slopes on both sides of a feature line exhibit local agglomeration: terrain points near ridge points or valley points are close in spatial distance and can form a cluster around them, with the ridge points or valley points as the cluster center. Topographic feature points such as peaks and depressions have similar properties, but they are generally isolated and cannot be used to construct terrain feature lines.
Cluster analysis (Blashfield and Aldenderfer 1978; Busu et al. 2021) divides a given dataset into several subsets, each called a cluster, such that data objects in the same cluster are very close to each other (highly similar), while objects in different clusters are far apart (dissimilar). Each terrain point can be represented by three-dimensional coordinates (x, y, z), where x and y are its planimetric ground coordinates and z is its elevation. When the terrain is represented by a regular-grid DEM, the terrain points are the grid points of the dataset. The spatial distance between terrain points is measured as the three-dimensional Euclidean distance, which is used to evaluate their similarity. A neighborhood can be taken around each terrain point and the agglomeration of points in it calculated to judge whether they form a cluster; alternatively, the whole set of terrain points can be divided by distance until it cannot be split further, yielding multiple clusters. Terrain ridge lines or valley lines can then be obtained by first finding the extreme point of each cluster, then finding the points farthest from the extreme point on both sides of the cluster to form the center line of the cluster's shape, and finally connecting these center lines head to tail.
Cluster analysis is a popular data mining technique. Its most notable characteristic is that it requires no prior information about the dataset (that is, no category labels of the objects). It starts directly from the dataset itself and assigns a category label to each object according to some similarity criterion, which is why cluster analysis methods are also called unsupervised classification methods. Cluster analysis helps users obtain potentially valuable information while filtering out irrelevant data.
For DEM terrain data analysis on a regular grid, the similarity between any two terrain points can be expressed by their Euclidean distance.
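A minimal Python sketch of this representation, assuming the DEM is available in memory as a list of elevation rows with square cells of size `cell_size` (the helpers `grid_to_points` and `distance` are illustrative names, not part of the paper):

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z): planimetric coordinates and elevation


def grid_to_points(dem: List[List[float]], cell_size: float) -> List[Point]:
    """Turn a regular-grid DEM into 3D terrain points.

    Assumes the column index maps to x and the row index maps to y,
    both scaled by the cell size R.
    """
    points = []
    for row, values in enumerate(dem):
        for col, z in enumerate(values):
            points.append((col * cell_size, row * cell_size, float(z)))
    return points


def distance(p: Point, q: Point) -> float:
    """Three-dimensional Euclidean distance used as the similarity measure."""
    return math.dist(p, q)
```

For R = 30 m, two neighboring cells whose elevations differ by 30 m are 30√2 ≈ 42.4 m apart, which corresponds to the 45-degree slope limit used later for ε.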
Cluster analysis depends not only on the dataset but also on the chosen similarity measure, and the choice of similarity measure remains a challenging problem. The shape of the clusters is an important reference for selecting a clustering algorithm. Clusters are usually of two types: spherical (convex) and non-spherical. Clusters generated with a distance function are generally spherical. However, clusters come in various shapes, and when their shapes are irregular, density-based cluster analysis is needed.
Density-based clusters are dense regions of objects separated by regions of low density. This is generally achieved by specifying the minimum number of points around any object in a cluster. When clusters are irregular in shape or intertwined, or when there are noise points and outliers, density-based cluster definitions often yield better clustering results.
The idea of density-based clustering is that as long as the density of objects in a region exceeds a certain threshold, they are added to the nearby cluster. This overcomes the shortcoming that distance-based algorithms can only find roughly circular clusters: density-based algorithms can find clusters of arbitrary shape and are robust to noise. DBSCAN (density-based spatial clustering of applications with noise) is a representative density-based clustering algorithm. It is a partial clustering algorithm, that is, the union of all the clusters it produces need not cover the whole dataset. It defines a cluster as a maximal set of density-connected points, so it can group regions of sufficiently high density into clusters and find clusters of arbitrary shape in noisy datasets.
The clusters formed by terrain points are irregular and varied in shape, so density-based clustering is more suitable for analyzing DEM-based terrain data. In this paper, a DBSCAN-based clustering algorithm is used to extract terrain feature points, from which ridge lines and valley lines are then inferred.

The DBSCAN clustering algorithm
DBSCAN clustering (Schubert et al. 2017; Khan et al. 2014) is a local clustering method: the union of all clusters need not cover the whole dataset. Noise data in the dataset are excluded by the algorithm and neither form a cluster nor join other clusters. The idea of the DBSCAN algorithm is to form a cluster from all objects connected by density. Two parameters must be given before running it: the neighborhood radius ε and the minimum number of points (Minpts). A cluster can be formed only when the number of points in a neighborhood reaches the minimum number of samples.
Given a dataset S and a neighborhood radius ε, for any data object X ∈ S, the set ε(X) = {Y | Y ∈ S, d(Y, X) ≤ ε} of objects lying within distance ε of X is called the ε-neighborhood of X in S. Here d(Y, X) denotes the distance between X and Y.
Given parameters (ε, Minpts) and an arbitrary data object X ∈ S, if |ε(X)| ≥ Minpts, then X is called a core point of S with respect to (ε, Minpts).
For any X, Y ∈ S, if X is a core point of S and Y ∈ ε(X), that is, Y is in the neighborhood of X, then Y is directly density-reachable from X with respect to (ε, Minpts).
For data objects X and Y in S, if there is a chain of objects X_1, X_2, …, X_n in S with X_1 = X such that X_2 is directly density-reachable from X_1, X_3 from X_2, and so on, and Y is directly density-reachable from X_n (for example, with n = 4: X_1 → X_2 → X_3 → X_4 → Y), then Y is density-reachable from X with respect to (ε, Minpts).
For any data objects Z, Y ∈ S, if there is a data object X ∈ S such that both Z and Y are density-reachable from X, then Z and Y are density-connected with respect to (ε, Minpts).
For any Y ∈ S, if Y is not a core point with respect to (ε, Minpts) but there is a core point X with Y ∈ ε(X), then Y is called a boundary point of S.
For any Y ∈ S, if Y is neither a core point nor a boundary point with respect to (ε, Minpts), then Y is a noise point of S with respect to (ε, Minpts).
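The definitions above translate directly into a brute-force classification. This is a hypothetical sketch (`eps_neighborhood` and `classify_points` are illustrative names); as in the definition of ε(X), a point counts as a member of its own neighborhood:

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def eps_neighborhood(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all Y with d(Y, X) <= eps, where X = points[i] (X itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def classify_points(points: Sequence[Point], eps: float, minpts: int) -> Dict[int, str]:
    """Label each point 'core', 'border' or 'noise' with respect to (eps, minpts)."""
    neighborhoods = [eps_neighborhood(points, i, eps) for i in range(len(points))]
    is_core = [len(nb) >= minpts for nb in neighborhoods]
    labels = {}
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            labels[i] = "core"
        elif any(is_core[j] for j in nb):
            # not a core point itself, but inside the eps-neighborhood of a core point
            labels[i] = "border"
        else:
            labels[i] = "noise"
    return labels
```

Because the distance is symmetric, checking whether a core point lies in ε(Y) is equivalent to checking whether Y lies in the neighborhood of some core point, as the boundary-point definition requires.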
The basic idea of the DBSCAN algorithm is to find clusters by examining the ε-neighborhood of each point X in the dataset S. For each X ∈ S, if |ε(X)| ≥ Minpts, a new cluster is created with X as a core point, and all objects density-reachable from X are merged into it until no new points can be added. Then the next point in S is checked, until every point in S has been checked. If a point checked later already belongs to a previously generated cluster, no new cluster needs to be generated for it.
The DBSCAN process searches for clusters by checking the neighborhood of each point in the dataset, so all data must be traversed. First, every point is marked as unvisited. When a point is visited, it is marked as visited, and the number of points in its neighborhood is counted using the spatial (Euclidean) distance. If this number is at least the minimum number of sample points, the point is a core point; otherwise, the algorithm moves on to the next unvisited point. If it is a core point, a new cluster is created with the core point's neighborhood as its core region, and the points in that neighborhood are visited in turn to expand the cluster. This continues until all points have been visited.
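A compact sketch of the traversal just described, written from the standard DBSCAN procedure rather than from the authors' code; neighborhoods are found by brute force in O(n²), which is only practical for small grids:

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of the eps-neighborhood of points[i], the point itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, minpts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) for every point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None means 'unvisited'
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < minpts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= minpts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

Points labeled -1 are noise: they neither form a cluster nor join one, which is the behavior the text relies on to drop hillside cells.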

Topographic feature line extraction process
To extract ridge lines and valley lines with the DBSCAN algorithm, the terrain data must be rasterized and converted into text data that the algorithm can read. The two parameters of the DBSCAN algorithm are selected with a specific strategy for cluster analysis, and the terrain features are then studied further according to the clustering results.
The specific steps of terrain feature line extraction are as follows:
Step 1: Terrain data preparation.
The DEM of the terrain is obtained, and the raster data are converted into a text format that the algorithm can read through a series of processing steps in ArcGIS.
Step 2: Terrain data cluster analysis.
With the help of the clustering algorithm, the terrain is divided into multiple separate subregions V_i (i = 1, 2, …, K), where K is the number of subregions.
The DBSCAN clustering algorithm is used to divide the terrain, with similarity expressed by the Euclidean distance. The judgment rule is: if P_i is a point in subregion V_i, P_j is a point not in V_i, and d ≤ ε, where d is the distance between P_i and P_j and ε is the set threshold, then V_i = V_i ∪ {P_j}.
The DBSCAN clustering algorithm needs to input two parameters, the minimum number of sample points (represented by Minpts) and the neighborhood radius (represented by ε), which can be determined according to the formation principle of terrain feature lines and the method of traversal analysis. The details are introduced in 'Parameter determination of the DBSCAN algorithm'.
Step 3: Extraction of ridge points and valley points.
The terrain feature points are determined by calculating the extreme points of each subregion. Two candidate points are selected in a subregion: the point of maximum elevation and the point of minimum elevation. The candidate closer to the center of the subregion is retained and the other is discarded, and the retained extreme point is taken as the feature point of the subregion.
The calculated feature points are the ridge points and valley points of the terrain: a ridge point is a feature point of a ridge line and lies at high altitude, while a valley point is a feature point of a valley line and lies at low altitude.
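Step 3 can be sketched as follows. The paper does not define the "center of the sub-area"; this sketch assumes the planimetric centroid of the subregion's points, which is a hypothetical choice, and `feature_point` is an illustrative name:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def feature_point(subregion: Sequence[Point]) -> Point:
    """Pick the feature point of one subregion.

    Assumption: the subregion center is the mean (x, y) of its points.
    """
    cx = sum(p[0] for p in subregion) / len(subregion)
    cy = sum(p[1] for p in subregion) / len(subregion)
    highest = max(subregion, key=lambda p: p[2])
    lowest = min(subregion, key=lambda p: p[2])

    def dist_to_center(p: Point) -> float:
        return math.hypot(p[0] - cx, p[1] - cy)

    # keep the extreme point closer to the center and discard the other
    return min((highest, lowest), key=dist_to_center)
```

On a ridge cluster the retained point is normally the elevation maximum and on a valley cluster the minimum, since the opposite extreme tends to sit near the cluster edge.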
Step 4: Feature line extraction to determine ridge lines or valley lines.
Each subregion is usually elongated in a certain direction. We calculate the points on the two sides of the subregion that are farthest from the extreme point, and then draw the center line of the subregion through these three points. This fitted line reflects how the subregion is distributed along a line.
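One possible reading of the center-line construction, sketched under two assumptions not stated in the paper: distances are measured in plan (x, y), and "the other side" means points whose direction from the extreme point opposes the direction to the first endpoint. `center_line` is a hypothetical helper:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def center_line(subregion: Sequence[Point], extreme: Point) -> List[Point]:
    """Polyline end_a -> extreme -> end_b approximating a subregion's center line."""
    ex, ey = extreme[0], extreme[1]

    def planar_dist(p: Point) -> float:
        return math.hypot(p[0] - ex, p[1] - ey)

    end_a = max(subregion, key=planar_dist)
    ax, ay = end_a[0] - ex, end_a[1] - ey
    # candidates on the other side: their direction from the extreme point opposes end_a
    opposite = [p for p in subregion if (p[0] - ex) * ax + (p[1] - ey) * ay < 0]
    # if nothing lies on the other side, the line simply ends at the extreme point
    end_b = max(opposite, key=planar_dist) if opposite else extreme
    return [end_a, extreme, end_b]
```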
We then calculate the minimum distance between subregions to find adjacent subregions. The minimum distance between two subregions is the smallest distance between any grid point of one and any grid point of the other. The two points that achieve this minimum in adjacent subregions are connected, and the connecting line represents the feature line between the subregions. Combining the connecting lines within and between the subregions forms a complete feature line.
The minimum distance between subregions can be expressed as formula (1). Let A_i and A_j be two subregions and d(X, Y) the distance between grid points X ∈ A_i and Y ∈ A_j. The smallest such distance is defined as the distance between the two subregions:
$$D(A_i, A_j) = \min_{X \in A_i,\; Y \in A_j} d(X, Y) \qquad (1)$$
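Formula (1) and the search for the nearest subregion can be sketched by brute force. The function names are illustrative, and for large subregions a spatial index would be preferable to comparing every pair of points:

```python
import math
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def subregion_distance(a: Sequence[Point], b: Sequence[Point]) -> Tuple[float, Point, Point]:
    """Formula (1): minimum point-to-point distance between two subregions.

    Also returns the closest pair, whose connecting segment links the subregions.
    """
    x, y = min(product(a, b), key=lambda pair: math.dist(*pair))
    return math.dist(x, y), x, y


def nearest_subregion(regions: Sequence[Sequence[Point]], i: int) -> int:
    """Index of the subregion closest to regions[i] under formula (1)."""
    others = [j for j in range(len(regions)) if j != i]
    return min(others, key=lambda j: subregion_distance(regions[i], regions[j])[0])
```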

Parameter determination of the DBSCAN algorithm
In the DBSCAN algorithm, whether an object in the dataset S is a core point depends on the density parameters (ε, Minpts), composed of the neighborhood radius and the minimum number of samples. The DBSCAN algorithm groups regions whose density reaches or exceeds this threshold into clusters; it can find clusters of arbitrary shape in noisy datasets, producing a partial clustering. It is worth noting that the DBSCAN algorithm is sensitive to the user-defined density parameters (ε, Minpts): the settings of ε and Minpts directly affect the clustering results.
For regular-grid DEM terrain datasets, Minpts is the minimum number of grid cells within the neighborhood radius required to form a core point. Here we set this minimum to 2; that is, if there are 2 sample points in the neighborhood, a core point is formed.
After Minpts is determined, the neighborhood radius ε can be determined by dynamic traversal. First, the initial value of ε, which is its lower limit MinEPS, is determined. Assuming that the terrain resolution is R × R, the initial value of ε is R meters, so MinEPS = R. Next, the upper limit of ε must be determined. In the DBSCAN algorithm, ε is the threshold that decides whether grid cells form a cluster, and it is essentially a measure of the smoothness between adjacent grid cells. When adjacent grid cells can gather into a cluster, the terrain between them should be relatively smooth, that is, the slope between them is not very steep. Since a slope exceeding 45 degrees is generally considered very steep, the upper limit of the slope is set to 45 degrees. For two adjacent grid cells with horizontal spacing R and slope θ, the distance between their centers is R/cos θ, so at the 45-degree limit it is given by formula (2). Therefore, the upper limit of ε is √2R, and the traversal range of ε is [R, √2R]. After the range of the neighborhood radius is determined, the optimal minimum number of sample points can be determined by traversing the neighborhood radius over this range: when Minpts reaches a value at which the number of noise points and the number of clusters no longer change with the neighborhood radius, the optimal minimum number of sample points is that value minus 1.
$$D = \frac{R}{\cos(\pi/4)} = \sqrt{2}\,R \qquad (2)$$

Once the minimum number of sample points is determined, the number of clusters is calculated for each neighborhood radius in the range, and the neighborhood radius is chosen by calculating the change rate of the number of clusters between adjacent radii. The radius with the smallest absolute change rate is selected as the best value, because at this value the clustering, and hence the formation of feature lines, is least affected by the choice of radius. The change rate is calculated with formula (3). Let e_i and e_j be adjacent neighborhood radii, C_i and C_j the corresponding numbers of clusters, and T_{i-j} the absolute value of the change rate; then

$$T_{i-j} = \left| \frac{C_i - C_j}{e_i - e_j} \right| \qquad (3)$$

The DBSCAN algorithm can also be used to exclude useless hillside data during terrain feature line extraction: a relatively large minimum number of sample points is set for the optimal neighborhood radius, so that points at the tops of mountains are still gathered into clusters while points on hillsides are identified as noise.

Figure 1 shows the mountainous terrain area used in this experiment. The DEM resolution is 30 × 30 m, and the elevation ranges from 1327 to 3596 m. The data were preprocessed in ArcGIS, converted into grid text format and then clustered with the DBSCAN algorithm. The experiments were run on a computer with an Intel(R) Core(TM) i5-6300HQ CPU @ 2.3 GHz and 12 GB of main memory.
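The ε range from formula (2) and the selection by change rate can be sketched as below. The sketch assumes formula (3) is the absolute change in cluster count per unit change of radius, which reproduces the values 16 and 1 reported in the experiments for 1 m steps; returning the smaller radius of the best pair is an assumption modeled on the choice of 38 m over 39 m. All function names are hypothetical:

```python
import math
from typing import Dict, Tuple


def eps_range(resolution: float, max_slope_deg: float = 45.0) -> Tuple[float, float]:
    """Traversal range [R, R / cos(max slope)] for the neighborhood radius (formula (2))."""
    return resolution, resolution / math.cos(math.radians(max_slope_deg))


def change_rate(e_i: float, c_i: int, e_j: float, c_j: int) -> float:
    """Assumed form of formula (3): |change in cluster count / change in radius|."""
    return abs((c_i - c_j) / (e_i - e_j))


def best_eps(cluster_counts: Dict[float, int]) -> float:
    """Pick eps from the adjacent pair of radii with the smallest change rate.

    The smaller radius of that pair is returned (assumption, see lead-in).
    """
    radii = sorted(cluster_counts)
    pairs = zip(radii, radii[1:])
    e_i, _ = min(pairs, key=lambda p: change_rate(p[0], cluster_counts[p[0]],
                                                  p[1], cluster_counts[p[1]]))
    return e_i
```

With the cluster counts reported in the experiments (176, 160 and 159 clusters at 37, 38 and 39 m), `best_eps` returns 38, and `eps_range(30)` gives the interval from 30 m to about 42.4 m.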

DBSCAN extracts feature lines
The first step is the choice of parameters. Since the terrain grid resolution is 30 × 30 m, the minimum neighborhood radius is 30 m. With the maximum slope of π/4, formula (2) gives an upper limit of 30√2 ≈ 42 m, so the interval is [30, 42] m. Figure 2 shows the number of clusters and noise points for different minimum numbers of sample points. Minpts should be relatively large to ensure that the non-feature points of the hillside area are excluded. As shown in Fig. 2a, when Minpts = 6 the number of noise points is constant over the whole range of neighborhood radii and equal to the total number of points in the dataset. Comparing Minpts = 4 and Minpts = 5, the larger value excludes more non-feature points and reduces their influence on the extraction of feature lines, and at smaller neighborhood radii the change rate of the number of clusters is also small. Finally, Minpts = 5 (one less than 6, following the rule given in the parameter determination section) is determined to be the appropriate minimum number of sample points according to Fig. 2.
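The Minpts rule from the parameter determination section (take one less than the first value whose statistics stop varying with ε) can be sketched as a small function. The input format, a mapping from each Minpts value to its (clusters, noise) counts over the ε range, is a hypothetical choice:

```python
from typing import Dict, List, Tuple

Stats = Tuple[int, int]  # (number of clusters, number of noise points) for one eps value


def choose_minpts(results: Dict[int, List[Stats]]) -> int:
    """Return one less than the first Minpts whose statistics no longer vary with eps."""
    for minpts in sorted(results):
        if len(set(results[minpts])) == 1:  # identical counts for every eps tried
            return minpts - 1
    raise ValueError("statistics still depend on eps for every Minpts tried")
```

For example, if the statistics first become constant at Minpts = 6, as in Fig. 2a where every point is noise, the function returns 5.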
With Minpts = 5, the number of noise points is calculated for different neighborhood radii. The selected radius should allow points at the tops of mountains to be clustered into multiple clusters, while points on hillsides are identified as noise.
Fixing the minimum number of points at 5, we vary the neighborhood radius and observe the clustering results under different radii. The change rates between adjacent radii according to formula (3) are shown in Table 1.
As shown in Fig. 2b, the change rate of the number of clusters between 38 m and the adjacent radius is the smallest. At a neighborhood radius of 37 m the number of clusters is 176, at 38 m it is 160, and at 39 m it is 159. By calculating the change rate of the number of clusters, the most suitable neighborhood radius can be found. According to formula (3), the change rate between 37 and 38 m is 16, and that between 38 and 39 m is 1. The radius with the smaller change rate is selected, since a small change rate indicates that varying the radius has little influence on the clusters. Choosing 38 m rather than 39 m ensures that data in the hillside area are identified as noise and that the extracted ridge lines are more accurate. Therefore, the clustering parameters selected are (38 m, 5).
As Table 1 shows, the change rate of the number of clusters over the neighborhood range is first small, then larger, and small again at 36 m; it becomes larger at 37 m, small again at 38 m, and then keeps increasing. From 36 to 37 m the change rate is 1. Although this is very small, the change rate around 38 m is also 1, so 36 m is not the best radius. At 37 m the number of clusters reaches its maximum: many clusters form on the ridges, but many also form on the hillsides. At 38 m the number of clusters starts to decrease, that is, some clusters on the ridges merge. Because the elevation of points on the hillsides changes more than that on the ridges, 38 m is the critical value at which ridge lines form without hillside points being clustered, and it is also the best value for feature extraction. Therefore, we select (38 m, 5) as the clustering parameters of the DBSCAN algorithm.
To verify the accuracy of the results, the parameters (34 m, 5), (35 m, 5), (36 m, 5), (37 m, 5), (38 m, 5) and (39 m, 5) were used for clustering, with the results shown in Fig. 3. With neighborhood radii from 34 to 36 m, the clusters cannot represent the ridge and valley outlines of the terrain, and the clustering effect is obviously poor. At 37 m, the main outlines of the ridge and valley lines are extracted, but some branches are not identified. At 39 m, the ridge branches are extracted and the outline is also good, but the clusters are too thick: points in the area around the ridgeline are also included, so the extraction of feature lines is not ideal.
In summary, the density parameters (ε, Minpts) of the DBSCAN algorithm are set to (38 m, 5), DBSCAN clustering is performed on the entire terrain, and the subregion division results are shown in Fig. 4a.
In each subregion, the points farthest from the feature point on its two sides are found; the center line of the subregion starts from one of these points, passes through the feature point and ends at the other. The closest points of adjacent subregions are connected to form the feature lines between subregions. Finally, the feature line map shown in Fig. 4b is formed.
In each subregion, the highest or lowest point and the two points farthest from it are found, and connecting these two points with the extreme point approximates the trend of the subregion's feature line. The minimum distances between adjacent subregions are calculated and used as weights to construct the spatial topology between the subregions. Several consecutive or systematically spaced points are then randomly selected on each feature line, and their average elevation is calculated to represent the average elevation of the feature line, as shown in Table 2. If the average elevations of successive lines alternate between high and low, the lines are ridge lines or valley lines. If three consecutive lines increase or decrease in elevation, the middle line lies on a hillside, where points of similar elevation along the hillside have been grouped together.
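The ridge/valley/hillside rule can be sketched as follows, assuming the feature lines are ordered spatially across the terrain (the paper does not state the ordering) and reading "alternating" as a line being higher or lower than both of its neighbors. `label_lines` is a hypothetical helper:

```python
from typing import List, Sequence


def label_lines(mean_elevations: Sequence[float]) -> List[str]:
    """Label spatially ordered feature lines by comparing mean elevations.

    Higher than both neighbors -> ridge; lower than both -> valley;
    part of a monotonic run of three -> hillside. The two end lines have
    only one neighbor and are left 'undetermined'.
    """
    labels = []
    n = len(mean_elevations)
    for k, z in enumerate(mean_elevations):
        if k == 0 or k == n - 1:
            labels.append("undetermined")
            continue
        prev_z, next_z = mean_elevations[k - 1], mean_elevations[k + 1]
        if z > prev_z and z > next_z:
            labels.append("ridge")
        elif z < prev_z and z < next_z:
            labels.append("valley")
        else:
            labels.append("hillside")
    return labels
```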
The clustering results are displayed as a distribution map in ArcGIS 10.5. Connecting the distributed clusters yields several lines, as shown in Fig. 5.

Experimental comparison between hydrological analysis and clustering analysis
To compare the extraction results with another method, we implemented the hydrological analysis method. Ridge lines and valley lines were extracted with the hydrological-analysis-based method (Stanislawski et al. 2021; Zhou et al. 2021; Testa et al. 2011; Hinnell et al. 2010) in ArcGIS 10.5. First, a moving window is used to compute the mean of the original DEM; this step is called the focal mean. In the raster calculator, the focal statistics result is subtracted from the original DEM. The difference is then reclassified with 0 as the class boundary to separate positive and negative terrain.
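The positive/negative terrain split can be sketched in plain Python as a focal mean followed by a sign test. The 3 × 3 window (`radius=1`) and the edge handling (cells near the border average only the neighbours that exist) are assumptions; ArcGIS Focal Statistics lets the window shape and size be configured, so this illustrates the idea rather than replicating that tool. `focal_mean` and `positive_negative_terrain` are hypothetical names.

```python
from typing import List

Grid = List[List[float]]


def focal_mean(dem: Grid, radius: int = 1) -> Grid:
    """Mean of each cell's (2*radius+1)^2 window, truncated at the grid edges."""
    rows, cols = len(dem), len(dem[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [dem[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def positive_negative_terrain(dem: Grid, radius: int = 1) -> List[List[int]]:
    """1 where the DEM exceeds its focal mean (positive terrain), else 0."""
    local_mean = focal_mean(dem, radius)
    # Reclassify DEM - focal mean with 0 as the class boundary (ties go to 0).
    return [[1 if z - m > 0 else 0 for z, m in zip(dz, dm)]
            for dz, dm in zip(dem, local_mean)]
```

A single raised cell in a flat grid, for example, is the only cell classified as positive terrain, since every other cell sits below the mean of its window.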
Next comes depression filling. The original DEM is filled with the ArcGIS Fill tool, and flow direction data are derived from the filled DEM. Flow accumulation is then computed from the flow direction data. After further processing, the ridge lines are obtained.
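Depression filling can be illustrated with the priority-flood algorithm, swapped in here as a well-known technique; it is not necessarily the algorithm behind the ArcGIS Fill tool, and `fill_depressions` is a hypothetical name. Cells are processed from the grid border inward in order of elevation, and any cell lower than the spill level that reaches it is raised to that level.

```python
import heapq
from typing import List, Tuple

Grid = List[List[float]]


def fill_depressions(dem: Grid) -> Grid:
    """Priority-flood fill: raise every pit to the level at which it spills."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]  # leave the input DEM untouched
    seen = [[False] * cols for _ in range(rows)]
    heap: List[Tuple[float, int, int]] = []
    # Border cells seed the queue: water can always drain off the grid edge.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                seen[r][c] = True
    # Always expand from the lowest cell reached so far.
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    # A neighbour below the current spill level is filled up to it.
                    filled[nr][nc] = max(filled[nr][nc], z)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

Filled depressions become exactly flat, so a flow direction step run on the result still needs some rule for routing water across those flats.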
Using the raster calculator, we compute the inverse terrain DEM and derive its flow direction data in ArcGIS. We then compute flow accumulation and use the raster calculator to extract the cells whose flow accumulation is 0. These data are reclassified into two levels, and the critical value is adjusted. Finally, the cells with attribute values close to 0 form the valley lines.
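The inverse-terrain route to valley lines can be sketched end to end with a simple D8 model: invert the DEM, send each cell's flow to its steepest downslope neighbour, accumulate upstream cell counts, and reclassify into two levels at a threshold. This assumes inversion as `max(z) - z`, D8 slopes divided by the neighbour distance (1 or √2 cells), and no handling of flats; the function names are hypothetical and this is not ArcGIS's implementation. Applied to a filled, non-inverted DEM, the same `flow_accumulation` yields the zero-accumulation cells used for ridge lines.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]


def invert(dem: Grid) -> Grid:
    """Inverse terrain: ridges become valleys and vice versa."""
    top = max(max(row) for row in dem)
    return [[top - z for z in row] for row in dem]


def d8_directions(dem: Grid) -> List[List[Optional[Cell]]]:
    """Steepest-descent neighbour of each cell; None for pits and flats."""
    rows, cols = len(dem), len(dem[0])
    out = []
    for r in range(rows):
        row: List[Optional[Cell]] = []
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                        if slope > best_slope:
                            best, best_slope = (nr, nc), slope
            row.append(best)
        out.append(row)
    return out


def flow_accumulation(dem: Grid) -> List[List[int]]:
    """Number of upstream cells draining through each cell (D8)."""
    rows, cols = len(dem), len(dem[0])
    down = d8_directions(dem)
    acc = [[0] * cols for _ in range(rows)]
    # Donors are strictly higher than receivers, so a high-to-low sweep
    # finishes every donor before its receiver is visited.
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: -dem[rc[0]][rc[1]])
    for r, c in order:
        target = down[r][c]
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c] + 1
    return acc


def zero_accumulation_mask(acc: List[List[int]], threshold: int = 0) -> List[List[int]]:
    """Two-level reclassification: 1 where accumulation <= threshold, else 0."""
    return [[1 if a <= threshold else 0 for a in row] for row in acc]
```

`threshold` stands in for the adjustable critical value: raising it admits cells with a few upstream contributors, which thickens the extracted lines.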
The ridge and valley lines extracted by hydrological analysis lie at the same positions as those extracted by the DBSCAN algorithm. Hydrological analysis is a traditional method of feature line extraction: it computes flow direction and flow accumulation on the inverse terrain and derives the feature lines from the accumulation, so the valley lines are obtained directly, while the ridge lines are obtained by filling the original terrain and then computing its flow direction and flow accumulation. Figure 6 compares the feature lines extracted by the two methods. As Fig. 6a and b show, both methods extract the ridge lines and valley lines of the same terrain. In the DBSCAN result, some areas contain a large number of points, i.e., the clusters are long and wide; this reflects high similarity among the grid points there and indicates a gentle slope in that area. Conversely, where the clusters are small or parts of a feature line are not clustered, the slope is steep: the elevation of the points changes along the feature line, increasing or decreasing in a certain direction. In terms of extraction steps, DBSCAN-based feature line extraction is relatively simple and extracts ridge lines and valley lines at the same time.

Table 2 (partial, rows 11 to 15 as recovered; the x coordinate of the first point on line 11 is missing): sampled points (x, y, elevation) on each feature line and their average elevation

| Line | Point 1 | Point 2 | Point 3 | Point 4 | Point 5 | Average elevation |
|---|---|---|---|---|---|---|
| 11 | …, 2940, 2884 | 2070, 1860, 2084 | 1770, 2940, 2845 | 1950, 2010, 2043 | 1980, 2610, 2502 | 2471.6 |
| 12 | 1560, 1950, 2114 | 1710, 2580, 2485 | 1770, 2550, 2437 | 1860, 2430, 2331 | 1800, 2490, 2388 | 2351 |
| 13 | 1110, 2280, 2528 | 1200, 2160, 2425 | 1200, 3120, 3291 | 1230, 2790, 3010 | 1230, 1890, 2277 | 2706.2 |
| 14 | 180, 3030, 1868 | 1050, 4530, 1806 | 1080, 4620, 1815 | 120, 3540, 2172 | 120, 3150, 1925 | 1917.2 |
| 15 | 2940, 1650, 2046 | 3000, 2490, 2514 | 3030, 2550, 2564 | 3000, 1950, 2302 | 3390, 1860, 2504 | 2386 |
In hydrological analysis, by contrast, ridge lines and valley lines are extracted with different steps or sub-methods, and the process is much more complex than with the DBSCAN algorithm. Ridge line extraction first requires the filled version of the original terrain, whereas valley line extraction first requires the inverse terrain. After these two datasets are obtained, the positive and negative terrain data must be combined to obtain the feature lines, and the watershed lines and catchment lines in branch areas are not gathered together. To obtain reliable timings, we ran the clustering analysis and the hydrological analysis several times each. For the experiment in Fig. 6, the average time of the DBSCAN clustering method is 429 s and that of the hydrological analysis method is 780 s; the proposed method thus saves 351 s and runs about 1.8 times faster (780/429 ≈ 1.82) than the hydrological analysis method.
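The timing comparison reduces to averaging several runs and taking a difference and a ratio. The helper below is a sketch: `average_runtime` and `compare` are hypothetical names, and measuring wall-clock time with `time.perf_counter` is an assumption about how the reported times were taken. With the Fig. 6 averages of 429 s and 780 s, the saving is 351 s and the ratio 780/429 is about 1.82.

```python
import time
from statistics import mean
from typing import Callable, Tuple


def average_runtime(run: Callable[[], object], repeats: int = 5) -> float:
    """Mean wall-clock time in seconds over several runs of a zero-argument callable."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    return mean(times)


def compare(baseline_s: float, proposed_s: float) -> Tuple[float, float]:
    """Seconds saved and speedup factor of the proposed method over the baseline."""
    return baseline_s - proposed_s, baseline_s / proposed_s


saved, speedup = compare(780, 429)  # Fig. 6 averages reported in the text
print(f"saved {saved} s, speedup {speedup:.2f}x")  # prints "saved 351 s, speedup 1.82x"
```

The same helper applied to the lower-terrain experiment of Fig. 7 (629 s versus 550 s) gives a saving of 79 s and a ratio of about 1.14.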
To further verify the feasibility of the method, we selected two new datasets for comparison, as shown in Fig. 7.
Figure 7a and b show terrain with elevations ranging from 243 to 905 m, so the overall elevation is relatively low. Using the parameter selection procedure above, the DBSCAN parameters are set to (31 m, 5). Feature line extraction with DBSCAN took 550 s, while hydrological extraction took 629 s, so the proposed method saves 79 s and is about 1.14 times faster than the hydrological analysis method. For the comparison in Fig. 7c and d, the terrain data come from the same source as the data used in Fig. 1, with similar terrain characteristics and elevation range. Because the two terrains have similar features, the DBSCAN parameters selected for one also apply to the other: clustering with (38 m, 5) produces the result shown in Fig. 7c in less than 10 s, whereas the hydrological method in Fig. 7d takes 538 s to extract the feature lines. The two methods therefore differ markedly in time consumption, and for terrain with larger data volumes, the DBSCAN algorithm saves more time than hydrological extraction.
The two methods differ little in extraction accuracy, but the proposed DBSCAN method is considerably faster than hydrological analysis, and its procedure is simpler.

Conclusion
In this paper, the DBSCAN clustering algorithm is used in a new way to extract ridge lines and valley lines. The DBSCAN algorithm automatically identifies similar points; points with high similarity are gathered along the direction of a ridge or valley, and the whole terrain is divided into multiple regions. By calculating the minimum distance between regions, a subregion map is constructed, and the lines with larger distances between subregions are deleted to form multiple feature lines. Comparing the elevations of the points on these feature lines makes it easy to determine whether each is a ridge line, a valley line, or a line formed by points on a hillside: among several adjacent lines with increasing or decreasing elevation, the feature line with the largest average elevation is the ridge line and the one with the lowest is the valley line. The proposed method is an effective combination of cluster analysis and terrain feature extraction. Moreover, for data with similar terrain features the same DBSCAN parameters remain applicable, which greatly reduces extraction time. With the DBSCAN parameter selection strategy proposed in this paper, terrain feature lines can be extracted better and more efficiently, and the method is a novel approach that can reflect the slope change along the feature lines.
Future work will address how to determine the density parameters of the DBSCAN algorithm from terrain features, and will further study the relationship between the choice of density parameters and terrain properties such as resolution and terrain type. For large areas where no features are extracted, further steps are needed to extract features in those specific areas. The terrain distribution and morphological characteristics will also be analyzed further using the shape and size of the clusters formed by the DBSCAN algorithm.
Fig. 7 Comparison of feature line extraction: (a) clustering method with lower terrain; (b) hydrological extraction with lower terrain; (c) clustering method with higher terrain; (d) hydrological extraction with higher terrain