Each point cloud dataset used in this experiment were downloaded from OpenTopography, a web-based community resource for topographic data. The source code was written in Python version 3.10.10 (Python Software Foundation, Beaverton, Oregon, USA) on a machine with Core i7 3rd Generation CPU, 16 GB RAM, Intel HD graphics 4000 GPU and Microsoft Windows 10 OS.

## 3.2. Method for analysis

The method to generate output is explained with the help of nine steps, each under subheadings numbered from 3.2.1 to 3.2.9.

## 3.2.1. Pre-processing of data

The method of feature line extraction discussed in this paper subdivides data as per its X and Y coordinate values and places them onto a 2D square grid. This method processes coordinate values in whole numbers as this type of number is easier to partition. A raw data may consist of coordinate values in decimal numbers because of the unit of measurement used. Ignoring the digits beyond the decimal can lead to loss of crucial information. Hence, scaling values beyond the decimal is significant while analysing a data. This leads to the necessity of preprocessing data.

In this step, the X, Y, Z coordinate values of the raw file are stored in a new file despite the raw file having more information. Then maximum possible values across X-axis and Y-axis, *X*max(\(\stackrel{\sim}{x}\)max, \(\stackrel{\sim}{x}\)min, *d*) and *Y*max(\(\stackrel{\sim}{y}\)max, \(\stackrel{\sim}{y}\)min, *d*) are identified and the decimal places are rounded off as per the parameter value, *d*. These two values are identified since it helps in understanding the dimension of the area to be processed in whole numbers, after setting the number of decimal places to be covered. \(\stackrel{\sim}{x}\)max and \(\stackrel{\sim}{x}\)min denotes rounded values of original *x*max and *x*min values simultaneously. A demonstration of *X*max(\(\stackrel{\sim}{x}\)max, \(\stackrel{\sim}{\text{x}}\)min, *d*) calculation is shown below:

*X* max(\(\stackrel{\sim}{x}\)max, \(\stackrel{\sim}{x}\)min, *d*) = (\(\stackrel{\sim}{x}\)max - \(\stackrel{\sim}{x}\)min) / (1 / (10*d*))

E.g. If *d* = 2, *x*min = 337104.568, *x*max = 337408.412, then \(\stackrel{\sim}{x}\)min = 337104.57, \(\stackrel{\sim}{x}\)max = 337408.41 and *X*max(\(\stackrel{\sim}{x}\)max, \(\stackrel{\sim}{x}\)min, *d*) = 30384.

The program also converts coordinate values of all the points to whole numbers because whole numbers are easier to analyse. Finally, the limits (minima and maxima values) of X, Y and Z-axes along with *X*max(\(\stackrel{\sim}{x}\)max, \(\stackrel{\sim}{x}\)min, *d*) and *Y*max(\(\stackrel{\sim}{y}\)max, \(\stackrel{\sim}{y}\)min, *d*) are exported.

## 3.2.2. Identification of local maxima / minima in data

For extracting feature lines, it is important to first identify the points that can be used to create edges for representing these feature lines. These points are either maxima or minima, in terms of height, of a localised region. To identify these points, the area covered by a data across X,Y plane is first subdivided in a way that these subdivisions can fit in a square grid. This approach of plotting 3D data on 2D plane has been preferred because iterating over 2D plane should be computationally faster than iterating over multiple layers of planes in 3D space or the volume of a 3D object. Furthermore, terrain data typically do not exhibit a top-heavy object type. Therefore, projecting higher elevation points onto a lower elevation plane will not significantly affect the feature lines formed by the terrain's morphology. In this paper, the term used to denote these subdivisions is block. If a block has more than one point, then the average value of the points is considered when detecting maxima and minima. To identify local maxima and minima, the height values of a continuous set of blocks along either the X or Y axis are compared. In this paper, these sets of continuous blocks are referred to as sections. The total number of maxima or minima identified in a row or column of blocks depends on the position from which a section updates its set of blocks to detect a new local maximum or minimum. If the position from which the section updates is too close to the position of the previous iteration, it can lead to the identification of many unwanted maxima or minima. To prevent this, the discussed method also considers the coverage range of local maxima or minima. Consequently, if a new maximum falls within the coverage range of a previous maximum, it is discarded. The same principle applies to minima.

In this step, files storing modified data and limits are initially imported and the system refers values of the following parameters:

**Block size**

It is the value that acts as a resolution definer to export the dimension of data identified by *X*max(max, min, *d*) and *Y*max(max, min, *d*). Resolution is calculated by dividing these *X*max(max, min, *d*) and *Y*max(max, min, *d*) with this block size and taking the round off value.

**Span across maxima / minima points**

It is the value which determines the number of positions considered as coverage across maxima / minima points in each section.

In Fig. 2, all three diagrams are meant to demonstrate maxima identification. The diagram *Data under analysis* shows a data being divided into 12 blocks and size of a section is 7 blocks. The squares in 3rd, 9th and 10th blocks represent points with high Z-axis value and the squares in other blocks represent points with low Z-axis value. The diagram *Identified points (with span)* shows the high points of *Data under analysis* along with their coverage (span value of 1) identified in each iteration made by the section. The diagram *Identified points (without span)* shows only the high points identified in each iteration made by the section. Similarly, for minima identification, the points with low Z-axis values should be considered.

The size of a section is identified by the parameter Block size. E.g. If the parameter value is 7, then blocks are of dimension 7x7 and the section would contain a set of 7 such blocks.

A section updates its position by 1 block at a time. In each section iteration, if either of the boundary positions are to be considered as per the span value then the value extraction from that iteration is discarded. E.g. Consider a section of 7 blocks. Its boundary blocks are 1st and 7th. If the given span value is 2 and local maxima is identified in 5th block of the section, then 3rd, 4th, 6th and 7th blocks (5–2, 5 + 2) are also considered as part of high point. But 7th block is a boundary block of section. So, the local maxima value (from 5th block) collected from that section iteration is discarded.

Once all blocks are covered using section-based iteration, then this output as high / low points excluding coverage (span) is passed for further processing.

## 3.2.3. Filtration by minimum local height difference

There is a need to filter local maxima or minima points as some of them may represent artefacts, which are not needed for the expected analysis. Usually, absolute values of the prominent slopes of a terrain are higher than the absolute values of slopes calculated between many artefacts and their respective surfaces. So, it is convenient to use a height filter to exclude unwanted maxima and minima points.

In this step, the difference between maxima and minima (identified from step 2) of each section, *h*diff are compared with an input local height difference value, *h*local. For valid local maxima identification, if *h*diff > = *h*local, then that highest point from step 2 is considered for step 4. Similarly, for valid local minima identification, the lowest point from step 2 is considered if *h*diff > = *h*local. The expected parameter is minimum local height difference.

In Fig. 3, both the diagrams are meant to demonstrate height difference comparison. The diagram *Data under analysis* shows a data being divided into 12 blocks and size of a section is 7 blocks. The diagram *Identified points* shows only the local maxima points identified in each iteration made by the section. The expected *h*local value is 10. So, only one value (i.e. 20 from 4th block; since 20–8 = 12) satisfies the condition.

## 3.2.4. Filtration by section median coverage

Ridges and valleys in a terrain are formed when two slopes or flanks from opposite directions intersect. There is always a major difference in height between ridge / valley lines and their adjacent areas. So, a section scanned over an area through which a ridge / valley line crosses can result in the identification of ridge points as maxima and valley points as minima, with them being close in position to the median of that section. But a section scanned over an area of just one slope or one flank would mostly result in the identification of maxima / minima points towards the boundary of that section instead of its median. This leads to the necessity of filtering identified points based on their block position in a section.

In this step, the program checks whether the local maxima and minima points, identified from step 3 are part of the median of a section. For that, the program compares the position of these maxima and minima points, *p*max and *p*min with the position of median, *p*m of that section and it refers a parameter to check expansion of a median.

The expected parameter is section median coverage (in %), *c*m.

E.g. 1) If section size = 9, then *p*m is 5. If *c*m is set to 25 then coverage of section becomes positions 4,5,6. When 5 is originally median then four places from both left and right sides of 5 are non-median. Setting *c*m = 25 makes 25% of 4 = 1. So, median range is expanded by one place in both the sides.

In Fig. 4, all four diagrams are meant to demonstrate section median coverage for maxima points. The diagram *Data* shows a data being divided into 12 blocks which gets analyzed with section of size 9 units. The *c*m value is set to 25. So, just like e.g. 1, the median coverage expands by 1 unit.

From the diagram *Section under analysis (accepted)*, it can be noted that the highest point is 18 and it is one position away from the original median i.e. 16. So, this point is accepted as shown in the diagram *Identified points*.

From the diagram *Section under analysis (rejected)*, it can be noted that the highest point is 20 and it is two positions away from the original median i.e. 8. So, this point is rejected, hence not shown in the diagram *Identified points*.

For processing steps 5–8, the following parameter values are referred - minimum breakline length along X,Y coordinates; maximum length between two neighbouring points; maximum length to form loops.

## 3.2.5. Edge creation within maxima / minima

Edges between maxima / minima points should be created only within a certain neighbourhood distance, as these points are derived from continuous features in the data. Forming edges between points that are far apart lacks meaning, as they do not share direct continuity.

In this step, the parameter, maximum length between two neighbouring points, *l*max(*n**a*, *n**b*) is referred to create edges with points within *k* neighbouring blocks. Value of this *k* for a point can vary for each direction (*-x*, *+x*, *-y*, *+y*) based on the point’s position inside a block. A demonstration of the formula for *k* along *-x* and *+ x* is shown below:

*k* *− x* = ⌊ ( *p**x* - *x**s* - *l*max(*n**a*, *n**b*) ) / *b* ⌋

*k* *+ x* = ⌊ ( *p**x* - *x**s* + *l*max(*n**a*, *n**b*) ) / *b* ⌋

where,

*p* *x* → X-axis value of current point

*x* *s* → smallest value of X-axis in data

*b* → block size

⌊ ⌋ → floor function

In Fig. 5, edges are to be created between a point present at the center of the block position (3,3) and its neighbours in the block positions (3,1), (4,1), (5,2), (4,3), (2,4), (1,5). Each square block is of size 1 unit and *l*max(*n**a*, *n**b*) value is 1.85 units, which is shown as a circular boundary for point in block (3,3) to form edges. As per these conditions, only three neighbouring points out of six are taken into consideration.

## 3.2.6. Merge of networks

Points in the 2D grid are iterated from left to right and top to bottom, even for network generation. Two networks of edges can only be merged through an unconnected point that is equidistant from both networks. If such a common point exists, it is more feasible to merge a network with its preceding one, as this will not disrupt the order of block iteration in the grid.

In this steps, separate networks (which are collections of edges) are merged together if they share any common point.

In the 2nd diagram of Fig. 6, three networks created are {((1,1), (1,2)), ((1,2), (1,3))}, {((3,1), (3,2)), ((3,2), (3,3))} and {((5,1), (5,2))}. The point in block position (2,4) is common for both 1st and 2nd networks. So, the 2nd network appearing orderly in 2nd number (left to right side) is merged with the 1st network, as shown in 3rd diagram. The updated version of the 1st network is {((1,1), (1,2)), ((1,2), (1,3)), ((1,3), (2,4)), ((2,4), (3,1)), ((3,1), (3,2)), ((3,2), (3,3))}.

## 3.2.7. Valid network identification

Feature lines, when subdivided, would result in networks of edges whose length on one axis is much greater than the other. Conversely, networks of edges, which are formed on the basis of maxima / minima points collected from artefacts, would be relatively shorter in length on either of their axes. So, when a set of networks undergoes the process of feature line extraction, it is justified to discard networks on the basis of their length in both axes.

In this step, the parameter, minimum breakline length along X,Y coordinates, bllmin is referred to identify valid networks. If total length of a network along either X or Y-axis is greater than or equal to bllmin then that network is considered valid.

In Fig. 7, bllmin expected is 4 units and length of each block is 1 unit. So, the network shown in the left diagram is valid as its length along X-axis is greater than 4 units whereas the network shown in the right diagram is invalid as its length along either X or Y-axis is less than 4 units.

## 3.2.8. Refinement of detected networks

When edges are created between all points in a neighborhood, some edges do not represent continuous features of the data and should be removed. Many of these edges can be eliminated using the Minimum Spanning Tree (MST) algorithm, which selects edges in the network that avoid creating loops and have the minimum possible edge weight. If any of the discarded loops are valid, meaning they represent continuous features, they can be reinstated by referring to specific conditions.

In this step, networks are individually refined. This is done by first applying Kruskal’s version of MST algorithm [22] on a network. Then, the parameter, maximum length to form loop, lplmax is referred to recreate some valid loops.

In Fig. 8, 1st diagram depicts an unrefined network formed by 4 points and spanned over blocks with side length of 1 unit. 2nd diagram depicts MST algorithm applied on that network. 3rd diagram depicts recreation of a valid loop as per expected lplmax value of 2 units.

## 3.2.9. Exportation of network

In this step, the final network is then exported as a Wavefront .obj file, with the coordinate values converted to decimal numbers, which were initially converted to whole numbers in step 1. When converting to decimal numbers, the decimal places to be assigned depend on the parameter *d*, discussed earlier. This output represents the expected feature lines, which can be viewed with the help of software like MeshLab and CloudCompare.