Accuracy assessment of old large-scale maps and reducing positional error in land use change analyses

doi:10.21203/rs.3.rs-4184063/v1


Research Article


This work is licensed under a CC BY 4.0 License

Version 1


Context

Old large-scale maps are one of the main data sources on historic landscapes and form the basis of many landscape studies. However, few studies have addressed the issue of assessing the accuracy of map sources and the impact of this accuracy on the results of spatiotemporal analyses of landscape evolution.

Objectives

The purpose of this study was to verify the positional accuracy of large-scale maps used in landscape analyses and to test the possibility of eliminating the influence of mutual positional inconsistency of map sources on the results of this analysis. Narrow residual polygons, referred to as sliver polygons, arising during overlay operations because of positional errors in old maps can affect the results of the analysis, so it is appropriate to determine to what extent this happens, whether and when it is necessary to eliminate their influence and by what methods.

Methods

The positional accuracy of the vector models derived from old maps was verified in three model areas around the Vltava River by quantifying the mean positional error of a set of control points. Different methods for removing sliver polygons were proposed and tested for the selected test area within the model area by comparing the selected results of the spatiotemporal analysis.

Results

The mean positional errors of the historical data models from the mid-19th and mid-20th centuries range from three to four metres in the model areas, which is highly accurate considering the scales of the old maps used and confirms the suitability of these maps for landscape studies. Reverse vectorization of the time series of maps eliminated the residual polygons caused by positional error and thus reduced the false change areas, which was most evident in the change maps. The change maps produced by this procedure better reflected the true changes. For removing sliver polygons in overlay analyses, the method proposed as the most appropriate is to identify them by their position within a buffer of a given width and then to eliminate them by merging them with a neighbouring polygon.

Conclusions

Old large-scale maps are a very valuable source of historical data and have a place in landscape studies, especially when researching smaller areas, such as municipalities or cadastres, where they allow work at the level of land parcels. It has been confirmed that the positional inconsistency of map sources can be eliminated to a certain extent by the chosen time series vectorization procedure. Considering the type of study, the type of spatial data used, and the type of results that characterise the change in the area, it is advisable to choose an adequate method for refining the results.

Keywords

old maps, GIS analysis, accuracy assessment, sliver polygons, land use changes, Vltava River

Introduction

To understand the dynamics of land use changes in the landscape, it is necessary to process available data from the past and reconstruct the landscape and its development based on these data or model the development of the landscape in the future. Many different sources of historical data allow us to reconstruct land use to a certain time horizon; in general, these sources can be divided into spatial and statistical sources. As historical spatial data sources, aerial or satellite archival imagery (Badjana et al., 2015), old medium-scale maps (Eremiášová and Skokanová, 2009; Gimmi et al., 2011; Skaloš et al., 2012; Lathouwers et al., 2023), and, less frequently, large-scale maps (Bender et al., 2005; Bürgi et al., 2015) are used to analyse land use changes. As statistical data, written documentation of land cadastres can be used (Bičík et al., 2001), or various statistical lexicons and yearbooks can contain information on the structure of land use in each territorial unit. If this statistical information is available for different time horizons, trends and changes in the structure of land use can be observed, but the spatial distribution within a given territorial unit is missing. Many studies use a combination of multiple types of spatial data within a single study (Cousins, 2001; Petit and Lambin, 2002; Hamre et al., 2007; Skaloš and Engstová, 2010; Pindozzi et al., 2016; Chen et al., 2019), usually a combination of old maps for the earliest time horizons and satellite or aerial imagery for more recent periods or field surveys for the current state of the landscape to provide a complete understanding of land use change.

The type of spatial data to choose for a given study is influenced by many factors: the availability of data for a given area and the required time period, and the purpose of the study, that is, whether to monitor change in the area as a whole or only a selected phenomenon. Studies tracking a selected phenomenon have followed, for example, the evolution of forests (Istrate et al., 2023), ponds (Pavelková et al., 2016), and water bodies and wetlands (Gimmi et al., 2011; Šantrůčková et al., 2017). Other factors influencing the selection of input data include the possibility of linking with other data, e.g., demographic, geological, or socioeconomic data; the chosen methodology and tools used in processing and analysing the input data (whether raster or vector models will be analysed); and, last but not least, the desired outputs and interpretation of the analysis results. When studying the evolution of land use, the supporting information is obtained from the planimetric component of the old map, but in some cases, it may also be appropriate to process the altimetric component of the map and to look at the change in relief in the area, or to combine both, that is, to look for a relationship between land use change and relief change (Tortora et al., 2015; Statuto et al., 2017; Istrate et al., 2023).

The subject of this study is old large-scale maps, which, with their detailed spatial information and fine resolution, are valuable sources of information for the study of land use change over time, providing detailed information on the landscape, land ownership and land use patterns. Old, large-scale maps allow researchers to conduct a temporal analysis of land use change over decades to centuries, which is an advantage over aerial and satellite imagery that does not go as far back in time. Large-scale maps allow land use development to be tracked at the individual parcel level, which allows for additional tracking data, such as landowner information, to be linked to individual parcels. The socioeconomic context of land use change is thus reflected in the results of the analysis, as changes in land ownership patterns can be related to changes in land use. Old large-scale maps are particularly useful when studying urban growth and expansion; they can show how cities have evolved, where new development has occurred and how land use has changed over time in urban areas (Scăunaș et al., 2019).

The vector models of the territory created on the basis of old large-scale maps contain geometric objects (points, lines, polygons), their interrelations and properties, and serve not only for the visualisation of the territory at the time corresponding to the creation of the original map but also for spatiotemporal analysis (each model becomes one of the layers of the resulting multitemporal GIS). The vectorization process, in this study understood as the transition from scanned georeferenced map bases to vector models, can generally be performed manually, semiautomatically, or automatically. The appropriate method must be chosen considering the type, quantity and quality of the input map data and the available software options. Manual vectorization is completely under the control of the user, can be performed in standard GIS software without special knowledge of the software, and tolerates a raster base of poorer quality, but it is time-consuming for large volumes of data. Semiautomatic vectorization also remains under user control, but special software tools assist the user by snapping to the raster or by automatically tracing raster cells and generating vectors from them. Automatic vectorization consists of automatic vector generation, which can make use of machine learning and neural networks and can vectorize even larger volumes of data very efficiently. The potential use of automatic vectorization of all or part of the content of digitised old maps is discussed by Iosifescu et al. (2016), Chiang et al. (2020), and Kratochvílová and Cajthaml (2020).

There are two approaches for vectorizing the time series of old maps (Skokanová, 2008). The first approach, called sequential vectorization, consists of creating historical vector models from different time horizons independently. The advantage of this method is the simplicity of the processing and the possibility of creating multiple models simultaneously. The disadvantage is that each vector model carries its own positional inaccuracy, which causes inaccuracies in overlay operations. The second approach, called reverse vectorization, establishes one of the maps as a reference, usually the current map, which should be the most accurate map source. The vector model created from this map is the basis for vectorizing the map from the next period. The derived model is adjusted only where there is a real change relative to the reference map, not a difference caused by the positional inaccuracy of the two map sources. Thus, the old maps are used here only to identify changes, as in Bender et al. (2005). The advantage is the elimination of mutual positional inconsistencies of vector models and the speeding up of processing, especially in the case of smaller and simpler areas. The disadvantage may be the uncertainty in assessing whether there is a real change or an error caused by positional inaccuracies.

In general, vector models used for land use analyses are burdened by positional and thematic errors. Thematic errors consist of incorrectly assigned land use categories or incorrectly stated land use categories in the map source. The positional accuracy of old maps of various scales with regard to their use in historical landscape research is addressed by Cajthaml and Krejčí (2008) and Frajer and Geletič (2011). Positional errors are caused by the inaccuracy of the old map and by errors arising during the individual processing steps (scanning, georeferencing, and vectorization). During overlay operations, these errors lead to the formation of sliver polygons, that is, small, narrow residual polygons resulting from positional inconsistencies between the map sources, which make it difficult to capture land use changes accurately. These polygons can lead to false positive interpretations of changes and compromise the reliability of the analysis (Mas, 2005; Skokanová, 2008; Clercq et al., 2009).

This study focuses on verifying the positional accuracy of large-scale map sources, using model areas in the surroundings of the Vltava River as examples, and on methods for solving the problem of sliver polygons that appear when vector models are overlaid. It also examines the influence of these residual polygons on the results of the analyses and the possibility of eliminating them.

Study area

The area of interest consists of a strip several kilometres wide of the immediate surroundings of the upper and middle course of the Vltava River, from the source of the river in the Šumava Mountains in southern Bohemia to the confluence with the Berounka River in central Bohemia south of Prague. The length of the river in this section is 367 km, and its elevation drops from 1272 m to 194 m above sea level. The boundaries of the study area follow the boundaries of selected cadastral areas (red line in Fig. 1). The total area of the study area is more than 2 300 km², and the width of the belt ranges from hundreds of metres to five kilometres from the river centre line depending on the location of interest. The entire area of interest has been the subject of research tracing the historical changes in the Vltava valley (Cajthaml et al., 2022), influenced mainly by the construction of a system of nine dams on the Vltava River in the twentieth century, the so-called Vltava Cascade. The construction of reservoirs is one of the important drivers of land use change, and the surrounding landscape is affected by more intensive changes (Havlicek et al., 2022). Within this area of interest, three model areas were selected in the vicinity of the largest reservoirs of the Vltava Cascade, namely the Lipno, Orlík and Slapy water reservoirs (dark red rectangles in Fig. 1), and the accuracy of historical land use vector models derived from old maps was investigated for them. The model areas are 6 km × 24 km for Slapy and Orlík and 10 km × 32 km for Lipno.

In the northern part of the Slapy model area, a test area of 500 m × 500 m was selected to test the proposed methods of removing sliver polygons, which arise during overlay operations within GIS analyses, and their influence on selected outputs characterising the development of land use in the area. The test area was chosen to include features both inside and outside the built-up area, stable and change areas, and examples of features with predominantly linear characteristics. In all three time horizons, six land use categories are represented in the test area.

Data sources and processing methodology

In the case of this study, the data sources that served as the basis for the creation of the historical vector models were old large-scale maps. For the time horizon T1, these were the imperial obligatory imprints of the stable cadastre at a scale of 1:2 880, which were created by direct measurement between 1824 and 1843 (CZM, 2010). The maps contain individual parcels of land colour-coded according to land use, possibly supplemented with point markers specifying a particular subgroup of land use within one land use category (differentiation of forests, permanent crops). For the T2 time horizon, the State Map derived 1:5 000 in the first edition of the 1950s was used as a source of information on land use. On this map, the use of individual land parcels is represented by point map symbols, which are shown in black along with the boundaries of the land parcels. The map was not produced by direct measurement, but the planimetric part was derived from cadastral maps, and the altimetric part was derived from the most appropriate existing data, e.g., topographic maps.

For the time horizon T3, which in this study corresponds to the current state of the landscape, vector spatial data on land parcels and building objects from the Register of Territorial Identification, Addresses and Real Estates (RÚIAN) were used. The vector model derived from it reflects the state of the current information system of the Cadastre of the Czech Republic. This vector model was taken as a reference for the assessment of the accuracy of historical vector models. For this study, the inner accuracy of this model in relation to the actual land use in the locality, which could be verified using orthophotos or field research, was not considered. Examples of the data sources for all three time horizons considered for the test area are shown in Fig. 2.

The purpose of processing the available data sources is to create vector models of the same area in the same coordinate system at different time horizons (Fig. 3). Therefore, the scanned map data must first be georeferenced into a common coordinate system using appropriately chosen types of transformations. In this study, projective and affine transformations were used, supplemented in some cases by higher-order polynomial transformations for map sheets from time horizon T1. Map sheets from horizon T2 were georeferenced using the projective transformation method using the corners of the map sheets. For larger areas and multiple map sheets, it is advisable to combine the georeferenced map sheets into a seamless mosaic to facilitate subsequent vectorization, especially at map sheet edges.

The vectorization of georeferenced map data was carried out using the manual sequential vectorization method over the study area; for the test area, reverse vectorization was also used to create vector models for the time horizons T1 and T2. For all vector models, it was also necessary to remove topological errors (overlaps and gaps in the data) that occurred due to inaccuracies in processing and that could affect subsequent analyses. To compare the vector models with each other, it was necessary to unify the land use categories at each time horizon; for this study, seven land use categories were defined (arable land, permanent grassland, garden and orchard, forest area, water area, other area, built area and courtyard). Only six of the seven categories (Fig. 1) were found in the test area, with no forest areas occurring in any of the study periods.

The vector models used for the spatiotemporal analyses must cover the same territory, be in the same coordinate system, be without topological errors, and include harmonised land use categories. Overlay operations can be carried out between the vector models prepared in this way to analyse the change in land use over time for a selected area.

Assessing the positional accuracy of historical vector models

The positional accuracy of historical vector models derived from old maps was determined for three model areas by calculating the positional characteristics for a set of control points. The positional accuracy of the vector models was related to the current vector model, which was taken as a reference in this study. The control points had to meet the following requirements: uniform distribution of points within the model area, identifiability in all three models, and the assumption that the position of the point did not change over the time period. Examples of selected control points are building corners, road crossings, pond embankments, or significant break points on land parcel boundaries (see Fig. 4). This results in three sets of control point coordinates in a uniform coordinate system for each model area.

By comparing the sets of coordinates of the historical and current vector models, the coordinate differences and displacements at individual control points were determined:

$${dP}_{i}=\sqrt{d{X}_{i}^{2}+d{Y}_{i}^{2}}$$

${dP}_{i}$ point displacement

${dX}_{i}$ X coordinate difference

${dY}_{i}$ Y coordinate difference

The mean point displacement and mean position error for the set of coordinates of the control points of the vector model were determined according to the following formulas:

$$dP=\frac{\sum _{i=1}^{n}{dP}_{i}}{n}, \quad {m}_{p}=\sqrt{\frac{\sum _{i=1}^{n}{dP}_{i}^{2}}{n}}$$

$dP$ mean point displacement

${m}_{p}$ mean position error

$n$ number of control points
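
The two formulas above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the three control-point coordinate pairs are hypothetical and are not taken from the study's data.

```python
import math


def displacements(historical, reference):
    """Return dP_i for paired control points given as (X, Y) tuples."""
    return [
        math.hypot(xh - xr, yh - yr)  # sqrt(dX_i^2 + dY_i^2)
        for (xh, yh), (xr, yr) in zip(historical, reference)
    ]


def mean_displacement(dp):
    """Mean point displacement dP: arithmetic mean of the dP_i."""
    return sum(dp) / len(dp)


def mean_positional_error(dp):
    """Mean position error m_p: root mean square of the dP_i."""
    return math.sqrt(sum(d * d for d in dp) / len(dp))


# Hypothetical control points (metres), for illustration only.
historical = [(3.0, 4.0), (10.0, 10.0), (0.0, 5.0)]
reference = [(0.0, 0.0), (10.0, 10.0), (0.0, 0.0)]

dp = displacements(historical, reference)
print(dp, mean_displacement(dp), mean_positional_error(dp))
```

Because $m_p$ is a root mean square, it is never smaller than the mean displacement $dP$ for the same set of points, which is consistent with the two columns reported for each model in Table 1.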

Elimination of sliver polygons

In the introduction of the paper, two different approaches to vectorizing the time series of old map bases were described: reverse and sequential vectorization methods. The principle of the reverse vectorization method, where the historical vector models are created by modifying the current vector model, implies that the positional uncertainty of the two models and the associated creation of sliver polygons during overlay operations are already eliminated during the vectorization process. Thus, the reverse vectorization method directly prevents the formation of these sliver polygons. In Fig. 5, the differences between the models created by sequential and reverse vectorization on part of the test area can be seen. In the overlay analyses, in the first case, sliver polygons would have been formed, and subsequently, these polygons would have been evaluated as change polygons when, in this case, there were no land use changes but only a positional inconsistency in the map data.

If sequential vectorization is chosen as the method of creating historical vector models, sliver polygons are created during overlay operations. Their subsequent elimination can be viewed in two ways: to transform the historical vector models to the current reference model, thus attempting to reduce the positional inconsistency of the two models and only subsequently perform overlay operations, or, conversely, to first perform the overlay operations and then identify the sliver polygons and eliminate them. Figure 6 presents the methods for removing sliver polygons that were proposed and tested in this study.

Of the possible transformations of historical vector models, an affine transformation using ground control points and a rubbersheet transformation were proposed. For the rubbersheet transformation, different search distance values were chosen to generate oriented rubbersheet links (also known as displacement links) between vector models. Within a given search distance, matching lines between the two vector models are found, and rubbersheet links are generated. Two rubbersheeting methods for adjusting the features were also compared, namely the linear method and the natural neighbour method. The linear method is slightly faster and produces good results when many links are spread uniformly over the data being adjusted. The natural neighbour method is slower but more accurate when there are few rubbersheet links scattered across the dataset (ArcGIS Pro, 2023).

To identify sliver polygons, two methods with different values of input parameters were tested. The first identified the sliver polygons based on the value of the thinness ratio parameter, calculated as $4\pi \cdot Area/{Length}^{2}$, where Area is the polygon area and Length is the length of its boundary (perimeter). The assumption was that sliver polygons tend to be narrow and elongated, so their thinness ratio is very low and they can be identified by this parameter. The second method uses the position of the sliver polygons inside a buffer generated at the intersection of the two vector models. Different values of the buffer width, chosen with regard to the positional accuracy results of the historical vector models, were tested, while the size of the smallest features in the models was checked to ensure that these were not removed. Polygons that lay completely within this buffer were identified as sliver polygons. In both cases, the identified polygons were eliminated by merging them with the neighbouring polygon with which they shared the longest common boundary.
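
As a rough illustration of the first identification method, the following Python sketch computes the thinness ratio of a simple polygon from its vertex list. The function names, the example polygons and the threshold of 0.1 are assumptions made for this example; the study tested several parameter values that are not reproduced here.

```python
import math


def area_and_perimeter(ring):
    """Area (shoelace formula) and perimeter of a simple polygon.

    `ring` is a list of (x, y) vertices without a repeated closing vertex.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def thinness_ratio(ring):
    """4 * pi * Area / Length^2: 1 for a circle, close to 0 for narrow shapes."""
    area, perimeter = area_and_perimeter(ring)
    return 4.0 * math.pi * area / (perimeter * perimeter)


def is_sliver(ring, threshold=0.1):
    """Flag a polygon as a sliver candidate (threshold is a hypothetical value)."""
    return thinness_ratio(ring) < threshold


square = [(0, 0), (10, 0), (10, 10), (0, 10)]    # compact parcel
strip = [(0, 0), (50, 0), (50, 0.5), (0, 0.5)]   # narrow residual strip

print(thinness_ratio(square), is_sliver(square))
print(thinness_ratio(strip), is_sliver(strip))
```

A compact square has a thinness ratio of π/4, whereas the 50 m × 0.5 m strip falls far below the illustrative threshold. A single threshold cannot distinguish a sliver from a genuinely narrow feature such as a path or a stream, which is one reason to check the smallest real features before removing anything.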

The positional accuracy of historical vector models

In the Lipno model area, 281 control points were selected; in the Slapy and Orlík model areas, 130 control points each were selected (Fig. 7). In all areas, at least 70 control points were located on building corners. In the Lipno model area, a larger number of points had to be selected in proportion to the larger size of the area, and at the same time, the selection of control points was the most difficult in this area. It is an area close to the state border with a large proportion of forest and few, small settlements, which, moreover, underwent significant changes during the period under consideration due to population exchange after the Second World War, the inaccessible border zone in the period 1950–1990 and, finally, the construction of the Lipno Reservoir.

For the control points, the Y and X coordinates in all three vector models were obtained, and the coordinate differences and the corresponding positional accuracy characteristics were calculated. At the same time, the azimuths of the error vectors were determined; an error vector is an oriented line that starts at the position of the control point in the vector model T1 or T2 and ends at the position of the same control point in the reference model T3. For each model area and both historical vector models, a set of error vector azimuths was obtained. The frequency of each azimuth is shown by histograms in Fig. 8. The plots show that in no case is there a significant predominance of any direction of displacement at the control points, and the azimuths are in most cases evenly distributed over the entire range of values. It can therefore be concluded that the historical vector models in the model areas are not burdened by systematic error.
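
The azimuth check described above can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: it assumes coordinates given as (easting, northing) with the azimuth measured clockwise from north, so a coordinate system with a different axis orientation would need the signs adapted. The function names and the bin width are hypothetical.

```python
import math


def azimuth_deg(start, end):
    """Azimuth of the vector start -> end in degrees, clockwise from north.

    Points are (easting, northing) tuples; this axis convention is an assumption.
    """
    d_east = end[0] - start[0]
    d_north = end[1] - start[1]
    return math.degrees(math.atan2(d_east, d_north)) % 360.0


def azimuth_histogram(azimuths, bin_width=45.0):
    """Count azimuths per sector of `bin_width` degrees."""
    n_bins = int(360 / bin_width)
    counts = [0] * n_bins
    for a in azimuths:
        # The modulo guards against a value that rounds to exactly 360.0.
        counts[int(a // bin_width) % n_bins] += 1
    return counts


# Hypothetical error vectors: historical position -> reference position.
vectors = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 0), (0, -1)), ((0, 0), (-1, 0))]
azimuths = [azimuth_deg(s, e) for s, e in vectors]
print(azimuths, azimuth_histogram(azimuths, bin_width=90.0))
```

A roughly flat histogram, as in this toy example, indicates no preferred displacement direction; a strong peak in one sector would instead suggest a systematic shift between the models.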

The results of the accuracy analysis for the individual model areas are presented in Table 1. In the Lipno model area, the mean positional error was calculated based on a set of control points to be 3.32 m for the vector model from the time horizon T1 and 4.33 m for the vector model from T2. There were no significant differences between the control points at the corners of the buildings and the other control points. For the Orlík model area, the mean positional error was determined to be 3.84 m for the vector model from time horizon T1 and 3.38 m for the vector model from T2. In both cases, the mean positional error was greater for the control points on the buildings. In the last model area, Slapy, the mean positional error was the lowest in both cases, 3.07 m for the vector model from time horizon T1 and 3.31 m for the vector model from T2. For the vector model from the T1 time horizon, the mean positional error was significantly smaller for the control points at the corners of the buildings, while for the T2 time horizon, there was no significant difference between the values from the two sets. The histograms in Fig. 9 show the magnitudes of the displacements at the control points for each vector model.

The results show that the values of the mean positional errors are similar for both historical vector models in all model areas and range from 3.07 m to 4.33 m, which, given the scales of the map data used (1:2 880 for T1, 1:5 000 for T2), corresponds to distances of approximately 1 mm on the map sheet. There are no major differences in the results between the control points at the corners of the buildings and the other control points. Slightly better accuracy is achieved by the vector model from T1 in the Lipno and Slapy model areas, which may be due both to the larger scale of the original map and to the method of creating the map itself, which was based on precise geodetic measurements.
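
The conversion from a ground error to a distance on the map sheet is simple arithmetic, sketched here in Python with the extreme values quoted above. The function name is hypothetical.

```python
def ground_to_map_mm(ground_m, scale_denominator):
    """Convert a ground distance in metres to millimetres on a map at 1:scale_denominator."""
    return ground_m / scale_denominator * 1000.0


# Lowest T1 value (Slapy, 1:2 880) and highest T2 value (Lipno, 1:5 000).
t1_slapy_mm = ground_to_map_mm(3.07, 2880)
t2_lipno_mm = ground_to_map_mm(4.33, 5000)
print(round(t1_slapy_mm, 2), round(t2_lipno_mm, 2))
```

Both values come out close to 1 mm (about 1.07 mm and 0.87 mm), which is the sense in which the errors amount to approximately 1 mm on the map sheet.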

Table 1

Results of the positional accuracy analysis of the historical vector models in the model areas
Model area	Type of control points	Number of control points	T1: mean positional error [m]	T1: mean length of the error vector [m]	T2: mean positional error [m]	T2: mean length of the error vector [m]
Lipno	all	281	3.32	2.84	4.33	3.77
Lipno	building	70	3.22	2.72	4.25	3.80
Lipno	other	211	3.35	2.88	4.36	3.76
Orlík	all	130	3.84	3.11	3.38	3.01
Orlík	building	77	3.98	3.26	3.47	3.06
Orlík	other	53	3.65	2.91	3.26	2.95
Slapy	all	130	3.07	2.43	3.31	2.99
Slapy	building	75	2.92	2.36	3.33	2.97
Slapy	other	55	3.27	2.52	3.28	3.02

Effect of the proposed sliver polygon elimination methods on the output of spatiotemporal analysis

A comparison of different methods of vectorization and elimination of sliver polygons was performed for the test area for the following outputs of spatiotemporal analysis characterising changes in the area: cross table, change index, binary change index, proportion of change areas in the total area and change map. A cross table was computed for each pair of observation periods. The diagonal elements represent the unchanged areas (no change in land use category), and the off-diagonal elements represent the change between categories. In addition to its informative value, the table is used to calculate other characteristics that describe the change in land use over the period.

$$A=\left(\begin{array}{ccc}{a}_{1,1}& \cdots & {a}_{1,m}\\ \vdots & \ddots & \vdots \\ {a}_{m,1}& \cdots & {a}_{m,m}\end{array}\right)$$

${r}_{i}=\sum _{j=1}^{m}{a}_{i,j}$, ${c}_{i}=\sum _{j=1}^{m}{a}_{j,i}$, $\varDelta =\sum _{i=1}^{m}{a}_{i,i}$, $S=\sum _{i=1}^{m}{r}_{i}=\sum _{i=1}^{m}{c}_{i}$

The change index expresses the overall intensity of land use development for a given area and period. The index ranges from 0 to 100%; the higher its value, the more intensive the development in the area during the study period (Bičík et al., 2010).

$${I}_{z}=\frac{\sum _{i=1}^{m}\left|{c}_{i}-{r}_{i}\right|}{2\cdot S}\cdot 100$$

The binary change index is a global index that quantifies the magnitude of change in land use for a given area over a given period. Its value can be determined in two ways, either based on the percentage share of all areas that remained unchanged or changed during the study period or based on values from a cross table for the period (Pătru-Stupariu et al., 2011). The index ranges from −1 (radical change) to +1 (no change).

$$BCI=\frac{\%\text{unchanged}-\%\text{changed}}{\%\text{unchanged}+\%\text{changed}}, \quad BCI=\frac{2\varDelta }{S}-1$$
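
The quantities defined above can be computed directly from a cross table, as the following Python sketch shows. The function name and the 2 × 2 example table are hypothetical, and the share of change areas is taken here as $(S-\varDelta)/S$, which is an assumption about how that output is defined.

```python
def change_metrics(a):
    """Compute change characteristics from a square cross table `a`.

    a[i][j] is the area that was in category i at the earlier horizon
    and in category j at the later horizon.
    """
    m = len(a)
    r = [sum(a[i][j] for j in range(m)) for i in range(m)]  # row totals r_i
    c = [sum(a[j][i] for j in range(m)) for i in range(m)]  # column totals c_i
    delta = sum(a[i][i] for i in range(m))                  # unchanged area
    s = sum(r)                                              # total area S
    return {
        "change_index": sum(abs(c[i] - r[i]) for i in range(m)) / (2 * s) * 100,
        "bci": 2 * delta / s - 1,
        "change_share": (s - delta) / s * 100,  # assumed definition, in percent
    }


# Hypothetical cross table in hectares (two categories only).
table = [
    [8.0, 2.0],
    [1.0, 9.0],
]
metrics = change_metrics(table)
print(metrics)
```

For this table, 3 of 20 hectares change category, so the share of change areas is 15%, while the change index is only about 5% because opposite transitions between the two categories partly cancel out in the category totals. This is why the change index alone can understate the effect of sliver polygons and why the change share and change maps are compared as well.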

Change maps represent one of the basic cartographic outputs of spatiotemporal land use analyses and clearly show the spatial distribution of change and stable areas in the territory. They can be produced in many variations, from binary maps, where stable and change areas are marked between two time horizons, to maps with the intersection of several time horizons with the differentiation of areas according to the number of changes, or maps with the differentiation of change areas according to the land use categories between which the change took place.

For the 500 m × 500 m test area, vector models were first created for the time horizons T1 and T2 using sequential and reverse vectorization methods. The last row and column of the cross table (Table 2) show the differences in the areas of each land use category caused by the different vectorization approaches. In T1, the differences in area are smaller, while in T2 they are larger overall, with the largest differences in water areas and other areas, where they amount to more than 6% of the total area of the land use category. The spatial comparison of the vector models obtained by the different vectorization methods produced a layer of residual polygons that may cause spurious changes. For the time horizon T1, this layer contained 111 polygons with a total area of more than 1700 m², a median area of 3.6 m² and a median thinness ratio of 0.07. For the T2 time horizon, this layer contained 132 polygons with a total area of more than 6770 m², a median area of 17.6 m² and a median thinness ratio of 0.15. The diagonal elements of the cross table show that the stable areas are larger in most cases for the reverse vectorization, so false changes were reduced by the elimination of residual polygons. The values of the indices characterising the change in the territory are very similar for both methods; the total share of change areas in the total area of the territory is smaller for the reverse vectorization, which confirms the absence of false change polygons.

Table 2

Comparison of sequential and reverse vectorization and a method for eliminating sliver polygons based on the position of a polygon within a buffer using the example of a cross table from T1 to T2
Area [hectares]		Land use category in T2
Land use category in T1	Method	Arable land	Permanent grassland	Garden and orchard	Water area	Other area	Built up and courtyard	Totals T1
Arable land (in T1)	Sequential vectorization	13.3516	1.2929	0.4516	0.0030	0.6665	0.2323	15.9980
	Reverse vectorization	13.4856	1.3269	0.4531	0.0000	0.4854	0.2421	15.9932
	Elimination, buffer 1.5 m	13.4147	1.2709	0.4545	0.0000	0.6397	0.2278	16.0076
Permanent grassland (in T1)	Sequential vectorization	0.9511	4.4731	0.2297	0.0807	0.3268	0.7688	6.8301
	Reverse vectorization	0.9377	4.5435	0.2287	0.0617	0.2946	0.7646	6.8307
	Elimination, buffer 1.5 m	0.9272	4.5182	0.2372	0.0807	0.3032	0.7648	6.8312
Garden and orchard (in T1)	Sequential vectorization	0.0577	0.0003	0.1580	0.0000	0.0152	0.1353	0.3665
	Reverse vectorization	0.0580	0.0003	0.1543	0.0000	0.0141	0.1389	0.3655
	Elimination, buffer 1.5 m	0.0582	0.0000	0.1580	0.0000	0.0089	0.1278	0.3529
Water area (in T1)	Sequential vectorization	0.0000	0.0202	0.0000	0.2298	0.0602	0.0041	0.3144
	Reverse vectorization	0.0000	0.0000	0.0000	0.2315	0.0794	0.0003	0.3112
	Elimination, buffer 1.5 m	0.0000	0.0202	0.0000	0.2298	0.0602	0.0041	0.3144
Other area (in T1)	Sequential vectorization	0.1466	0.0099	0.0268	0.0000	0.5917	0.0973	0.8723
	Reverse vectorization	0.0766	0.0108	0.0193	0.0000	0.6892	0.0858	0.8817
	Elimination, buffer 1.5 m	0.1102	0.0000	0.0110	0.0000	0.6923	0.0826	0.8961
Built up and courtyard (in T1)	Sequential vectorization	0.0271	0.0198	0.0576	0.0000	0.0252	0.4892	0.6188
	Reverse vectorization	0.0265	0.0181	0.0500	0.0000	0.0208	0.5024	0.6178
	Elimination, buffer 1.5 m	0.0230	0.0150	0.0561	0.0000	0.0171	0.4866	0.5979
Totals T2	Sequential vectorization	14.5341	5.8162	0.9236	0.3135	1.6857	1.7270	25.0000
	Reverse vectorization	14.5842	5.8996	0.9054	0.2932	1.5835	1.7341	25.0000
	Elimination, buffer 1.5 m	14.5333	5.8243	0.9169	0.3105	1.7214	1.6936	25.0000
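
The share of change areas discussed above can be derived directly from a cross table: the diagonal cells hold the stable areas, and everything off the diagonal is change. The following minimal Python sketch uses the sequential and reverse vectorization rows of Table 2; the function name `change_share` is a hypothetical helper introduced here for illustration, not part of the original workflow, which was carried out in ArcGIS Pro.

```python
# Cross tables T1 -> T2 in hectares, copied from Table 2.
# Rows: land use category in T1; columns: land use category in T2.
# Category order: arable land, permanent grassland, garden and orchard,
# water area, other area, built up and courtyard.
sequential = [
    [13.3516, 1.2929, 0.4516, 0.0030, 0.6665, 0.2323],
    [0.9511, 4.4731, 0.2297, 0.0807, 0.3268, 0.7688],
    [0.0577, 0.0003, 0.1580, 0.0000, 0.0152, 0.1353],
    [0.0000, 0.0202, 0.0000, 0.2298, 0.0602, 0.0041],
    [0.1466, 0.0099, 0.0268, 0.0000, 0.5917, 0.0973],
    [0.0271, 0.0198, 0.0576, 0.0000, 0.0252, 0.4892],
]
reverse = [
    [13.4856, 1.3269, 0.4531, 0.0000, 0.4854, 0.2421],
    [0.9377, 4.5435, 0.2287, 0.0617, 0.2946, 0.7646],
    [0.0580, 0.0003, 0.1543, 0.0000, 0.0141, 0.1389],
    [0.0000, 0.0000, 0.0000, 0.2315, 0.0794, 0.0003],
    [0.0766, 0.0108, 0.0193, 0.0000, 0.6892, 0.0858],
    [0.0265, 0.0181, 0.0500, 0.0000, 0.0208, 0.5024],
]


def change_share(table):
    """Percentage of the total area that changed land use category.

    Hypothetical helper: stable area is the sum of the diagonal cells,
    changed area is the total minus the stable area.
    """
    total = sum(sum(row) for row in table)
    stable = sum(table[i][i] for i in range(len(table)))
    return 100.0 * (total - stable) / total


for name, table in (("sequential", sequential), ("reverse", reverse)):
    print(f"{name}: {change_share(table):.1f} % of the area changed")
```

Rounded to one decimal place, the two shares come out at 22.8% and 21.6%, in line with the T1-T2 values listed for the two vectorization methods in Table 3.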

To refine the historical vector models and thus reduce sliver polygons, affine and rubbersheet transformations were also tested. For the affine transformation, 30 control points were selected in the test area, and both vector models were transformed with a resulting RMSE of 1.23 m for the T1 time horizon and 1.51 m for T2. The results show only minor adjustments to the parcel boundaries and, for the most part, no removal of sliver polygons. The advantage of the rubbersheet transformation was the automatic search for oriented rubbersheet links, which connect source locations to the corresponding target locations; three search distances (1 m, 2 m, and 3 m) were chosen. For the rubbersheet transformation itself, two interpolation methods (linear and natural neighbour) were tested, but there were no significant differences between them. For the 1 m search distance, the shifts in polygon boundaries after the transformation were very small, so no sliver polygons were removed. For the 2 m distance, there were small unwanted local deformations of polygons, especially in the built-up area, where line spacing was smaller; the method worked well for some line features, but overlaps appeared in the data, so topology was not preserved. For the 3 m distance, undesirable polygon changes were more frequent, including both deformations and overlaps. The resulting values of the selected spatiotemporal analysis outputs for the vector models adjusted by the affine and rubbersheet transformations are shown in Table 3.
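
To illustrate what an affine adjustment on control points involves, the sketch below fits the six affine parameters by least squares and reports the RMSE of the residuals. It is a generic, pure-Python illustration and not the ArcGIS Pro implementation used in the study; the RMSE is assumed here to be the root of the mean squared residual distance, and the control point coordinates and all function names are hypothetical.

```python
import math


def solve3(matrix, vector):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(matrix, vector)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (a[i][n] - tail) / a[i][i]
    return solution


def fit_affine(src, dst):
    """Least-squares affine parameters ((a, b, c), (d, e, f)) with
    X = a*x + b*y + c and Y = d*x + e*y + f, mapping src onto dst."""
    rows = [(x, y, 1.0) for x, y in src]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs_x = [sum(r[i] * X for r, (X, _) in zip(rows, dst)) for i in range(3)]
    rhs_y = [sum(r[i] * Y for r, (_, Y) in zip(rows, dst)) for i in range(3)]
    return solve3(normal, rhs_x), solve3(normal, rhs_y)


def apply_affine(params, point):
    (a, b, c), (d, e, f) = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


def rmse(params, src, dst):
    """Root of the mean squared distance between transformed and target points."""
    squared = []
    for point, (X, Y) in zip(src, dst):
        u, v = apply_affine(params, point)
        squared.append((X - u) ** 2 + (Y - v) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points in metres: corners and centre of a 500 m square.
src = [(0.0, 0.0), (500.0, 0.0), (500.0, 500.0), (0.0, 500.0), (250.0, 250.0)]
# Targets: a pure shift of (+2 m, -1 m), except the centre point, which is
# displaced by one extra metre in x to mimic a local map error.
dst = [(2.0, -1.0), (502.0, -1.0), (502.0, 499.0), (2.0, 499.0), (253.0, 249.0)]

params = fit_affine(src, dst)
print(f"RMSE after affine fit: {rmse(params, src, dst):.2f} m")
```

Because an affine transformation has only six global parameters, it cannot absorb a local displacement such as the one at the centre point in this example; the misfit is spread over all control points instead. This matches the observation above that the affine adjustment moved parcel boundaries only slightly.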

The first method for identifying sliver polygons after the overlay operations was based on the thinness ratio of these narrow polygons. Three threshold values, 0.10, 0.15 and 0.20, were tested; polygons with a thinness ratio smaller than the given threshold were selected. Polygon area could not be used as the search parameter, because sliver polygons along long linear features or long parcel boundaries can be quite large in area. The area threshold would therefore have to be set quite high, which could also select polygons that should not be eliminated, such as built-up objects. The second method identified sliver polygons by their position within a buffer of a selected width. For testing, buffer widths of 1 m, 1.5 m, and 2 m on each side of all boundaries created by overlaying the vector models from two time horizons were used. The polygons lying within this band were designated sliver polygons. In both cases, the selected polygons were eliminated by merging them with the neighbouring polygon sharing the longest boundary, and new spatiotemporal characteristics were calculated from the adjusted datasets (see Table 3). For a buffer width of 1.5 m, Table 2 shows the cross-table values generated from the adjusted data with the sliver polygons eliminated.
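
The thinness-ratio selection and the choice of merge target can be sketched in a few lines of Python. The sketch assumes the common definition of the thinness ratio, 4πA/P² (1 for a circle, close to 0 for a long narrow shape); the formula is not restated in this section, so this definition is an assumption, and the polygon coordinates, identifiers and function names are hypothetical.

```python
import math


def area_and_perimeter(ring):
    """Shoelace area and perimeter of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def thinness_ratio(ring):
    """Assumed definition: 4 * pi * area / perimeter**2."""
    area, perimeter = area_and_perimeter(ring)
    return 4.0 * math.pi * area / perimeter ** 2


def select_slivers(polygons, threshold):
    """Identifiers of polygons whose thinness ratio is below the threshold."""
    return [pid for pid, ring in polygons.items() if thinness_ratio(ring) < threshold]


def merge_target(shared_boundary_lengths):
    """Neighbour sharing the longest boundary with the sliver polygon."""
    return max(shared_boundary_lengths, key=shared_boundary_lengths.get)


# Hypothetical overlay polygons (coordinates in metres).
polygons = {
    "strip": [(0.0, 0.0), (100.0, 0.0), (100.0, 1.0), (0.0, 1.0)],     # 100 m x 1 m
    "building": [(0.0, 0.0), (20.0, 0.0), (20.0, 20.0), (0.0, 20.0)],  # 20 m x 20 m
}

slivers = select_slivers(polygons, threshold=0.15)
print(slivers)

# Shared boundary lengths (metres) between the strip and its neighbours,
# as they would be obtained from the overlay topology.
print(merge_target({"field_A": 100.0, "road_B": 1.0}))
```

A compact shape such as the square building has a ratio of π/4, well above all three tested thresholds, whereas the strip falls far below them. An irregular but genuine narrow parcel would also be selected, which is why a visual check of the selection is still worthwhile.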

Table 3

Comparison of selected spatiotemporal analysis outputs for different sliver polygon elimination methods
Time period	T1-T2			T2-T3			T1-T2-T3		
Method	Area with a change of land use	Change index [%]	Binary change index	Area with a change of land use	Change index [%]	Binary change index	Unchanged area	Area with one change of land use	Area with two changes of land use
Sequential vectorization	22.8%	9.91	0.54	36.6%	18.53	0.27	54.7%	31.3%	14.1%
Reverse vectorization	21.6%	9.43	0.57	35.5%	19.07	0.29	55.7%	31.5%	12.8%
Affine transformation	21.9%	9.72	0.56	35.8%	18.76	0.28	55.2%	31.8%	13.0%
Rubbersheet transformation (2 m)	22.0%	10.00	0.56	35.5%	18.58	0.29	55.5%	31.4%	13.1%
Attribute rules (thinness ratio < 0.1)	22.1%	10.92	0.56	34.9%	17.42	0.30	55.8%	30.6%	13.6%
Attribute rules (thinness ratio < 0.15)	21.5%	10.64	0.57	34.5%	17.23	0.31	56.1%	30.2%	13.7%
Attribute rules (thinness ratio < 0.2)	20.9%	10.47	0.58	33.7%	17.04	0.33	57.0%	29.5%	13.6%
Position of sliver polygons (buffer 1 m)	22.5%	10.22	0.55	35.9%	18.35	0.28	55.5%	31.2%	13.4%
Position of sliver polygons (buffer 1.5 m)	22.0%	10.00	0.56	35.2%	18.59	0.30	56.4%	31.3%	12.3%
Position of sliver polygons (buffer 2 m)	20.7%	9.86	0.59	34.5%	18.49	0.31	56.2%	31.5%	12.3%

The change maps in Fig. 10 show the graphical differences between the methods used. Part a) is the change map created from vector data obtained by sequential vectorization, in which sliver polygons are visible, especially around roads and along the boundaries of some areas; part b) is the change map from data obtained by reverse vectorization. Part e) shows that change polygons were removed in the western part of the test area; these polygons were identified as sliver polygons based on their shape but were in fact true change polygons. This could have been avoided by manually inspecting the selected polygons. From a graphical point of view, the last method, which removes sliver polygons based on their position within the buffer (in this case, a 1.5 m wide buffer), comes closest to the results of reverse vectorization.

Discussion

Most studies using old maps as a source of data to monitor changes in land use and landscape character do not assess the accuracy of these maps and of the outputs derived from them. The aim of this study was therefore to assess the positional accuracy of vector models derived from old large-scale maps against current vector data and to determine whether the positional inaccuracies affect the selected results of spatiotemporal analyses. Methods to improve these results were also proposed. All stages of spatial data processing in this study were performed in ArcGIS Pro 3.1.0.

The accuracy of the historical vector models was determined by comparing the coordinates of selected control points in these models and in the current vector model. A similar method was used by Pešťák and Zimová (2005), who tested the accuracy of objects on the maps of the first and second military mapping of the Habsburg Empire. The achieved mean positional errors are in the range of three to four metres for both map sources, which is very good accuracy considering the scales of the map sources used and confirms the suitability of these maps for landscape studies. This is also supported by the results of a study (Dolejš and Forejt, 2019) that analysed articles using the stable cadastre of the 19th century as a basis for landscape research and confirmed that the stable cadastre is one of the most commonly used historical sources in Central Europe. Its possibilities are constantly expanding with ever-improving digital methods and the increasing availability of archival map data (Talich, 2020).

In the testing, two different approaches to vectorizing the time series of maps were compared. For the processing of the entire study area, the sequential vectorization method was chosen, and for testing purposes, vector models were also created in the test area using the reverse vectorization method. In the test area, reverse vectorization proved much slower than sequential vectorization, because the operator had to decide for each discrepancy whether it was a positional inaccuracy or a real change, which was very difficult, especially in the built-up area. This method was suitable outside built-up areas, especially for linear features such as watercourses or roads and for long parcel boundaries. The vector models produced by the two methods differed from each other in less than three percent of the area for the T2 time horizon and in less than one percent of the area for the T1 time horizon. The overall proportion of change areas decreased for reverse vectorization, which is consistent with the elimination of false change polygons; this was also reflected in the cross table. The most significant difference was observed in the change maps, where the reduction in sliver polygons was visible at a glance; thus, the map better reflected the real changes. The advantages and disadvantages of both vectorization methods are summarised in Table 4. Based on these findings, it is possible to assess at the beginning of each study which method is more appropriate for a given case.

Table 4

Advantages and disadvantages of sequential and reverse vectorization
Sequential vectorization	Reverse vectorization
Advantages
- the vector model corresponds exactly to the old map	- faster processing for simple regions with fewer polygons (the existing vector model is modified; no new model is created)
- possibility of simultaneous processing by more operators	- elimination of positional errors due to the map base and its processing
- simpler and in most cases faster to vectorize, eliminates the decision-making process	- sliver polygons are mostly eliminated, thus reducing potential false land use changes
Disadvantages
- the vector model copies the positional inaccuracy of the old map	- challenging for the operator, who assesses whether it is still an inaccuracy or already a change
- the occurrence of sliver polygons in overlay operations	- more time-consuming for complicated regions with many polygons and changes
- overestimation of changes in the territory may occur	- simultaneous involvement of multiple operators is more difficult; the assessment is highly subjective

In this study, several methods were proposed and tested to eliminate sliver polygons that arise from the overlay of vector models during processing. The use of these methods is justified when the results need to be refined, especially for map outputs, where sliver polygons are often visible at a glance. The proposed transformation methods did not have much effect in the test area. There was a small refinement of the numerical results, and the elimination of some sliver polygons is visible in the change maps. The disadvantage of the rubbersheet transformation was the deformation of the original polygons, especially in the built-up area, where the topology was also disturbed. However, this method could find application in regions with lower polygon densities, as it worked very well for linear features and long parcel edges.

The second group of proposed methods identified sliver polygons after overlaying two vector models, based either on an attribute parameter or on position. Polygon area is an obvious candidate parameter; it can be applied as a first step to identify and eliminate the smallest polygons, which are unquestionably caused by positional inaccuracies. However, this will not eliminate sliver polygons that run along long lines and can often reach large areas (larger than the smallest mapped single building). These residual polygons have the greatest impact on the final output. The thinness ratio was chosen as a suitable parameter for polygon selection because it identifies long narrow polygons well. The selection based on this parameter works slightly less well in the built-up area, where the sliver polygons are not dominated by one dimension and their thinness ratio is therefore larger; however, the influence of these polygons is limited by their small size, and they can be identified based on their area. The last method tested was the selection of sliver polygons based on their position within a band generated around all polygon boundaries created by overlaying two vector models. This simple method yielded very good results; however, it was important to choose the correct buffer width to avoid selecting narrow land parcels such as roads, waterways, or buildings. For both methods, visual checking of the selected sliver polygons is advisable, if the size and structure of the area allow it, to avoid selecting parcels that are not residual polygons. The elimination of residual polygons according to certain parameters was also evaluated as the most suitable method for spatiotemporal analyses by Skokanová et al. (2008), with the difference that historical maps of medium scales were used in that analysis.
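
The sensitivity of the position-based selection to the buffer width can be illustrated with a small Python sketch. It rests on one reading of the method: because the band is built around all overlay boundaries, including the polygon's own outline, a polygon lies entirely inside the band when none of its interior points is farther than the buffer width from its own boundary. The sketch approximates this by sampling a regular grid of interior points, which slightly underestimates the largest distance; a GIS would normally answer the same question with buffer and containment operations. All function names, the grid spacing and the test geometries are hypothetical.

```python
import math


def edges(ring):
    """Consecutive vertex pairs of a closed ring given as (x, y) vertices."""
    return list(zip(ring, ring[1:] + ring[:1]))


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def contains(ring, p):
    """Ray-casting point-in-polygon test for a simple polygon."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in edges(ring):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def max_interior_distance(ring, step):
    """Approximate the largest distance from an interior point to the boundary
    by sampling cell centres of a regular grid with the given spacing."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    nx = int((max(xs) - min(xs)) / step) + 1
    ny = int((max(ys) - min(ys)) / step) + 1
    best = 0.0
    for i in range(nx):
        for j in range(ny):
            p = (min(xs) + (i + 0.5) * step, min(ys) + (j + 0.5) * step)
            if contains(ring, p):
                nearest = min(point_segment_distance(p, a, b) for a, b in edges(ring))
                best = max(best, nearest)
    return best


def within_buffer(ring, width, step=0.25):
    """True if the sampled interior never lies farther than `width` from the boundary."""
    return max_interior_distance(ring, step) <= width


# Hypothetical polygons (metres): a 2 m wide residual strip and a 6 m wide road parcel.
residual_strip = [(0.0, 0.0), (100.0, 0.0), (100.0, 2.0), (0.0, 2.0)]
road_parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 6.0), (0.0, 6.0)]

for name, ring in (("residual strip", residual_strip), ("road parcel", road_parcel)):
    print(name, within_buffer(ring, width=1.5))
```

With a buffer of 1.5 m on each side, any polygon narrower than about 3 m falls inside the band, so the 2 m strip is flagged while the 6 m road parcel is kept. Widening the buffer would start to capture genuine narrow parcels, which is the risk noted above.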

Conclusion

The positional accuracy of the old large-scale maps, verified in this study in several model areas, reached very good values, confirming the suitability of these maps for monitoring landscape change. Old large-scale maps are a very valuable source of historical data and have their place in landscape studies, especially in research on smaller areas such as municipalities or cadastral units, where they allow working at the parcel level. However, it must always be remembered that old maps offer only a snapshot of the past, and the continuous development within individual periods can only be predicted or modelled with greater or lesser probability.

It was confirmed that the positional inconsistency of the map data could be eliminated to some extent by the chosen time-series vectorization method. Whether to choose reverse or sequential vectorization for a given study area must be considered beforehand; both methods have their advantages and disadvantages, which are summarised in the discussion of this article. In general, reverse vectorization is preferable for small areas where large-scale maps are used, the accuracy of the results is emphasised, and the vectorization can ideally be carried out by a single operator who is able to assess correctly whether a discrepancy is a change or a positional inaccuracy. In addition, for simpler areas with fewer polygons, especially outside built-up areas, this method can be faster, as there is no need to recreate the vector model. The sequential vectorization method is chosen more often because it reproduces the old maps faithfully and makes it easier to involve multiple operators in the case of large regions; its disadvantage is that it creates the abovementioned sliver polygons when vector models are overlaid.

To eliminate sliver polygons, any of the proposed methods can be used. The simplest and most effective method appears to be to identify sliver polygons by their position within a buffer of a given width and then eliminate them by merging them with a neighbouring polygon. The elimination of sliver polygons was most effective for map outputs, especially change maps, where the refined outputs better reflect the actual changes. By comparing the two approaches to vectorization and then the methods of removing sliver polygons, it was found that the relative positional inaccuracy of the map bases did not have a significant impact on the values of the investigated statistical parameters characterising land use change in the region. Whether and how the results should be refined therefore depends on the type of study, the type of spatial data used, and the type of results that characterise the change in the territory. The findings of this study should help improve methodologies for using old large-scale maps in similar analyses of landscape evolution and clarify the effect of sliver polygons on the accuracy of the results.

Declarations

Acknowledgements:

This work was supported by the Ministry of Culture of the Czech Republic through the NAKI III programme “Vltava II – transformations of the historical landscape, the river as a link and a barrier”, No. DH23P03OVV055.

Funding: This work was supported by the Ministry of Culture (Grant number DH23P03OVV055).

Competing Interests: The authors have no relevant financial or nonfinancial interests to disclose.

Author Contributions: Both authors contributed to the study's conception and design. The material preparation, data collection and analysis were performed by Darina Kratochvilova. The first draft of the manuscript was written by Darina Kratochvilova and Jiri Cajthaml, and both authors commented on previous versions of the manuscript. All the authors have read and approved the final manuscript.

Data Availability: The datasets analysed during the current study are available from the corresponding author upon reasonable request.

References

ArcGIS Pro – Rubbersheet Features (Editing). Documentation, 2023. https://pro.arcgis.com/en/pro-app/latest/tool-reference/editing/rubbersheet-features.htm (accessed 28 February 2024).
Badjana, H.M., Helmschrot, J., Selsam, P., Wala, K., Flügel, W.-A., Afouda, A., Akpagana, K., 2015. Land cover changes assessment using object‐based image analysis in the Binah River watershed (Togo and Benin). Earth and Space Science 2 (10), 403–416. 10.1002/2014EA000083.
Bender, O., Boehmer, H.J., Jens, D., Schumacher, K.P., 2005. Analysis of land-use change in a sector of Upper Franconia (Bavaria, Germany) since 1850 using land register records. Landscape Ecology 20 (2), 149–163. 10.1007/s10980-003-1506-7.
Bičík, I., 2010. Vývoj využití ploch v Česku. Česká geografická společnost, Praha. ISBN 978-80-904521-3-8.
Bičík, I., Jeleček, L., Štěpánek, V., 2001. Land-use changes and their social driving forces in Czechia in the 19th and 20th centuries. Land Use Policy 18 (1), 65–73. 10.1016/S0264-8377(00)00047-8.
Bürgi, M., Salzmann, D., Gimmi, U., 2015. 264 years of change and persistence in an agrarian landscape: a case study from the Swiss lowlands. Landscape Ecology 30 (7), 1321–1333. 10.1007/s10980-015-0189-1.
Cajthaml, J., Fialová, D., Zimová, R., 2022. Vltava: Proměny historické krajiny, 1st ed. České vysoké učení technické v Praze, Praha, 303 pp.
Cajthaml, J., Krejčí, J., 2008. Využití starých map pro výzkum krajiny. In: Pešková Kateřina (Ed.), Sborník z 15. ročníku mezinárodního sympozia GIS Ostrava 2008, vol. 27. TANGER spol. s r.o.
Clercq, E.M. de, Clement, L., Wulf, R.R. de, 2009. Monte Carlo simulation of false change in the overlay of misregistered forest vector maps. Landscape and Urban Planning 91 (1), 36–45. 10.1016/j.landurbplan.2008.11.009.
Cousins, S.A., 2001. Analysis of land-cover transitions based on 17th and 18th century cadastral maps and aerial photographs. Landscape Ecology 16 (1), 41–54. 10.1023/A:1008108704358.
ČÚZK, 2010. Geoportál ČÚZK. https://geoportal.cuzk.cz/ (accessed 28 February 2024).
Dolejš, M., Forejt, M., 2019. Franciscan Cadastre in Landscape Structure Research: A Systematic Review. Quaestiones Geographicae 38 (1), 131–144. 10.2478/quageo-2019-0013.
Eremiášová, R., Skokanová, H., 2009. Land Use Changes (Recorded in Old Maps) and Delimitation of the Most Stable Areas from the Perspective of Land Use in the Kašperské Hory Region. Journal of Landscape Ecology 2 (1). 10.2478/v10285-012-0012-5.
Frajer, J., Geletič, J., 2011. Research of historical landscape by using old maps with focus to its positional accuracy. Dela 0 (36), 49. 10.4312/dela.36.3.49-67.
Gimmi, U., Lachat, T., Bürgi, M., 2011. Reconstructing the collapse of wetland networks in the Swiss lowlands 1850–2000. Landscape Ecology 26 (8), 1071–1083. 10.1007/s10980-011-9633-z.
Hamre, L.N., Domaas, S.T., Austad, I., Rydgren, K., 2007. Land-cover and structural changes in a western Norwegian cultural landscape since 1865, based on an old cadastral map and a field survey. Landscape Ecology 22 (10), 1563–1574. 10.1007/s10980-007-9154-y.
Havlíček, M., Dostál, I., Pavelková, R., 2022. Water Reservoirs as a Driver of Anthropogenic Changes in Landscape and Transport Networks: The Czech Republic Experience. Water 14 (12), 1870. 10.3390/w14121870.
Chen, Y.-Y., Huang, W., Wang, W.-H., Juang, J.-Y., Hong, J.-S., Kato, T., Luyssaert, S., 2019. Reconstructing Taiwan's land cover changes between 1904 and 2015 from historical maps and satellite images. Sci Rep 9 (1), 3643. 10.1038/s41598-019-40063-1.
Chiang, Y.-Y., Duan, W., Leyk, S., Uhl, J.H., Knoblock, C.A., 2020. Training Deep Learning Models for Geographic Feature Recognition from Historical Maps. In: Chiang, Y.-Y., Duan, W., Leyk, S., Uhl, J.H., Knoblock, C.A. (Eds.), Using Historical Maps in Scientific Studies. Springer International Publishing, Cham, pp. 65–98.
Iosifescu, I., Tsorlini, A., Hurni, L., 2016. Towards a comprehensive methodology for automatic vectorization of raster historical maps. E-perimetron 11 (2), 57–76.
Istrate, G.-A., Istrate, V., Ursu, A., Ichim, P., Breabăn, I.-G., 2023. Using Diachronic Cartography and GIS to Map Forest Landscape Changes in the Putna-Vrancea Natural Park, Romania. Land 12 (9), 1774. 10.3390/land12091774.
Kratochvílová, D., Cajthaml, J., 2020. Using the automatic vectorisation method in generating the vector altimetry of the historical Vltava River. Acta Polytechnica 60 (4), 303–312. 10.14311/AP.2020.60.0303.
Lathouwers, E., Segers, Y., Verstraeten, G., 2023. Reconstructing valley landscapes. GIS analyses of past land use changes in three Flemish River valleys since the late 18th century. Land Use Policy 135, 106960. 10.1016/j.landusepol.2023.106960.
Mas, J.F., 2005. Change Estimates by Map Comparison: A Method to Reduce Erroneous Changes Due to Positional Error. Transactions in GIS 9 (4), 619–629. 10.1111/j.1467-9671.2005.00238.x.
Pătru-Stupariu, I., Stupariu, M.-S., Cuculici, R., Huzui, A., 2011. Understanding landscape change using historical maps. Case study Sinaia, Romania. Journal of Maps 7 (1), 206–220. 10.4113/jom.2011.1151.
Pavelková, R., Frajer, J., Havlíček, M., Netopil, P., Rozkošný, M., David, V., Dzuráková, M., Šarapatka, B., 2016. Historical ponds of the Czech Republic: an example of the interpretation of historic maps. Journal of Maps 12 (sup1), 551–559. 10.1080/17445647.2016.1203830.
Pešťák, J., Zimová, R., 2005. Position accuracy of objects in the maps of 1st and 2nd military mapping. Kartografické listy 13.
Petit, C.C., Lambin, E.F., 2002. Impact of data integration technique on historical land-use/land-cover change: Comparing historical maps with remote sensing data in the Belgian Ardennes. Landscape Ecology 17 (2), 117–132. 10.1023/A:1016599627798.
Pindozzi, S., Cervelli, E., Capolupo, A., Okello, C., Boccia, L., 2016. Using historical maps to analyze two hundred years of land cover changes: case study of Sorrento peninsula (south Italy). Cartography and Geographic Information Science 43 (3), 250–265. 10.1080/15230406.2015.1072736.
Scăunaș, S., Păunescu, C., Merciu, G.-L., 2019. Spatial-Temporal Analysis of Land Cover and Use Changes Using GIS Tools. Case Study Băneasa Neighborhood, Bucharest. Journal of Applied Engineering Sciences 9 (2), 187–194. 10.2478/jaes-2019-0026.
Skaloš, J., Engstová, B., 2010. Methodology for mapping nonforest wood elements using historic cadastral maps and aerial photographs as a basis for management. Journal of Environmental Management 91 (4), 831–843. 10.1016/j.jenvman.2009.10.013.
Skaloš, J., Engstová, B., Trpáková, I., Šantrůčková, M., Podrázský, V., 2012. Long-term changes in forest cover 1780–2007 in central Bohemia, Czech Republic. European Journal of Forest Research 131 (3), 871–884. 10.1007/s10342-011-0560-y.
Skokanová, H., 2008. Metody GIS v hodnocení změn využívání krajiny. https://www.ekologie-krajiny.cz/sites/default/files/publikace-pdf/S_2008.pdf (accessed 28 February 2024).
Skokanová, H., Havlíček, M., Svoboda, J., 2008. Průběžné výsledky výzkumného záměru MSM6293359101, části kvantitativní analýza dynamiky vývoje krajiny ČR. In: Pešková Kateřina (Ed.), Sborník z 15. ročníku mezinárodního sympozia GIS Ostrava 2008. TANGER spol. s r.o.
Statuto, D., Cillis, G., Picuno, P., 2017. Using Historical Maps within a GIS to Analyze Two Centuries of Rural Landscape Changes in Southern Italy. Land 6 (3), 65. 10.3390/land6030065.
Šantrůčková, M., Demková, K., Weber, M., Lipský, Z., Dostálek, J., 2017. Long term changes in water areas and wetlands in an intensively farmed landscape: A case study from the Czech Republic. European Countryside 9 (1), 132–144.
Talich, M., 2020. Bohatství starých map a jejich využití v knihovnách a dalších paměťových institucích. Knihovna : knihovnická revue 31 (2), 5–28.
Tortora, A., Statuto, D., Picuno, P., 2015. Rural landscape planning through spatial modelling and image processing of historical maps. Land Use Policy 42, 71–82. 10.1016/j.landusepol.2014.06.027.

