Historical road networks modeling
We assume that the evolution of road networks is largely characterized by expansion over time, and, to a lesser degree, by densification. Changes in the geometric structure (e.g., layout, orientation) of road networks, or shrinkage are rare, and are assumed to be negligible in the case of the US during our study period. Thus, multi-temporal spatial data measuring the expansion of developed, or built-up land over time is commonly used to spatially constrain contemporary road networks to their assumed historical extents, under the assumption that the time period of earliest settlement roughly corresponds to the time period when nearby roads have been constructed. This strategy has been employed in related work analyzing road networks over time and at large spatial extents, using multi-temporal census enumeration units including rural-urban classifications (3), cadastral parcel data containing built-year information (4), multi-temporal remote-sensing derived urban extents (5), or width of residential streets over time (38). Other studies focusing on local scales have extracted multi-temporal road network data from historical maps using computer vision (e.g. (10)).
Herein, we use geospatial vector data from the United States Geological Survey (USGS) National Transportation Dataset (14), representing the US road network in approximately 2018. In order to model retrospective extents of built-up land, we use built-up areas from the Historical Settlement Data Compilation for the U.S. (HISDAC-US; (15, 16)), which are derived from parcel-level built-year information contained in Zillow’s Transaction and Assessment Database (ZTRAX; (39)). These built-up areas (BUA) are available in 5-year intervals from 1810 to 2016, as a series of binary, gridded surfaces at a resolution of 250m ((40), Supplementary Fig. S2a). Likewise, historical estimates of the number of buildings (built-up property locations; BUPL, Supplementary Fig. S2b) per grid cell are available in the HISDAC-US repository (41). While HISDAC-US data coverage is sparse in some rural areas of the US, geographic coverage and temporal information is largely complete in urban regions (15). Moreover, the accuracy of the built-up extents layer increases over time (15, 16) reaching acceptable levels after 1900 (15).
Based on the gridded surfaces from the HISDAC-US we developed an approach to generate spatially generalized urban extents, consistent across different cities and over time. In a first step, we generated a built-up density surface in the starting year (i.e., 1910) within each 2015 CBSA boundary, using circular focal windows of radius r = 1 kilometer, containing the proportion of built-up area within the focal neighborhood. We then selected all grid cells with a focal built-up density greater than 5%. This method has previously been employed to discretize the rural-urban continuum into high density (urban) and lower density (periurban) strata (42) and shows high discriminative power between signals in remotely sensed spectral responses. For each CBSA and year, we then segmented the resulting contiguous groups (“patches”) of urban grid cells and computed the sums of built-up area, building indoor area, and number of buildings per patch, and computed the percentile ranks of the patches within a CBSA according to the number of buildings they contain. We discarded small patches containing only a few buildings, likely representing scattered periurban settlements. To do so, we retained only patches that exceed the 90th percentile in the first year when the density filtering yields at least one patch of built-up land (which may be later than 1910 for late-developing cities). This way, we ensure that urban areas are modelled based on consistent criteria across space and time, and represented by smooth, contiguous, and largely gap-free areas (Supplementary Fig. S2c). We then clipped the NTD road vector data to the urban delineations in each year, yielding sub-networks that can be uniquely identified by the combination of CBSA, year, and patch identifier. Lastly, we identified the coordinates of the end points of the road vector lines that were introduced by the clipping, and recorded these locations for subsequent node analysis, since these nodes represent artificial cul-de-sacs introduced by the data processing and need to be excluded from subsequent statistical analysis (Supplementary Fig. S2d). In total, we analyzed 8.0 million nodes and 10.1 million edges within CBSAs.
CONUS-wide historical road network modeling
To derive trends of road network characteristics over time and across the rural-urban continuum, we used the NTD road network vector data (14, 33) and the first built-up year dataset (FBUY), available as a gridded dataset from the HISDAC-US data repository (15,16) (temporal and geographic coverage of these data are shown in Supplementary Figs. S3 & S4). We first identified grid cells developed within moving windows (time periods) of 40 years, in steps of 20 years, e.g., developed prior to 1900, 1880-1920, 1900-1940, 1920-1960, etc. For each county in the CONUS, we then extracted the road network vector objects within the areas corresponding to each 40-year development period and assigned an individual identifier to each contiguous group of developed grid cells (patches). We removed small, spatially isolated patches of under 0.31 square kilometers (corresponding to five 250 by 250 meter grid cells), as well as elongated patches of less than 500 meter width, likely representing settlements along highways and thus not relevant to characterize road networks in cities, towns, or places (see Supplementary Figs. S1 & S2). For the remaining patches, containing over 27 million road segments, we calculated a range of road network metrics, aggregated per county and time period. Based on the county-level rural-urban continuum code provided by the US Department of Agriculture (33), classifying each county into one of nine levels of “rurality”, we analyzed each of the network metrics in a bi-variate manner over time and across the rural-urban continuum, stratified by US census region (24) (see RUCC boundaries in Supplementary Fig. S5). In total, we analyzed 9.2 million nodes and 15.2 million edges across the rural-urban continuum.
Metropolitan-level historical road network modeling and statistical analysis
To construct CBSA-level analyses of the road networks, we group the patches constructed above into CBSA regions. If a patch is on the border of a CBSA region, we cut it off at the boundary, and edges that reach the boundary are removed; this is not common but is a reasonable way of defining patches associated with only one metropolitan or micropolitan area. At ten-year intervals between 1900 and 2010, as well as at 2015, we construct the road network topology using the Python library networkx, and remove nodes with degree two from the analysis, which we explain in more detail in the Supplementary Fig. S22 For each patch, we record the total road length, area, degree, proportion deadends and degree greater than or equal to four, as well as local entropy and griddedness. These raw statistics are used to construct combined measures, e.g., the distance per unit area to be able to quantify the road length distance within all patch areas (which are a small proportion of the total CBSA area). Both CONUS-wide and CBSA level historical road networks were extracted using ESRI ArcPy (43), Safe Software Feature Manipulation Engine Desktop (44).
Grid-cell-level correlation analysis and time series clustering
Using the FBUY gridded surface from HISDAC-US (15, 16), we calculated the average settlement age within 1 × 1 kilometer grid cells located within the 2015 urban delineations de rived from the density-based delineation method described above using GeoPandas (45) and SciPy (46) Python modules. The aggregation to 1 × 1 kilometer grid cells aims to avoid small sample sizes of road segments and intersections per grid cell, and thus, ensures the statistical support for the network statistics calculated per grid cell. We then identified all network nodes within a grid cell, as well as the centroids of all road segments (i.e., network edges) per grid cell. Based on the network statistics attributed to each node (e.g., nodal degree, local griddedness) and to each edge (e.g., azimuth, road segment length) we calculated road network statistics for each grid cell, such as average degree or total road length (see Fig. 2 in the main text). In total, we calculated seven grid cell-level network statistics (Fig. 4 in the main text and Supplementary Fig. S7). Note that due to potentially small sample sizes within grid cells, we replaced the orientation entropy by the variety of unique azimuth values per grid cell, calculated after discretizing the road segment azimuth into bins of 10 degrees. Based on the gridded surface indicating the average age per grid cell, and corresponding cell-level network statistics within each CBSA, we extracted cell-by-cell pairs of settlement age and road network statistics for each city. These vectors enable us to calculate correlations between age and network characteristics, for each city, considering the local, fine-grained variability of settlement age as it is associated to the characteristics of the road network. Moreover, we generated time series of each network characteristic for each city. In order to characterize the relative relationship between age and network characteristics, we discretized the age surface per CBSA into deciles. Thus, the resulting time series consist of the same number of observations (i.e., ten) and are independent from the absolute age of the cities. For each of the network characteristics, we conducted time series-based cluster analysis separately for MSAs and µSA. We used the time-series k-means algorithm (TSK-means) (25) in conjunction with the Dynamic Time Warping (47) similarity metric to characterize the dissimilarity between time series, implemented in tslearn (48) Python module. In order to identify the optimum number of clusters k, we calculated the cluster inertia based on DTW similarity as a measure of separation between time series clusters for a range of k from 2 to 20 and used the commonly used elbow method (49) to identify the approximate number of clusters for each scenario. We normalized the cluster inertia of each clustering scenario into the range [0, 1] in order to compare the cluster quality across the different network statistics (Fig. 4d in the main text).
Road Statistics
Griddedness has become critical to understanding traffic inefficiency and related problems in cities but measuring it has been difficult until recently. Namely, grid-like road networks ap pear to enhance walkability and lower relative vehicular travel in a city (3, 19). Griddedness (and related urban sprawl) metrics have been defined and implemented on several recent occasions (3, 5, 34). The methods to derive such metrics are sophisticated (3) and sometimes computationally expensive (5, 34). However, we aim for a simple, intersection-level measure to extend on previous work.
We develop the local griddedness metric, a spatial complement to the clustering coefficient often used in network analysis (2, 23). The local clustering coefficient of a node is defined as the proportion of triangles that exist whose vertex includes that node relative to the total number of possible triangles that could exist for a node of that degree. Unlike, e.g., social networks, road networks tend to have four-cycles, and more uniquely still, these cycles tend to be planar, meaning they all lie on a two-dimensional plane.
These constraints offer guidance to a unique spatial clustering coefficient, local griddedness, which is the proportion of four-cycles containing that vertex relative to the total number of planar four-cycles for a node with that degree. Degree one, two, and three nodes are common and special cases for intersections, however. If a node is the end of a dead-end road, we define the local griddedness to be zero. We do not analyze nodes of degree two, as we explain in more detail in the SI. Finally, it is unlikely for a three-road intersection to have three city blocks meet there, but more likely is that it ends in a T. The maximum number of city blocks is therefore defined as two and is otherwise k. While in most cases, local griddedness is between 0 and 1, we allow for rare instances in which, for example, degree-three nodes have a value up to 3/2 (a “super gridded” node), in order for the T intersection to have a natural griddedness value of 1.0. This also means that roads that violate this planar assumption (e.g., those with bridges) may be greater than 1, but such instances are rare. Using this measure, any node can have its griddedness value rapidly calculated, allowing for extremely fine-grained analysis of road network spatial statistics.
In addition to this metric, we separately calculated azimuth variety and orientation entropy by binning the angles between road intersections into six-degree wide bins. Changing this width does not qualitatively change our findings but can change the absolute value of entropy (which can be as high as the log of the number of bins) as well as the azimuth variety.