Combining GIS (geographic information system), MFA (material flow analysis), and BDM (big data mining), this study proposes a model (GMB) for estimating steel stock in civil buildings. Taking the Changsha urban area as the study area, we estimate the stock and waste of steel in civil buildings. The workflow of this study is shown in Fig. 1. First, we classify buildings based on point of interest (POI) data and identify civil buildings in order to extract building attributes. Second, geographic data are processed using vector transformation, spatial join, and overlay analysis. Third, the cumulative steel stocks of civil buildings are estimated from their physical properties, and the amount of waste is estimated using a life cycle function. Finally, we analyze the spatial and temporal distribution of steel stocks and the dynamic evolution of their spatial differences.
2.1. Study Area
The system boundary of the GMB model is the urban area of Changsha. The urban area (Furong, Tianxin, Wangcheng, Yuelu, Yuhua, and Kaifu districts) covers 1909.86 km2 (Statistics 2012). As the capital of Hunan Province and an economically important city in China, Changsha has experienced rapid urbanization: from 1985 to 2020, the urban population grew from 1.12 million to 3.96 million, per capita GDP grew from 900 yuan to 144,649 yuan, and the built-up area expanded from 57 km2 to 409 km2. Because steel buildings in Changsha were mainly built after 1985, this paper estimates the annual stocks and waste of civil-building steel from 1985 to 2020.
2.2. Data source
The data acquired in this study fall into two categories: attribute data and spatial data. Data sources are shown in Table 1. For attribute data, we used Python to crawl the floor height, construction year, area, and neighborhood name of civil buildings from two major real estate agency websites (58. Com Inc, https://cs.58.com; Beke, https://cs.ke.com). For spatial data, geographic data of civil buildings and POI data of buildings were obtained from AutoNavi Company (https://www.amap.com), and AOI (area of interest) data were obtained using the ID of each POI. The buildings are classified based on the acquired POI data, and the data related to civil buildings are extracted from the classification results. With this method, we obtained about 96% of the data related to civil buildings in the urban area of Changsha. The remaining 4% were missing and were supplemented by nearest neighbor interpolation. However, these data alone are not sufficient to estimate the steel stocks of civil buildings; the civil building area and the steel intensity per unit area are also required. The steel intensity per unit area is obtained through field research and a review of the related literature.
Table 1
| Data type | Data name | Data sources | Data properties | Data formats | Period |
|---|---|---|---|---|---|
| Property data | Steel intensity per unit area | Literature reading, general standards for architectural design, civil buildings’ design codes | | XLSX | 1949–2020 |
| | Physical properties | https://cs.58.com/ https://cs.ke.com/ | | XLSX | 1985–2020 |
| | Year of construction | https://cs.58.com/ https://cs.ke.com/ | | XLSX | 1985–2020 |
| Spatial data | Civil buildings’ geographical data | https://www.amap.com/ | Vectors | SHP | 1985–2020 |
| | POI data | https://www.amap.com/ | Vectors | SHP | 1985–2020 |
| | AOI data | Obtained from AutoNavi Company using the POI's ID | Vectors | SHP | 1985–2020 |
2.3. Model’s framework for estimating steel stocks in civil buildings
2.3.1. Building classification based on POI
As the study area comprises various types of buildings, both civil and non-civil, we extracted civil buildings using POI data and GIS techniques. In total, there are 23 primary categories, which are further divided into 267 subcategories. The primary categories include automobile service, automobile sales, catering service, and government agencies and social organizations; the subcategories refine the primary classification. For example, automobile services can be divided into gas stations, other energy stations, and automobile parts sales. In addition, based on the POI dataset, civil buildings are identified by applying vector transformation, spatial join, and overlay analysis.
2.3.2. Deriving Building attribute information based on big data mining
Since the collected geographic data lack the construction year, we developed a new tool to collect and mine real estate information and POI/AOI information, extract the neighborhood name (property name) and construction year, and write them to the building geographic data. The specific steps are as follows.
(1) Property data (for sale and sold) were collected from the largest real estate agent websites in China. A total of 101,350 records were collected, and after preprocessing, 92,245 records were suitable for further analysis. The collected information includes title, neighborhood name (property name), area, construction year, house type, address, the total number of floors, unit price, other descriptive information, etc.
(2) Call the AutoNavi Company API (URL: https://restapi.amap.com/v3/place/text?parameters) to collect POIs using the neighborhood name (property name) as the keyword. The parameters include keywords, POI types, city, etc.; keywords is set to the neighborhood name (property name), and POI types is set to business residential | buildings | commercial and residential buildings | shopping services | malls. The returned fields include ID, name, type, latitude and longitude coordinates, address, administrative district, ParentID, etc.
(3) Extract the AOI. Because a civil building usually belongs to a larger area, such as a neighborhood or a shopping mall, the AOI expresses its extent more precisely than a point. An AOI is a polygon entity that contains multiple classes of POIs (Liu et al. 2021). In this paper, the AOI is obtained through the ID of the POI, and the algorithm “JsonToDataTableForAOI” (string strJson) was developed to extract the shape field value from the returned JSON data. If AOI data are not available, the POI is used instead. When records are written to the database, each neighborhood is hash-coded by its coordinates to avoid repeated collection and to speed up queries. The GeoHash algorithm converts the two-dimensional latitude and longitude into a string that identifies the area block containing the point.
(4) Assign information such as the year of construction to buildings. The building vector polygons are spatially joined with the property data, and attributes such as year of construction and neighborhood name (property name) are assigned to the buildings. For buildings without an AOI, we calculated the centroids of the building footprints and matched them with the nearest residential/commercial POI. With this method, we obtained nearly 96% of the construction year information, and the K-nearest neighbor (KNN) algorithm was used to fill in the remaining 4%. Using 2-fold cross-validation on the labeled data, we calculated an overall accuracy of 92% for the construction year information, which meets the minimum standard of 85% proposed by the U.S. Geological Survey (Anderson et al. 1976).
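Two of the operations in steps (3) and (4) can be illustrated with a minimal Python sketch. It is not the tool developed in this study: the function names `geohash_encode` and `knn_fill_year`, the choice of k = 3, the majority vote, and the toy coordinates are all assumptions made for illustration. The first function implements the standard GeoHash encoding; the second fills a missing construction year from the nearest labeled buildings.

```python
import math
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def knn_fill_year(target_xy, labelled, k=3):
    """Most common construction year among the k nearest labelled buildings.

    labelled: list of ((x, y), year) tuples in a projected coordinate system.
    """
    nearest = sorted(labelled, key=lambda rec: math.dist(target_xy, rec[0]))[:k]
    votes = Counter(year for _, year in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical inputs: a point in Changsha and four labelled buildings.
neighbourhood_key = geohash_encode(28.2282, 112.9388, precision=7)
labelled = [((0, 0), 1998), ((1, 0), 1998), ((0, 1), 2005), ((9, 9), 2015)]
filled_year = knn_fill_year((0.2, 0.2), labelled, k=3)
```

Because a shorter GeoHash is always a prefix of a longer one for the same point, truncating the key gives a coarser area block, which is what makes it usable as a deduplication and lookup key.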
2.3.3. Dataset of steel intensity per unit area of civil buildings
The intensity of steel use per unit area of a civil building is an essential factor in estimating steel stocks. We compiled the civil building steel intensity dataset (CBSIs) by referring to the general standards for architectural design, the design specifications for civil buildings (China 2006), and relevant literature (Izard and Müller 2010; Kozawa and Tsukihashi 2010; Yokoi et al. 2018; Yu et al. 2020; Zhu et al. 2022). In terms of structural steel intensity, buildings fall into two classes. The first class covers buildings built between 1949 and 2005, which were low-rise (at most 6 floors) and had a steel intensity of 20–40 kg/m2. The second class covers buildings built between 2006 and 2020, which vary in number of floors and were reinforced to withstand earthquakes above magnitude 7 on the Richter scale. Their steel intensity (40–75 kg/m2) is therefore higher than that of the first class (Table 2).
Table 2
Civil building steel intensity by construction period and number of floors
| Time | Floor | Unit steel intensity (kg/m2) |
|---|---|---|
| 2006–2020 | <11 | 40 |
| | >=11 | 50 |
| | >=17 | 60 |
| | >=19 | 70 |
| | >=28 | 75 |
| 1991–2005 | | 40 |
| 1979–1990 | | 35 |
| 1958–1978 | | 30 |
| 1949–1957 | | 20 |
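Table 2 can be read as a lookup keyed on construction year and number of floors. The sketch below is one possible encoding, not the dataset file used in this study. The function name `steel_intensity` is hypothetical, and the floor thresholds for 2006–2020 are interpreted as bands (11–16, 17–18, 19–27, and 28 or more floors), which is an assumption about how the overlapping ">=" rows are meant to be applied.

```python
def steel_intensity(year, floors):
    """Return the steel intensity (kg/m2) for a civil building, following Table 2."""
    if not 1949 <= year <= 2020:
        raise ValueError("Table 2 covers construction years 1949-2020 only")
    if year <= 1957:
        return 20
    if year <= 1978:
        return 30
    if year <= 1990:
        return 35
    if year <= 2005:
        return 40
    # 2006-2020: intensity rises with height; test the highest threshold first
    for min_floors, intensity in ((28, 75), (19, 70), (17, 60), (11, 50)):
        if floors >= min_floors:
            return intensity
    return 40  # fewer than 11 floors


# Example: a 1985 low-rise block and a 2012 high-rise tower.
low_rise = steel_intensity(1985, 6)
high_rise = steel_intensity(2012, 30)
```

Checking the thresholds from highest to lowest ensures that a 30-floor building receives 75 kg/m2 rather than the first matching lower band.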
2.3.4. Estimation of steel stocks in civil buildings combining GIS and bottom-up material flow analysis
This study adopts a bottom-up material flow analysis method implemented with GIS techniques. First, the geographic data of civil buildings are loaded into QGIS 3.16, and attribute fields such as total area and steel stock are added to each building. Then, the construction year and the number of floors are selected as join fields, and the dataset of steel intensity per unit area is joined to the building geographic data by attributes, so that each building is assigned a steel intensity per unit area. Finally, using the bottom-up material flow analysis, the steel stock of civil buildings in the Changsha urban area from 1985 to 2020 is estimated. The calculation formula is shown in Eq. (1).
$$S\left(n\right)=\sum_{i,j}\left({PS}_{i,j}\left(n\right)\times CBSI\right) \tag{1}$$
Where: \(S\left(n\right)\) is the estimated steel stock in year n; i is the year of construction; j indexes the j-th civil building; \({PS}_{i,j}\left(n\right)\) is the physical size (in m2) in year n of the j-th civil building constructed in year i; and CBSI is the civil building steel intensity per unit area.
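As a small numerical illustration of Eq. (1), the sketch below sums area times intensity over all buildings completed by a given year. The record keys (`built`, `area_m2`, `cbsi`), the function name `steel_stock`, and the three sample buildings are hypothetical; the sketch also ignores demolition, which is treated separately in Section 2.3.5.

```python
def steel_stock(buildings, year):
    """Eq. (1): sum of building area x steel intensity for buildings standing in `year`.

    Each building is a dict with hypothetical keys:
      built   - construction year
      area_m2 - physical size of the building in m2
      cbsi    - steel intensity in kg/m2
    Returns the stock in tonnes.
    """
    total_kg = sum(b["area_m2"] * b["cbsi"] for b in buildings if b["built"] <= year)
    return total_kg / 1000.0


# Made-up sample records, for illustration only.
buildings = [
    {"built": 1988, "area_m2": 3000.0, "cbsi": 35},
    {"built": 2003, "area_m2": 12000.0, "cbsi": 40},
    {"built": 2015, "area_m2": 40000.0, "cbsi": 70},
]
stock_by_year = {y: steel_stock(buildings, y) for y in (1990, 2005, 2020)}
```

In a GIS workflow the same calculation would typically be done as a field calculation on the joined attribute table, followed by a summary over construction years.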
2.3.5. Steel waste in civil buildings based on the Weibull distribution function
Since the lifetime of civil buildings follows the Weibull distribution (Vu et al. 2022), this study uses the Weibull distribution function with different parameters to estimate steel waste in civil buildings. The scrap steel from civil buildings in a given year is therefore the total stock of steel in completed civil buildings multiplied by the scrap rate of that year. The cumulative distribution function of civil building lifetime is shown in Eq. (2).
$$F\left(t\right)=1-\exp\left[-{\left(\frac{t}{\alpha }\right)}^{\beta }\right] \tag{2}$$
Where: \(F\left(t\right)\) is the life cycle distribution function (φ > 0) of civil buildings; t is the time in years; α is the scale parameter (it controls the magnitude of the distribution function); and β is the shape parameter (it controls the shape of the distribution function).
Suppose the scrapping rate of civil buildings in year n is \(\phi \left(n\right)\). Then \({\phi }^{\prime }\left(n\right)\) is the change in the scrapping rate of civil buildings in year n relative to the previous year, i.e., from \(\phi \left(n-1\right)\) to \(\phi \left(n\right)\) (Eq. (3)).
$${\phi }^{\prime }\left(n\right)=\exp\left[-{\left(\frac{n-1}{\alpha }\right)}^{\beta }\right]-\exp\left[-{\left(\frac{n}{\alpha }\right)}^{\beta }\right] \tag{3}$$
In summary, the amount of steel waste in civil buildings in year n can be obtained, which is expressed in Eq. (4).
$$C\left(n\right)=\sum_{t=0}^{n-1}P\left(t\right)\times {\phi }^{\prime }\left(n-t\right)\times CBSI \tag{4}$$
Where: \(C\left(n\right)\) is the steel waste from civil buildings in the n-th year; \(P\left(t\right)\) is the civil building area built in year t; and \(CBSI\) is the steel intensity per unit area of civil buildings.
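Eqs. (2) to (4) can be chained in a few lines of Python. This is a sketch under stated assumptions rather than the implementation used in this study: the function names are hypothetical, and the parameter values α = 50 and β = 3 and the yearly areas are placeholders chosen only to make the example run, since the fitted parameters are not given in this section.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Eq. (2): share of buildings scrapped within t years of completion."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / alpha) ** beta))


def scrap_rate_change(n, alpha, beta):
    """Eq. (3): share scrapped during the n-th year of life, F(n) - F(n - 1)."""
    return weibull_cdf(n, alpha, beta) - weibull_cdf(n - 1, alpha, beta)


def steel_waste(new_area_by_year, n, cbsi, alpha, beta):
    """Eq. (4): steel waste (kg) in year n from areas completed in years 0..n-1."""
    return sum(
        new_area_by_year[t] * scrap_rate_change(n - t, alpha, beta) * cbsi
        for t in range(n)
    )


# Placeholder inputs: area (m2) completed in years 0..4, one intensity value.
ALPHA, BETA = 50.0, 3.0          # hypothetical scale and shape parameters
areas = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
waste_year_5 = steel_waste(areas, 5, cbsi=40.0, alpha=ALPHA, beta=BETA)
```

Because Eq. (3) is a difference of consecutive values of Eq. (2), the yearly scrap shares sum to F(N) over the first N years, which is a convenient sanity check on any parameter choice.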
2.4. Analysis of the spatial and temporal distribution of steel stocks in civil buildings
Based on the estimated steel stock of civil buildings, we analyzed its spatial and temporal distribution pattern. Spatial autocorrelation, the standard deviation ellipse, and kernel density analysis were used to explore the aggregation characteristics and the center of gravity of steel stock in Changsha. Moreover, the evolution trend of the density distribution of civil buildings was investigated.
2.4.1. Spatial autocorrelation analysis
Spatial autocorrelation refers to the degree of similarity between nearby observations (values). Tobler (1970) formulated the First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." Current spatial autocorrelation methods fall into two categories: global spatial autocorrelation and local spatial autocorrelation (Chen et al. 2022). Global spatial autocorrelation can be used to describe the overall distribution and trend of an attribute. In this study, Moran's I is used to analyze the steel aggregation characteristics of civil buildings in the urban area of Changsha (Eq. (5)).
$$I=\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}{w}_{ij}\left({x}_{i}-\bar{x}\right)\left({x}_{j}-\bar{x}\right)}{{S}^{2}\sum_{i=1}^{n}\sum_{j=1}^{n}{w}_{ij}} \tag{5}$$
Where: I is the global Moran index, with -1 ≤ I ≤ 1; n is the total number of geographic units in the study area; i and j denote the i-th and j-th geographic units; \({w}_{ij}\) is the element of the spatial weight matrix; \({x}_{i}\) and \({x}_{j}\) are the attribute values of units i and j; x̄ is the average of all unit attribute values; and \({S}^{2}\) is the variance of the attribute values. I > 0 indicates positive spatial autocorrelation (clustering of similar values); I < 0 indicates negative spatial autocorrelation (dispersion); I = 0 indicates a random spatial distribution.
Since global spatial autocorrelation describes the entire study area, it cannot detect spatial autocorrelation on a local scale. Therefore, we used local spatial autocorrelation to evaluate the correlation between the steel stock of a local unit and that of its neighbors. The calculation formula is given in Eq. (6).
$${I}_{i}=\frac{{x}_{i}-\bar{x}}{{S}^{2}}\sum_{j=1}^{n}{w}_{ij}\left({x}_{j}-\bar{x}\right) \tag{6}$$
Where: \({I}_{i}\) is the local Moran index of each study unit; i and j denote the i-th and j-th geographic units; \({w}_{ij}\) is the element of the spatial weight matrix; \({x}_{i}\) and \({x}_{j}\) are the attribute values of units i and j; x̄ is the average of all unit attribute values; \({S}^{2}\) is the variance of the attribute values; and n is the number of regions.
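To make Eqs. (5) and (6) concrete, the following sketch computes both indices for a toy set of units arranged in a row. It is an illustration only: the function names are hypothetical, the binary "adjacent units are neighbors" weight matrix is an assumption (this section does not specify the weights used), and \(S^2\) is taken as the population variance of the attribute values.

```python
def _deviations(values):
    """Return deviations from the mean and the population variance S^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(d * d for d in dev) / n
    return dev, s2


def global_morans_i(values, w):
    """Eq. (5): global Moran's I for attribute values and weight matrix w."""
    n = len(values)
    dev, s2 = _deviations(values)
    w_sum = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s2 * w_sum)


def local_morans_i(values, w):
    """Eq. (6): local Moran's I for every unit."""
    n = len(values)
    dev, s2 = _deviations(values)
    return [dev[i] / s2 * sum(w[i][j] * dev[j] for j in range(n)) for i in range(n)]


def chain_weights(n):
    """Binary weights for n units in a row: only adjacent units are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
clustered = global_morans_i([1.0, 1.0, 5.0, 5.0], w)    # similar values adjacent
alternating = global_morans_i([1.0, 5.0, 1.0, 5.0], w)  # dissimilar values adjacent
local = local_morans_i([1.0, 1.0, 5.0, 5.0], w)
```

With these definitions the local indices sum to the global index multiplied by the sum of all weights, so the local values can be read as a decomposition of the global statistic. The sketch does not handle the degenerate case in which all values are equal (zero variance).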
2.4.2. Standard deviation ellipse
The standard deviational ellipse (SDE) is an analytical method proposed by Lefever in 1926 to measure spatial differences and analyze the spatial distribution characteristics of economic factors (Lefever 1926). The standard deviation ellipse summarizes distribution characteristics of geographic elements such as central tendency, dispersion, and directional tendency (Fan et al. 2021). The geographic midpoint, or center of gravity, is the average position of all points weighted by each point’s weight (Eq. (7)).
$$M\left(\bar{X},\bar{Y}\right)=\left[\frac{\sum_{i=1}^{n}{w}_{i}{x}_{i}}{\sum_{i=1}^{n}{w}_{i}},\ \frac{\sum_{i=1}^{n}{w}_{i}{y}_{i}}{\sum_{i=1}^{n}{w}_{i}}\right] \tag{7}$$
Where: \(M\left(\bar{X},\bar{Y}\right)\) denotes the center of gravity of the regions (points); n is the number of regions; \({w}_{i}\) is the weight; and (\({x}_{i}\), \({y}_{i}\)) denotes the coordinates of the i-th element (i-th building).
The rotation angle θ is the angle measured clockwise from due north to the long axis of the ellipse; it indicates the main direction of the spatial distribution of the elements (Eqs. (8)–(11)).
$$\tan\theta =\frac{A+B}{C} \tag{8}$$
$$A=\sum_{i=1}^{n}{\tilde{x}}_{i}^{2}-\sum_{i=1}^{n}{\tilde{y}}_{i}^{2} \tag{9}$$
$$B=\sqrt{{\left(\sum_{i=1}^{n}{\tilde{x}}_{i}^{2}-\sum_{i=1}^{n}{\tilde{y}}_{i}^{2}\right)}^{2}+4{\left(\sum_{i=1}^{n}{\tilde{x}}_{i}{\tilde{y}}_{i}\right)}^{2}} \tag{10}$$
$$C=2\sum_{i=1}^{n}{\tilde{x}}_{i}{\tilde{y}}_{i} \tag{11}$$
where \({\tilde{x}}_{i}\) and \({\tilde{y}}_{i}\) are the deviations of the coordinates of the i-th element from the center of gravity, and A, B, and C are the intermediate terms defined in Eqs. (9)–(11). The long and short semi-axes of the ellipse reflect the dispersion of the elements in the primary and secondary directions, respectively. We use the standard deviation ellipse to analyze the spatial distribution of steel stocks in Changsha and the migration trend of their center of gravity.
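The center of gravity in Eq. (7) and the rotation angle in Eqs. (8)–(11) can be evaluated directly, as in the sketch below. The function names and sample points are hypothetical. One implementation choice is an assumption of this sketch: `math.atan2` is used instead of a plain arctangent so that C = 0 does not cause a division by zero, which gives the same axis orientation as arctan((A + B)/C) up to a multiple of 180 degrees.

```python
import math


def weighted_center(points, weights):
    """Eq. (7): weighted mean center of (x, y) points."""
    w_sum = sum(weights)
    x_bar = sum(w * x for (x, _), w in zip(points, weights)) / w_sum
    y_bar = sum(w * y for (_, y), w in zip(points, weights)) / w_sum
    return x_bar, y_bar


def rotation_angle(points, center):
    """Eqs. (8)-(11): angle theta in degrees with tan(theta) = (A + B) / C."""
    dx = [x - center[0] for x, _ in points]
    dy = [y - center[1] for _, y in points]
    a = sum(d * d for d in dx) - sum(d * d for d in dy)  # Eq. (9)
    c = 2.0 * sum(u * v for u, v in zip(dx, dy))          # Eq. (11)
    b = math.sqrt(a * a + c * c)                          # Eq. (10): 4(sum xy)^2 = C^2
    return math.degrees(math.atan2(a + b, c))             # Eq. (8)


# Hypothetical building centroids lying on the line y = x, with equal weights.
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
center = weighted_center(pts, [1.0, 1.0, 1.0])
theta = rotation_angle(pts, center)

# Unequal weights pull the center toward the heavier point.
heavy_center = weighted_center([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0])
```

Using steel stock as the weight in `weighted_center` and repeating the calculation for successive years would trace the migration of the center of gravity described above.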
2.4.3. Kernel density analysis
Kernel density analysis (KDA) is a nonparametric method for estimating unknown density functions in probability theory (Zhang and Wang 2021). It expresses aggregation intensity by describing the spatial relationship between a point and its neighboring points in terms of spatial density. Taking each known data point as a center, the kernel density function calculates the density contribution of that point to each cell within a specified range, which visually reflects the distribution characteristics of discrete values in the region (Li et al. 2021; Zhang et al. 2022). By computing this for every element point in the area and superimposing the density contributions at the same location, the final distribution density of the element over the whole area is obtained (Eq. (12)).
$${f}_{n}\left(x\right)=\frac{1}{nh}\sum_{i=1}^{n}k\left(\frac{x-{x}_{i}}{h}\right) \tag{12}$$
Where: \({f}_{n}\left(x\right)\) is the kernel density function; \(k\left(\frac{x-{x}_{i}}{h}\right)\) is the kernel function; n is the number of known points; h is the search radius (bandwidth); and \(x-{x}_{i}\) is the distance from the raster cell center to the known point \({x}_{i}\).
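Eq. (12) can be evaluated with a few lines of Python, shown here in one dimension for clarity. The Gaussian kernel is an assumption of this sketch, since the kernel function used in this study is not stated in this section; the function names and sample values are likewise hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an assumed kernel function k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, samples, h, kernel=gaussian_kernel):
    """Eq. (12): f_n(x) = 1 / (n h) * sum_i k((x - x_i) / h)."""
    n = len(samples)
    return sum(kernel((x - xi) / h) for xi in samples) / (n * h)


# Hypothetical positions (km) of known points along a transect.
samples = [1.0, 1.2, 1.4, 5.0]
dense = kernel_density(1.2, samples, h=0.5)   # inside the cluster
sparse = kernel_density(3.0, samples, h=0.5)  # between cluster and outlier
```

A larger search radius h smooths the surface and merges nearby clusters, while a smaller h preserves local peaks, so the choice of h largely determines how hot spots appear on the resulting density map.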