Research Objectives
1. Predicting the impact of physical, climatic and traffic factors on bridge damage
2. Prioritizing the effective factors in the critical damages of bridges
Statistical Population
The statistical population of this research comprises all 384 concrete road bridges longer than 5 meters on the five main axes of Zanjan province.
Data collection method
This study used the databases of the Zanjan Road Maintenance and Transportation Organization and the 2019 statistical yearbook of the Meteorological Organization. These sources were selected because they match the type, purpose, and subject of the study and the characteristics of the statistical population under investigation.
Model inputs and outputs
As noted in the previous sections, bridge damage is measured with the Iranian model, which incorporates structural damage and operation indicators. These indicators, obtained from inspectors' observations and from calculations, serve as the variables in the model. The model inputs were selected with reference to the findings of other researchers, such as Morcous, Li, and Lu, who have studied the factors influencing damage to concrete bridges. Of the many candidate factors, only those for which data could be collected and validated for the statistical population were considered.
Consequently, three sets of data can be employed to determine the efficiency of bridges based on the output of structural damage and operation indicators. These three groups consist of climatic factors (e.g., number and amount of rainy and snowy days, number of frosty days, average daily temperature difference, intensity and acidity of rain and snow, salt concentration used in snow clearance and de-icing operations), traffic factors (e.g., volume of traffic, heavy vehicle traffic, tonnage crossing the bridge, speed of crossing the bridge, number of vehicle collisions with the bridge deck and supports), and maintenance factors (e.g., frequency and types of cross-sectional and major repairs, maintenance cost). Additionally, physical factors such as bridge length, largest span length, bridge width, and bridge height are taken into account.
Unfortunately, the provincial road maintenance and transportation organizations held insufficient information on accidents occurring on bridges, on maintenance records, and on the related costs. As a result, data from the provincial road accident registration system (Provincial Roads Management Center), the intelligent road traffic system, and the meteorological systems in the area of the bridges, along with the important physical characteristics affecting structural damage, were extracted and used as model data after validation and correction. The inputs and outputs used in the model are examined below (Table 2).
a. Model Inputs
Meteorological and Climatic Information:
(Sources: Reports from the General Directorate of Meteorology of Zanjan Province and the Road Incident Registration System of the Road Maintenance and Transportation Organization of Iran)
Input 1: Number of Axial Snowfalls
- Definition: The number of times snowfall has been reported on the mentioned axis section during the year, based on reports from the Road Management Center.
Input 2: Number of Axial Rains
- Definition: The number of times rain has been reported on the mentioned axis section during the year, based on reports from the Road Management Center.
Input 3: Amount of Rainfall (mm)
- Definition: The total amount of rain and snowfall on the mentioned axis section during the year, according to reports from the Road Management Center.
Input 4: Number of Frosty Days
- Definition: The number of days with frost in the bridge area, based on reports from meteorological stations near the bridge.
Input 5: Number of Days of Snowfall
- Definition: The number of days with snowfall in the bridge area, according to reports from meteorological stations near the bridge.
Input 6: Average Snowfall/Amount of Rainfall (mm)
- Definition: The average snowfall or rainfall in the bridge area, based on reports from meteorological stations near the bridge.
Input 7: Average Temperature Difference
- Definition: The average temperature difference between day and night in the bridge area, according to reports from meteorological stations near the bridge.
Traffic Information:
(Source: Traffic Counting System of the Road Maintenance and Transportation Organization of Iran)
Input 8: Average Daily Traffic (ADT)
- Definition: The average daily two-way traffic on the axis, based on information from traffic counters installed on the axis.
Input 9: Percentage of Heavy Vehicles
- Definition: The percentage of heavy vehicle traffic (including trucks, trailers, and buses) in relation to the total vehicle traffic on the axis, based on information from traffic counters installed on the axis.
Input 10: Average Speed (km/h)
- Definition: The average speed of vehicles traveling on the axis, based on information from traffic counters installed on the axis.
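The two traffic inputs defined above reduce to simple arithmetic. The following is a minimal sketch, assuming the counters deliver daily two-way vehicle counts and a separate heavy-vehicle count; the function names and the sample numbers are hypothetical and are not taken from the study's traffic counting system.

```python
def average_daily_traffic(daily_counts):
    """Mean of daily two-way vehicle counts (assumed counter output)."""
    if not daily_counts:
        raise ValueError("need at least one daily count")
    return sum(daily_counts) / len(daily_counts)


def heavy_vehicle_percentage(heavy_count, total_count):
    """Share of trucks, trailers and buses in total traffic, in percent."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * heavy_count / total_count


# Illustrative numbers only (assumption), not measured data.
adt = average_daily_traffic([2100, 2250, 2193])
heavy_pct = heavy_vehicle_percentage(heavy_count=469, total_count=2181)
print(adt, round(heavy_pct, 1))
```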
Physical Information of Bridges:
(Source: Bridge Management System of the Road Maintenance and Transportation Organization of Iran)
Input 11: Length of the Bridge
b. Model Outputs
(Source: Bridge Management System of the Road Maintenance and Transportation Organization of Iran)
Output 1: Bridge Structural Damage Index
- Definition: A measure of the structural damage of the bridge, determined based on the damage status of each bridge element, application of member impact coefficients, and the degree of damage.
Table 2
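| Role | Category | Description | Code |
|---|---|---|---|
| Dependent variable | | Structural damage index | O1 |
| Independent variable | Meteorology | Number of axial snowfalls | I1 |
| Independent variable | Meteorology | Number of axial rains | I2 |
| Independent variable | Meteorology | Amount of rainfall (mm) | I3 |
| Independent variable | Meteorology | Number of frosty days | I4 |
| Independent variable | Meteorology | Number of snowfall days | I5 |
| Independent variable | Meteorology | Average rainfall | I6 |
| Independent variable | Meteorology | Average temperature difference | I7 |
| Independent variable | Traffic | Average daily traffic (ADT) | I8 |
| Independent variable | Traffic | Heavy vehicle percentage | I9 |
| Independent variable | Physical | Bridge length | I11 |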
Information Analysis Method
In this study, our objective is to prioritize the factors that affect the structural damage of bridges and to predict their degree of importance and the resulting damage scenarios. To achieve this, we employed the decision tree method. The independent variables comprise the physical characteristics of the bridges as well as the atmospheric, climatic, and traffic conditions. The dependent variable is the structural damage index based on the Iranian BMS model.
Before entering the model, we performed data preparation operations to ensure that the data is suitable for analysis in Rapid Miner 5.1 modeling software. It is important to note that any errors or inaccuracies in the data preparation process can lead to incorrect outputs or inaccurate predictions. The changes made to the model are presented in Table 3, while Fig. 2 depicts the model created in the software.
To evaluate the model's performance, we selected 10% of the statistical population data as test data using the sample command. The remaining data was used as training data, with missing values being substituted using an appropriate method. Data preparation was conducted using Excel software before loading it into the model.
In this model, we discretized the dependent variable of the training data based on frequency. Discretization involves converting numerical data into nominal data by dividing the numerical values of a particular attribute into several intervals.
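Frequency-based discretization can be sketched as follows. This is a minimal illustration of equal-frequency binning in general, not the Rapid Miner implementation; the function names, the cut-point rule (midpoint between neighbouring sorted values), and the sample index values are assumptions made for the example.

```python
def equal_frequency_edges(values, n_bins):
    """Return n_bins - 1 cut points so each bin holds roughly the same count."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need n_bins >= 2 and at least n_bins values")
    ordered = sorted(values)
    edges = []
    for k in range(1, n_bins):
        idx = k * len(ordered) // n_bins
        # Place the cut halfway between two neighbouring sorted values.
        edges.append((ordered[idx - 1] + ordered[idx]) / 2)
    return edges


def assign_range(value, edges):
    """Map a numeric value to a 1-based nominal range label."""
    return 1 + sum(value > e for e in edges)


# Hypothetical damage index values, for illustration only.
index_values = [0.0, 2.5, 5.0, 11.0, 18.0, 24.0, 40.0, 55.0, 70.0, 97.2]
edges = equal_frequency_edges(index_values, n_bins=5)
labels = [assign_range(v, edges) for v in index_values]
print(edges)
print(labels)
```

With tied values the bins can end up unequal in size, which is one reason the resulting interval boundaries depend on the data rather than on fixed widths.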
Table 3
Characteristics of model variables
| Role | Name | Type | Statistics | Range | Missings |
|---|---|---|---|---|---|
| prediction | code | polynominal | Mode = a523 (1), least = a4 (0) | a109(1), a147 (1), a170(1), a17 | 0 |
| regular | Structural damage index | numeric | avg = 22.894 +/- 24.573 | [0.000 ; 97.200] | 0 |
| regular | Bridge length | numeric | avg = 13.634 +/- 21.661 | [5.000 ; 136.000] | 0 |
| regular | Axial snow | integer | avg = 9.974 +/- 5.504 | [2.000 ; 19.000] | 0 |
| regular | Axial rain | integer | avg = 11.579 +/- 7.195 | [2.000 ; 25.000] | 0 |
| regular | Rainfall | integer | avg = 357.053 +/- 46.303 | [277.000 ; 392.000] | 0 |
| regular | Number of frosty days | real | avg = 103.605 +/- 14.444 | [80.000 ; 116.000] | 0 |
| regular | Number of snowfall days | real | avg = 35.500 +/- 7.059 | [26.000 ; 43.000] | 0 |
| regular | Rainfall average | integer | avg = 0.979 +/- 0.128 | [0.760 ; 1.070] | 0 |
| regular | Average temperature difference | numeric | avg = 13.797 +/- 1.032 | [12.400 ; 14.800] | 0 |
|
Data Analysis
Several methods are available for implementing data mining projects; one of the most powerful is CRISP-DM (Cross-Industry Standard Process for Data Mining). This paper presents a proposed CRISP-based model comprising five phases, each consisting of sub-sections. Moving back and forth between phases is necessary, because the input of each phase depends on the output of the previous one (Al Jarullah 2011). The following sections examine each phase of the proposed model in the context of the case study.
In the first phase, the system under study is identified, and its goals and key success factors are then determined and reviewed. Table 3 provides an overview of the model variables and their properties. The code variable is defined in the ID role, while the structural damage index variable is defined in the prediction role. Since there is no missing data in this model, no data modifications are needed.
b. Data Collection and Recognition:
In this phase, raw data was collected from the reports of Iran's bridge management system. The collected data was then validated and normalized in Microsoft Excel. Table 4 presents a portion of the raw bridge information prior to the preparation process.
Table 4
Part of the raw information of bridges
| Code | Bridge length | Axial snow | Axial rain | Rainfall | Number of frosty days | Number of snowfall days | Rainfall average | Average temperature difference | Average daily traffic (ADT) | Heavy vehicle percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| a1 | 16 | 10 | 24 | 385 | 116 | 43 | 1.06 | 14.8 | 2181 | 21.5 |
| a10 | 10 | 3 | 3 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a100 | 14 | 3 | 3 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a101 | 6 | 10 | 23 | 385 | 116 | 43 | 1.06 | 14.8 | 2181 | 21.5 |
| a102 | 85 | 6 | 12 | 316 | 112 | 26 | 0.87 | 13.2 | 12137 | 31.42 |
| a103 | 10 | 3 | 3 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a104 | 7 | 3 | 3 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a105 | 8 | 2 | 3 | 295 | 110 | 34 | 0.8 | 14.3 | 7506 | 9.5 |
| a106 | 12 | 16 | 17 | 392 | 91 | 30 | 1.07 | 12.4 | 15736 | 24.63 |
| a107 | 12 | 2 | 2 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a108 | 6 | 3 | 2 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a109 | 5 | 12 | 8 | 392 | 91 | 30 | 1.07 | 12.4 | 2536 | 27 |
| a11 | 44 | 6 | 5 | 385 | 116 | 43 | 1.06 | 14.8 | 2536 | 27 |
| a110 | 9 | 10 | 23 | 385 | 116 | 43 | 1.06 | 14.8 | 2181 | 21.5 |
| a111 | 6 | 3 | 4 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a112 | 13 | 3 | 3 | 277 | 80 | 28 | 0.76 | 13.1 | 1729 | 39 |
| a113 | 13 | 3 | 5 | 295 | 110 | 34 | 0.8 | 14.3 | 7506 | 9.5 |
c. Modelling
The methodology used in this study is predictive data mining, specifically the decision tree algorithm, which establishes relationships between the variables. In this stage, the C5.0 decision tree algorithm was tested with various inputs. A decision tree represents the classification as a sequence of branching tests on the input variables. Each leaf node represents a subset of the training data, and each instance belongs to exactly one leaf node (Witten and Frank 2005).
The modeling was conducted using Rapid Miner 5.2 software. The model consists of two branches: "entering and assigning roles to training data and setting analysis components" and "entering test data" (Fig. 2). The structural damage index, which serves as the prediction data, is divided into five ranges proposed by the software. Domain 5 represents bridges in very poor condition, while domain 1 indicates bridges in a safe condition with minor damage (Table 4).
The Decision Tree generated by the software determines the range of the structural damage index based on different values of the independent variables (Fig. 3). According to this tree, the worst scenario for bridge damage, which corresponds to Range 3 (damage index ranging from 11.976 to 167.962), occurs under the following conditions:
- The axis experiences more than 13.5 days of snowfall per year, the average daily traffic is less than 13,936 vehicles, and the percentage of heavy vehicles in the total traffic exceeds 15.5%.
- The axis experiences more than 23 days of rainfall per year, more than 7 days of snowfall per year, the average traffic is less than 2,358 vehicles per day, the region has more than 27 snowy days per year, the annual rainfall exceeds 388 mm, and the percentage of heavy vehicle traffic is more than 19.5% of the total traffic.
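The two condition sets above can be written as explicit rules and checked against a bridge record. The sketch below uses the thresholds quoted in the text; the dictionary keys are hypothetical field names, and mapping "days of snowfall/rainfall on the axis" to the axial snow and axial rain inputs is an assumption.

```python
def matches_worst_scenario(bridge):
    """Return True if a bridge record meets either condition set listed above.

    Keys are hypothetical field names; thresholds are those quoted in the text.
    """
    first = (
        bridge["axial_snow"] > 13.5
        and bridge["adt"] < 13936
        and bridge["heavy_pct"] > 15.5
    )
    second = (
        bridge["axial_rain"] > 23
        and bridge["axial_snow"] > 7
        and bridge["adt"] < 2358
        and bridge["snow_days"] > 27
        and bridge["rainfall_mm"] > 388
        and bridge["heavy_pct"] > 19.5
    )
    return first or second


# Row a106 from Table 4: heavy snowfall on the axis, but ADT is above 13,936.
a106 = {"axial_snow": 16, "axial_rain": 17, "rainfall_mm": 392,
        "snow_days": 30, "adt": 15736, "heavy_pct": 24.63}
# Hypothetical record (assumption) with lower traffic.
example = dict(a106, adt=9000)

print(matches_worst_scenario(a106), matches_worst_scenario(example))
```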
Different scenarios can be derived from the decision tree for the high (Range 2) and medium (Range 1) damage index ranges as well. One important observation is the inverse relationship between the daily traffic index and the structural damage index. This contradicts the commonly held belief among bridge experts that higher traffic volume and weight lead to increased fatigue and damage to the bridge structure over its lifespan. This contradiction could be attributed to issues with the formulas or model tables used, or it could indicate errors in the observation and diagnosis of damages or recording in the software.
Furthermore, the decision tree presented does not mention independent variables such as freezing days, average rainfall, rainfall in the axis, temperature difference, and bridge length. This suggests that either these variables have little or no significant relationship with the structural damage index, or they are considered less important by the software compared to other variables and have been excluded from the decision tree. In other words, variables such as snowfall and heavy rain amounts, the percentage of heavy traffic compared to total traffic, the number of snowy days, and the daily traffic volume have a more significant impact on bridge damage, particularly the percentage of heavy vehicles in relation to total traffic.
d. Evaluation
In this phase, it is crucial to evaluate the results of the modeling process to ensure the model's effectiveness and usability. To assess the accuracy of the model, the available data is divided into three parts: training, testing, and validation. The training data is used to build the model, while the testing data is employed to evaluate the model's performance by determining the labels for the test records. The validation data further verifies the accuracy of the model.
In this model, 10% of the statistical population data was assigned as test data using the sample command, while the remaining data was used as training data, with missing values handled by the replacement method. Data preparation was carried out in Excel before the data was loaded into the model. In addition, the dependent variable of the training data was discretized by frequency, that is, its numerical values were divided into intervals and converted into nominal data.
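A 10% hold-out split of this kind can be sketched as follows. This is an illustration under assumptions, not the Rapid Miner sample command: the function name, the random shuffling, the fixed seed, and the generated bridge codes are all hypothetical.

```python
import random


def split_train_test(records, test_fraction=0.10, seed=0):
    """Shuffle the records and hold out a fraction of them as test data."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical codes standing in for the 384 bridges of the population.
codes = [f"a{i}" for i in range(1, 385)]
train, test = split_train_test(codes)
print(len(train), len(test))  # 346 38
```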
The accuracy of the model built on the training data set is measured as the percentage of correctly classified observations, calculated on the test data. The misclassification (error) rate can be derived from the accuracy index. Calculating these indices requires several definitions. A "true positive" is positive data that is correctly predicted by the algorithm, and a "true negative" is negative data that is correctly predicted. A "false positive" is negative data that is incorrectly predicted as positive, and a "false negative" is positive data that is incorrectly predicted as negative (Han and Kamber 2006).
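The definitions above translate directly into the usual indices. The sketch below assumes a two-class setting (for a model with several damage ranges, the counts would be taken per range); the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard indices from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no observations")
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        # The misclassification rate is the complement of accuracy.
        "error_rate": 1.0 - accuracy,
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


# Illustrative counts only (assumption), not results from the study.
metrics = classification_metrics(tp=30, tn=50, fp=5, fn=15)
print(metrics)
```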
For evaluating the algorithm's performance in this paper, the precision index is utilized. This index signifies the correct prediction rate of the classification algorithm. Based on the results of the model (Fig. 4), the model exhibits an error rate of 3.66% and an accuracy of 96.38%. Therefore, the model's validity can be considered acceptable.
e. Development
Creating a model is not the end of a project; the purpose of data mining projects is to discover knowledge and utilize the knowledge in the future. The discovered knowledge needs to be organized and made accessible to others. In the first step, we utilized the Apply Model operator in Rapid Miner software and the Gini-index method to predict the condition of the structural damage index of bridges based on data mining performed on the entire dataset (Fig. 2).
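For readers unfamiliar with the Gini index, the sketch below shows how it is commonly used to score a candidate split in a decision tree. It illustrates the general criterion only and makes no claim about how Rapid Miner applies it internally; the function names, field name, labels, and threshold are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, labels, feature, threshold):
    """Size-weighted Gini impurity after splitting on feature <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


# Hypothetical records and range labels, for illustration only.
rows = [{"heavy_pct": 9.5}, {"heavy_pct": 21.5}, {"heavy_pct": 27.0}, {"heavy_pct": 39.0}]
labels = ["range1", "range1", "range3", "range3"]
print(gini_impurity(labels), split_gini(rows, labels, "heavy_pct", 24.0))
```

A lower weighted impurity after the split means the two branches are purer than the parent node, so a tree builder using this criterion would prefer that split.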
Given that the model focuses on the special feature of the structural damage index, which has 5 ranges based on the tested bridges' damage indexes, Fig. 5 shows the software's display of the probability for each bridge to fall within these 5 ranges and predicts the most likely range. For example, for the bridge with code a23, the software predicts a 94% probability for it to be in damage index range 1 and a 6% probability for range 2.
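Choosing the predicted range from the per-range probabilities amounts to taking the most probable label. A minimal sketch, using the a23 probabilities quoted above and assuming the remaining three ranges have probability zero; the function name is hypothetical.

```python
def most_likely_range(probabilities):
    """Return the range label with the highest predicted probability."""
    if not probabilities:
        raise ValueError("no probabilities given")
    return max(probabilities, key=probabilities.get)


# Bridge a23: 94% for range 1 and 6% for range 2 (other ranges assumed 0).
a23 = {1: 0.94, 2: 0.06, 3: 0.0, 4: 0.0, 5: 0.0}
print(most_likely_range(a23))  # 1
```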
Another advantage of Rapid Miner software in developing data mining models is the creation of scatter plots. These plots provide valuable insights for researchers to study the distribution of bridges based on the factors that influence the structural damage index. In Fig. 6, the scatter plot shows that bridges in the middle 1 range (approximately 25) have a percentage of heavy vehicles around 30%. This figure helps identify situations where certain indicators can increase the risk of damage.