Proposing Several Model Techniques Including ANN and M5P-tree to Forecast the Stress at the Failure of Geopolymer Concrete Mixtures Incorporated Nanosilica

Geopolymers are innovative cementitious materials that can completely replace traditional Portland cement composites and have a lower carbon footprint than Portland cement. Recent efforts have been made to incorporate various nanomaterials, most notably nano-silica (nS), into geopolymer concrete (GPC) to improve the composite's properties and performance. Compressive strength (CS) is one of the essential properties of all types of concrete composites, including geopolymer concrete. As a result, creating a credible model for forecasting concrete CS is critical for saving time, energy, and money, as well as for providing guidance on scheduling the construction process and removing formwork. This paper presents a large amount of mix design data correlated to mechanical strength using empirical correlations and neural networks. Several models, including artificial neural network, M5P-tree, linear regression, nonlinear regression, and multi-logistic regression models, were utilized to forecast the CS of GPC incorporated nS. About 207 tested CS values were collected from literature studies and then analyzed to develop the models. For the first time, eleven effective variables were employed as input model parameters during the modeling process, including the alkaline solution to binder ratio, binder content, fine and coarse aggregate content, NaOH and Na₂SiO₃ content, Na₂SiO₃/NaOH ratio, molarity, nS content, curing temperatures, and ages. The developed models were assessed using different statistical tools such as RMSE, MAE, SI, OBJ value, and R². Results revealed that the ANN model estimated the CS of GPC incorporated nS more accurately than the other models. In addition, the alkaline solution to binder ratio, molarity, NaOH content, curing temperature, and age were the parameters with a significant influence on the CS of GPC incorporated nS.

structure, the effects of eleven variables such as the alkaline solution to binder ratio (l/b), binder content (b), fine (FA) and coarse (CA) aggregate content, sodium hydroxide (SH) and sodium silicate (SS) content, the ratio of SS/SH, the molarity of SH (M), nS content, curing temperatures (T), and ages (A) were considered and quantified on the CS of GPC incorporated nS by using different model techniques, namely artificial neural network (ANN), M5P-tree (M5P), linear regression (LR), nonlinear regression (NLR), and multi-logistic regression (MLR) models. Finally, different statistical tools, such as the root mean squared error (RMSE), mean absolute error (MAE), scatter index (SI), OBJ value, and the coefficient of determination (R²), were used to evaluate the created models' accuracy.

Research Significance
The primary goal of this paper is to create multiscale models for estimating the CS of GPC incorporated nS. Thus, a diverse range of laboratory data, approximately 207 tested specimens with a variety of l/b, b, FA, CA, SH, SS, M, SS/SH, nS, T, and A, were collected and reviewed using a variety of analytical approaches with the goal of: (i) providing the simplest possible equations for practicing engineers and scholars to use in their GPC mix design work; (ii) clarifying the effects of each mix proportion parameter, curing temperature, and age on the CS of GPC incorporated nS; (iii) quantifying and offering systematic multiscale models for forecasting the CS of GPC using eleven input parameters; (iv) using statistical assessment methods such as MAE, RMSE, R², OBJ, and SI to find the most reliable model for forecasting the CS of GPC composites incorporated nS among the various model strategies (LR, NLR, MLR, ANN, and M5P).

Methodology
The authors conducted an extensive search of several databases, including ResearchGate, ScienceDirect, Google Scholar, Scopus, and Web of Science. A wealth of papers was found discussing the effect of various NP types on the properties of geopolymer paste composites. However, only a limited number of documents were found regarding the impact of NPs on the properties of GPC composites. In total, 207 datasets of CS values were obtained. In the literature, a wide range of NPs such as nS, nC, nA, CNT, nM, and nT were used to improve various properties of GPC composites, with nS being the most frequent, as can be seen in Table 1. Therefore, in this study, the authors considered those articles that used nS to improve various properties of GPC composites.
In the modeling process, eleven input parameters were used, which limited the authors' ability to utilize a greater number of data in the created models. The gathered datasets were statistically analyzed and classified into three groups. The models were built using the larger group, which included 135 datasets. The second group is made up of 36 datasets that were used to test the created models, and the final group is made up of 36 datasets that were used to validate the suggested models (Golafshani et al., 2020; Faraj et al., 2021). Table 2 shows the dataset ranges, including all significant parameters and the observed CS of the GPC incorporated nS. The input dataset contains the following values: l/b ranges from 0.4 to 0.6, b ranges from 300-500 kg/m³, FA ranges from 490-990 kg/m³, CA ranges from 810-1470 kg/m³, SH ranges from 18.17-159.75 kg/m³, SS ranges from 40.8-187.5 kg/m³, M ranges from 4-16 M, SS/SH ranges from 0.33-3, nS ranges from 0-60 kg/m³, T ranges from 23-70°C, A ranges from 0.5-180 days, and CS ranges from 3.2-81.3 MPa. These datasets were used to propose various models such as LR, NLR, MLR, ANN, and M5P to estimate the CS of GPC incorporated nS; then, the developed models were evaluated using statistical criteria such as R², RMSE, MAE, SI, and OBJ to determine the most reliable and accurate model. Additional details about this work's methodology are summarized in a flow chart, as illustrated in Figure 1.
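The three-way grouping described above can be sketched as follows. This is only an illustration: the text gives the group sizes (135, 36, and 36) but not how rows were assigned, so the random shuffle, the seed, and the function name `split_datasets` are assumptions.

```python
import numpy as np

def split_datasets(n_total=207, n_train=135, n_test=36, seed=0):
    """Shuffle row indices and cut them into training, testing, and validating groups.

    The random assignment is an assumption; the paper only reports the group sizes.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    train = order[:n_train]
    test = order[n_train:n_train + n_test]
    validate = order[n_train + n_test:]  # the remaining rows
    return train, test, validate

train_idx, test_idx, val_idx = split_datasets()
print(len(train_idx), len(test_idx), len(val_idx))
```

The index arrays can then be used to select the corresponding rows of the input table and of the CS column.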

Statistical Assessment
Sufficient information about each variable input model parameter is provided in the following sections, 4.1 to 4.12.

Alkaline solution to binder ratio (l/b)
Based on the collected datasets, the l/b ratio of the GPC mixtures modified with nS was in the range of 0.4 to 0.6, with an average and standard deviation of 0.49 and 0.05, respectively. Regarding the other statistical analyses, the variance was 0.002, the skewness was 0.66, and the kurtosis was -0.25. Figure 2 depicts the relationship between CS and l/b with histograms of GPC mixtures incorporated nS.
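The descriptive statistics reported for each input parameter can be computed as in the sketch below. The estimators are assumptions, since the text does not say which were used: sample standard deviation and variance (`ddof=1`), moment-based skewness, and excess kurtosis (the negative kurtosis values in the text suggest the excess form). The `sample_lb` values are illustrative only and are not taken from the collected datasets.

```python
import numpy as np

def describe(values):
    """Return mean, std, variance, skewness, and excess kurtosis of a 1-D sample."""
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)  # second central moment
    m3 = np.mean(dev ** 3)  # third central moment
    m4 = np.mean(dev ** 4)  # fourth central moment
    return {
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),       # sample standard deviation (assumed)
        "variance": float(x.var(ddof=1)),  # sample variance (assumed)
        "skewness": float(m3 / m2 ** 1.5),
        "kurtosis": float(m4 / m2 ** 2 - 3.0),  # excess kurtosis (assumed)
    }

# Illustrative l/b values only, not the collected data.
sample_lb = [0.40, 0.45, 0.45, 0.50, 0.50, 0.50, 0.55, 0.60]
for name, value in describe(sample_lb).items():
    print(f"{name}: {value:.3f}")
```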

Binder content (b)
According to Table 1, F, GGBFS, MK, SF, RHA, and NP are the ashes that scholars used as source binder materials to produce GPC composites. The binder contents ranged from 300 to 500 kg/m³, with an average and standard deviation of 417 kg/m³ and 51.8 kg/m³, respectively. The other statistical measures, namely variance, skewness, and kurtosis, were 2689, 0.11, and -0.81, respectively, for the collected datasets. Figure 3 illustrates the variation of CS with b content and the frequencies of the gathered data of GPC mixtures incorporated nS.

Coarse aggregate content (CA)
Natural, crushed, and recycled aggregates are the forms of aggregate that were used as the CA in geopolymer concrete mixtures, just as in conventional concrete mixtures. Like the FA, the CA should have all the properties required by ASTM standards. The CA contents in past research varied between 810 and 1470 kg/m³, with an average of 1113.8 kg/m³ and a standard deviation of 183.2 kg/m³. The variance, skewness, and kurtosis were 33580, -0.19, and -0.71, respectively. The correlations between the CS of the tested datasets and the CA contents can be found in Figure 5.

NaOH content (SH)
Pellets and flakes are the two solid forms of SH, with a purity above 97%. This material is mixed with the required amount of water to prepare an SH solution with the required molarity. In this study, according to the collected datasets, the amount of SH in 1 m³ of GPC mixture incorporated nS ranged from 18.1 to 159.7 kg/m³, with an average of 71.3 kg/m³ and a standard deviation of 33.9 kg/m³. Further information about the other statistical assessment criteria and the correlations between the CS and SH content can be found in Figure 6.

Molarity (M)
In GPC science, the concentration of sodium hydroxide in water is called molarity. The authors of this study found that the molarity of SH in the collected papers ranged from 4 to 16 M, with an average of 11.9 M and a standard deviation of 3.3 M. The variance of the reviewed datasets was 11.1, the skewness was -1.4, and the kurtosis was 1.3. The variations between the CS and M, with the frequency of the datasets of GPC incorporated nS, are presented in Figure 8.

Na₂SiO₃/NaOH (SS/SH)
This parameter consists of a mixture of SS and SH with the required molarity. Usually, it is prepared about 24 hours before mixing the GPC ingredients. According to the gathered datasets, this parameter ranged from 0.33 to 3, with an average of 2.05 and a standard deviation of 0.76. The other statistical criteria were 0.59, -1.2, and 0.22 for the variance, skewness, and kurtosis, respectively. Moreover, the correlations between the CS and the SS/SH, with the frequencies of the datasets, are illustrated in Figure 9.

Nano-silica content (nS)
As mentioned earlier, nS was the NP type that scholars used most frequently to improve various properties of GPC composites. It was used either as a binder replacement or as an addition. Table 1 shows the different properties of nS and other NP types that were utilized in GPC composites. The nS content used to improve GPC composites ranged from 0 to 60 kg/m³, with an average of 11.6 kg/m³ and a standard deviation of 14.5 kg/m³. The other statistical criteria and the correlations between the CS and the nS content can be found in Figure 10.

Curing temperatures (T)
Ambient, steam, and oven curing regimes were commonly used to cure GPC composites. One of the reasons for using NPs in GPC composites is to move away from oven and steam curing methods toward ambient curing. Based on the collected datasets, GPC specimens modified with nS were cured at temperatures between 23 and 70°C, with an average of 42.05°C and a standard deviation of 17.4°C. The other statistical assessment tools, namely variance, skewness, and kurtosis, were 303.9, 0.11, and -1.92, respectively. The variations of the CS with the curing temperature and the frequencies of the T datasets are presented in Figure 11.

Age of specimens (A)
To gain sufficient early and late CS, the curing age should be extended to promote the polymerization process, which strengthens geopolymers. Based on the collected datasets, the curing time for GPC incorporated nS ranged from 0.5 to 180 days, with an average of 28 days and a standard deviation of 31.8 days. Similarly, the variance, skewness, and kurtosis of the published datasets were 1012.8, 2.36, and 6.96, respectively. The relationships between the CS and the specimen ages, with the frequencies of the collected data, are shown in Figure 12.

Compressive strength (CS)
The applied vertical load per unit area of a GPC specimen is known as the normal stress or compressive strength. This property is one of the critical mechanical properties of GPC composites. As shown in Table 2, the CS of the gathered datasets ranged from 3.2 to 81.3 MPa.

Modeling
Based on the coefficients of determination (R²) of the collected input model parameters, as shown in Figures 2 to 12, there is no direct relationship between the CS and any individual input model parameter. Therefore, multiscale model techniques, including M5P, MLR, ANN, LR, and NLR, are employed to develop empirical models to forecast the CS of GPC composites incorporated nS for different mix proportion parameters, curing regimes, and specimen ages.
For creating the models, the collected datasets are split into three categories. The models were built using the larger group, which included 135 datasets. The second group is made up of 36 datasets that were utilized to test the created models, and the final group is made up of 36 datasets that were used to validate the suggested models.

a. Linear regression model (LR)
In equation (1), CS, x1, a, and b represent the compressive strength, one of the variable input parameters, and the model parameters, respectively. This equation contains just one input variable, so to allow more practical and reliable investigations, equation (2) is suggested, which contains a wide range of input variables that cover all of the geopolymer concrete mixture proportions and curing conditions, as well as curing ages.
As mentioned earlier, all the main variables in equation (2) have already been described; a, b, c, d, e, f, g, h, i, j, k, and l are the model parameters. Equation (2) is a one-of-a-kind equation because it incorporates a large number of independent variables involved in producing GPC incorporated nS, which may be extremely useful in the construction industry. Moreover, because all variables enter linearly, the proposed equation (2) can be considered an extension of equation (1).
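A linear model with one intercept and eleven coefficients, matching the twelve parameters a to l mentioned for equation (2), can be fitted by ordinary least squares as sketched below. The fitting method, the order of the variables in `INPUT_NAMES`, and the helper names are assumptions, and the data are synthetic (noise-free) values generated only to show that the fit recovers known parameters; they are not the collected datasets.

```python
import numpy as np

# Assumed ordering of the eleven inputs; the order used in equation (2) is not shown here.
INPUT_NAMES = ["l/b", "b", "FA", "CA", "SH", "SS", "SS/SH", "M", "nS", "T", "A"]

def fit_linear(X, y):
    """Least-squares fit of y = p0 + sum(p_i * x_i); returns [p0, p1, ..., pk]."""
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones = intercept
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params

def predict_linear(params, X):
    """Evaluate the fitted linear model on rows of X."""
    return params[0] + np.asarray(X, dtype=float) @ params[1:]

# Synthetic, noise-free example with 135 rows (the size of the training group).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(135, len(INPUT_NAMES)))
true_params = np.arange(1.0, 13.0)  # 12 hypothetical parameters: intercept + 11 slopes
y = true_params[0] + X @ true_params[1:]

params = fit_linear(X, y)
print(np.round(params, 3))
```

With real data, the inputs would normally be kept in their physical units so that the fitted parameters can be read directly as the coefficients of the mix design equation.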

b. Nonlinear regression model (NLR)
In terms of the NLR, equation (3) may be regarded as a general form for proposing an NLR model (Mohammed et al., 2020). The interrelationships between the variables in equations (1) and (2) can be used to calculate the CS of normal geopolymer concrete mixtures and geopolymer concrete mixtures modified with nS using equation (3).
Where: all of the variables in this equation were provided earlier, and a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, and v are the model parameters.

c. Multi-logistic regression model (MLR)
As with the previous models, the collected datasets were subjected to multi-logistic regression analysis, and the general form of the MLR is shown in equation (4).
Where: all of the variables in this equation were provided earlier. Moreover, in this equation, the value of nS should be greater than 0.

d. Artificial neural network (ANN)
An ANN is a powerful computational tool designed for data analysis and computation that processes and analyzes data similarly to a human brain. This machine learning tool is widely used in construction engineering to forecast the behavior of a variety of numerical problems (Mohammed, 2018; Sihag et al., 2018).
An ANN model is generally divided into three main layers: input, hidden, and output. Depending on the proposed problem, the input and output layers can each consist of one or more layers. The hidden part, on the other hand, usually consists of two or more layers. Although the input and output layers are generally determined by the collected data and the purpose of the designed model, the hidden layer is determined by the rated weight, transfer function, and bias of each layer relative to the other layers. A multi-layer feed-forward network is constructed using a combination of proportions, weight/bias, and several parameters as inputs, including (l/b, b, FA, CA, ...), and the output of the ANN is the compressive strength.
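The forward pass of such a multi-layer feed-forward network can be sketched as below. This is a minimal, untrained illustration: the weights are random, the layer sizes (11 inputs, two hidden layers of 12 neurons each, 1 output) are an assumed layout, the hyperbolic tangent activation is the transfer function reported for the best network in this study, and the function names are hypothetical.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create random (weights, bias) pairs for consecutive layers; untrained."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(network, X):
    """Propagate inputs through tanh hidden layers and a linear output layer."""
    a = np.asarray(X, dtype=float)
    for W, b in network[:-1]:
        a = np.tanh(a @ W + b)   # hidden layer: weighted sum + bias, then tanh
    W_out, b_out = network[-1]
    return (a @ W_out + b_out).ravel()  # linear output: one CS estimate per row

# Assumed layout: 11 inputs -> 12 -> 12 -> 1 output.
net = init_network([11, 12, 12, 1])
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 11))  # 5 scaled input rows
cs_pred = forward(net, X)
print(cs_pred.shape)
```

Training would adjust the weights and biases to minimize the prediction error on the training group; that step is omitted here.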
There is no standardized method for designing the network architecture. As a result, the number of hidden layers and neurons is determined through a trial and error procedure. One of the primary goals of the network's training process is to determine the optimal number of iterations (epochs) that provides the lowest MAE and RMSE and an R² value close to one. The effect of the number of epochs on lowering the MAE and RMSE has been studied. For the purpose of training the designed ANN, the collected dataset (a total of 207 data) was divided into three parts. Approximately 70% of the collected data was used to train the network, 15% of the total data was used to test it, and the remaining data were used to validate the trained network (Demircan et al., 2011). The designed ANN was trained and tested with various numbers of hidden layers to determine the optimal network structure based on the fitness of the predicted CS of GPC incorporated nS to the CS of the actual collected data. It was observed that the ANN structure with two hidden layers, 24 neurons, and a hyperbolic tangent transfer function was the best-trained network, providing the maximum R² and the minimum MAE and RMSE (shown in Table 3). As a part of this work, the ANN model has been used to estimate the CS of GPC incorporated nS. The general equation of the ANN model is shown in equations (5), (6), and (7).

e. M5P-tree model (M5P)
The division criteria for the M5-tree model are obtained through the error calculation at each node. Errors are analyzed using the standard deviation of the class values entering each node. At each node, the attribute that maximizes the reduction of the estimated error is chosen for the split. As a result of this division in the M5P tree, a large tree-like structure will be generated, which will result in overfitting.
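The node-splitting idea described above can be illustrated with the standard deviation reduction (SDR) commonly associated with M5-type trees. The sketch below is an assumption about the criterion, not a reproduction of the software used in this study; the function names and the small example arrays are hypothetical.

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction obtained by splitting y into two groups."""
    y = np.asarray(y, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    left, right = y[left_mask], y[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that leaves one side empty reduces nothing
    weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
    return float(y.std() - weighted)

def best_split(x, y):
    """Return the threshold on one attribute x that maximizes the SDR, and its SDR."""
    x = np.asarray(x, dtype=float)
    levels = np.unique(x)
    thresholds = (levels[:-1] + levels[1:]) / 2.0  # midpoints between distinct values
    if len(thresholds) == 0:
        return None, 0.0
    scores = [sdr(y, x <= t) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), scores[best]

# Hypothetical attribute values and strengths with an obvious break between 3 and 10.
x_demo = [1, 2, 3, 10, 11, 12]
y_demo = [5, 6, 7, 50, 52, 54]
threshold, score = best_split(x_demo, y_demo)
print(threshold, round(score, 2))
```

Repeating this search over every input attribute and choosing the attribute and threshold with the largest reduction gives the split for a node.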
The enormous tree is trimmed in the ...

In the statistical assessment equations, through equation (13), x_p and y_p are the estimated and tested CS values, and y′ and x′ are the averages of the experimentally tested values and of the values estimated by the models, respectively. tr, tst, and val refer to the training, testing, and validating datasets, respectively, and n is the number of datasets. Except for the R² value, zero is the optimal value for all evaluation parameters, whereas one is the best value for R². When it comes to the SI parameter, a model has bad performance when it ...
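The assessment criteria named in this section can be sketched as follows. The definitions are assumptions, because the equations themselves are not reproduced here: RMSE and MAE in their usual forms, SI as RMSE divided by the mean tested value, R² as the squared correlation between tested and estimated values, and OBJ as a size-weighted sum of (RMSE + MAE) / (R² + 1) over the training, testing, and validating groups, which is a form often used in related studies.

```python
import numpy as np

def rmse(tested, estimated):
    t, e = np.asarray(tested, dtype=float), np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((t - e) ** 2)))

def mae(tested, estimated):
    t, e = np.asarray(tested, dtype=float), np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(t - e)))

def scatter_index(tested, estimated):
    # Assumed definition: RMSE normalized by the mean tested value.
    return rmse(tested, estimated) / float(np.mean(tested))

def r_squared(tested, estimated):
    # Assumed definition: squared correlation between tested and estimated values.
    t, e = np.asarray(tested, dtype=float), np.asarray(estimated, dtype=float)
    dt, de = t - t.mean(), e - e.mean()
    r = np.sum(dt * de) / np.sqrt(np.sum(dt ** 2) * np.sum(de ** 2))
    return float(r ** 2)

def obj(groups):
    """Assumed OBJ form; groups = [(tested, estimated)] for tr, tst, and val."""
    n_all = sum(len(t) for t, _ in groups)
    return sum(
        (len(t) / n_all) * (rmse(t, e) + mae(t, e)) / (r_squared(t, e) + 1.0)
        for t, e in groups
    )

# Hypothetical tested and estimated CS values in MPa.
tested = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]
print(rmse(tested, estimated), mae(tested, estimated), scatter_index(tested, estimated))
```

A perfect model gives RMSE, MAE, SI, and OBJ of zero and an R² of one, in line with the optimal values stated in the text.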

d) ANN model
In this study, the authors made considerable efforts to obtain a highly efficient ANN by trying different numbers of hidden layers and neurons and different momentum, learning rate, and iteration values, as can be seen in Table 3. Lastly, it was observed that when the ANN has two hidden layers, 24 neurons ...

e) M5P-tree model
Figure 23 shows the tree-shaped branch correlations. Also, the parameters of the model (equation (17)) are summarized in Table 4, and the model variables are selected based on the linear tree regression function.

In addition, Figure 27 shows a comparison of model predictions of the CS of GPC mixtures incorporated nS based on the testing datasets. Furthermore, Figures 14, 16, 18, 21, and 24 display the residual errors for the CS using all the datasets. All of these figures show that the estimated and tested CS values for the ANN model are close, indicating that the ANN model is more accurate than the other models. Figure 25 shows the OBJ values for all of the proposed models. The OBJ is 8.05, 5.8, 8.8, 3.59, and 6.0 for LR, NLR, MLR, ANN, and M5P, respectively. The ANN model has the lowest OBJ value: relative to it, the OBJ of the LR model is about 124% higher, that of the NLR model 61.5% higher, that of the MLR model 145% higher, and that of the M5P model 67% higher. This also emphasizes that the ANN model better forecasts the CS of GPC incorporated nS.
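The percentages quoted for the OBJ comparison follow from expressing each model's OBJ relative to the ANN value. The short check below uses only the five OBJ values reported above; the variable names are illustrative.

```python
# OBJ values as reported in the text for each model.
obj_values = {"LR": 8.05, "NLR": 5.8, "MLR": 8.8, "ANN": 3.59, "M5P": 6.0}

best = min(obj_values, key=obj_values.get)  # model with the lowest OBJ
excess_pct = {
    name: 100.0 * (value - obj_values[best]) / obj_values[best]
    for name, value in obj_values.items()
    if name != best
}

for name, pct in excess_pct.items():
    print(f"{name}: OBJ is {pct:.1f}% higher than {best}")
```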
In addition, Figure 26 shows the SI values for the created models during the training, validating, and testing phases.

CNTs have a diameter of 20 to 120 nm and are several micrometers in length.
NPs = nano-particles; F = fly ash; GGBFS = ground granulated blast furnace slag; MK = metakaolin; SF = silica fume; RHA = rice husk ash; NP = natural pozzolan; OPC = ordinary Portland cement; nS = nano-SiO2; nT = nano-TiO2; nM = nano-metakaolin; nC = nano-clay; CNT = carbon nanotube; nA = nano-Al2O3

Table 4 M5P-tree model parameters (equation (17))

Figure 1 The flow chart of the process followed in this study
Figure 2 Correlations between CS and l/b ratio with histogram of GPC mixtures incorporated nS
Figure 7 Correlations between CS and SS content with histogram of GPC mixtures incorporated nS
Figure 8 Correlations between CS and molarity of SH with histogram of GPC mixtures incorporated nS
Figure 9 Correlations between CS and SS/SH ratio with histogram of GPC mixtures incorporated nS
Figure 10 Correlations between CS and nS content with histogram of GPC mixtures incorporated nS
Figure 11 Correlations between CS and T with histogram of GPC mixtures incorporated nS
Figure 12 Correlations between CS and A with histogram of GPC mixtures incorporated nS
Figure 13 Comparison between tested and predicted CS of GPC mixtures incorporated nS using LR model, (a) training data, (b) testing data, (c) validating data
Figure 14 Residual error diagram of CS of GPC mixtures incorporated nS using training, testing and validating dataset for LR model
Figure 15 Comparison between tested and predicted CS of GPC mixtures incorporated nS using NLR model, (a) training data, (b) testing data, (c) validating data
Figure 16 Residual error diagram of CS of GPC mixtures incorporated nS using training, testing and validating dataset for NLR model
Figure 17 Comparison between tested and predicted CS of GPC mixtures incorporated nS using MLR model, (a) training data, (b) testing data, (c) validating data
Figure 18 Residual error diagram of CS of GPC mixtures incorporated nS using training, testing and validating dataset for MLR model
Figure 19 Optimal network structures of the ANN model
Figure 20 Comparison between tested and predicted CS of GPC mixtures incorporated nS using ANN model, (a) training data, (b) testing data, (c) validating data
Figure 21 Residual error diagram of CS of GPC mixtures incorporated nS using training, testing and validating dataset for ANN model
Figure 22 Comparison between tested and predicted CS of GPC mixtures incorporated nS using M5P-tree model, (a) training data, (b) testing data, (c) validating data
Figure 23 M5P-tree pruned model tree
Figure 24 Residual error diagram of CS of GPC mixtures incorporated nS using training, testing and validating dataset for M5P model
Figure 25 The OBJ values of all developed models
Figure 26 Comparing the SI performance parameter of different developed models
Figure 27 Comparison between model predictions of CS of GPC mixtures incorporated nS using testing datasets