Radon prediction using metrological parameters, and its signi cance in Seismo-Ionospheric coupling

In order to derive boundary conditions, atmospheric and seismo-ionospheric models are coupled with generic assumptions. These boundary conditions are used to simulate possible interaction between earth’s atmospheric, geological processes and the Ionosphere. Rn is one of the major contributors to surface ionization, which is also believed to contribute to the geophysical and geochemical processes causing disturbance in the Ionosphere. Its relationship with metrological parameters in a given region gives an insight on the natural processes in the underground, as well as the relationship with atmospheric processes. It can enable models to establish realistic results rather than ideal theoretical consequences. Rn can be influenced by a number of physical variables, and therefore, this influence is sometimes very complicated to study, because of the Forcing researchers to make ideal assumptions. The relationship between Rn and some geological variables are studied, namely; soil temperature at 5 cm, 10 cm, 20 cm, and 50 cm, atmospheric pressure, and atmospheric temperature. A hybrid model is established based on the artificial neural network (ANN), which is referred to as multiANN model. This model is a combination of multiregression and ANN models. Itenables Rn prediction to metrological parameters. To test the robustness of our model 50% training periods is employed with 50% testing periods. The model is able to forecast the remaining 50% effectively. With the aid of the Monte-Carlo method, it is possible to predict multiple future Rn variations with high precision. The regions with low performance of the multiANN are identified for possibly relationship to seismic events. The model could be a good candidate for predicting of Rn concentration from metrological parameters, especially in establishing the lower boundary conditions in seismo-ionospheric coupling models.


Introduction
Significant progress has been recently made in evaluating the relationships between the earthquake-ionosphere (Namgaladze et al., 2018;Sorokin et al., 2020). During, before, and after the earthquake, physical variables such as friction, pressure, and physical changes in the ground layers occur (Pulinets et al. 2006;Gousheva et al. 2009;Sarlis et al. 2015;Tariq et al. 2019). The emergence of gases such as 222 Rn, He, and CO2 in the ground layers causes some chemical changes (He and CO2), which prevails up to the ionosphere. Models have been established to mimic and simulate Lithosphere Atmosphere Ionosphere Coupling (LAIC) processes or the Seismo-Ionospheric processes (Grimalsky et al. 2003;Sorokin et al. 2007;Denisenko et al. 2008;Xu et al. 2015;Zhou et al. 2017). Some of these models often depend on ideal assumptions for some given parameter values. For example, Denisenko et al. (2008) had to assume the value 100 V/m for the surface electric field generated in a seismic region. The model would have taken into account many other atmospheric variables and geological variables, which influence the surface electric field.
Many similar examples exist in cited references. One challenge is that, most geologic and atmospheric variables are related intricately, hence forcing researchers to make ideal assumptions to simplify their studies. In order to be able to interpret the physical and chemical changes in the atmosphere during seismic activity in a meaningful way, the atmospheric and meteorological changes should be examined simultaneously. If there are too many variables in a study, it is necessary to establish a suitable model to examine their relationships and changes in real-time.
The proposed model is also effective and important for the generalization of calculations .
Soil Rn gas concentrations are heavily affected by soil temperature, atmospheric temperature and pressure, tidal events, and earthquakes (Kumar et al. 2009;Laiolo et al. 2012;Sahoo et al. 2018). The relationship between Rn, temperature, and pressure depends on many different factors, some of which include geology and geochemistry of the interested monitoring site, and the relationship is not certain always. For example, Almayahi et al. (2013) and the references indicate that both positive and negative weak correlations exist between Rn and metrological parameters. Külahcı and Şen (2014) have discussed more of these effects on Rn, especially in relation to seismicity. An increase in temperature increases the kinetic energy of Rn atoms, and this causes Rn to move. Rn can be carried highly by appropriate atmospheric pressure, wind, and other meteorological effects. As altitude increases, the rate of transport of Rn gas may increase due to winds and other convective processes. It has been determined that the soil Rn gas emerges from a depth of 200 m. Rn gas coming out of the earth's crust causes ionization in the air (Pulinets 2004). The geological structure of the place is an important factor in the spread of Rn to the environment before or after earthquakes. Rn concentration dynamics can give a significant anomaly between 20 days and 3-4 months before the earthquake (Igarashi et al. 1995;Külahcı and Çiçek 2014). Rn propagation mechanisms are extremely important in earthquake prediction studies (Molchanov et al. 1998;Garcìa et al. 2000;Cipollini et al. 2001;Pizzino et al. 2004). This importance has increased recently and the seismic activity-Rn-atmospheric dynamics trio has become more important. Seismicity-Rn (Trique et al. 1999;Cicerone, Ebel, and Britton 2009;Jordan et al. 2011;Parks et al. 2013;Novakova et al. 2016;Soldati et al. 2020;D'Alessandro et al. 2020) and/or seismicity-atmosphere (Schekotov et al. 2007;Smirnov 2008), or better known as seismo-ionospheric coupling studies (Sytinskij et al. 1997;Korepanov et al. 2009;Tsolis and Xenos 2009;Parrot et al. 2016) can be considered as the next step of the seismo-gas-ionospheric trilogy. Tthis trio will be used very effectively in future studies, but it is very difficult to generate or find the necessary scientific data in such studies. The definition of "gas" in this trio would be 222 Rn due to its short but measurably long half-life of 3.8 days and the relatively inexpensive cost and convenience in its measurements. In seismo-ionospheric coupling, data transfers, data distribution, and propagation that form physical/chemical transitions between earth and atmosphere must be well established. These or other physical/chemical concepts or variables will be included in this concept. It paves the way for a very powerful set up of seismo-ionospheric coupling modeling.
The ionizing properties of Rn gas highlighted the concept of "atmospheric plasma" (Pulinets and Boyarchuk 2005). In practical applications, Rn is the first link of physical processes that provide seismo-ionospheric coupling, and it is well-known that Rn is a radioactive inert gas.
Furthermore, gases such as CO2 spread from the earth's crust to the atmosphere, and they also carry aerosols to the earth. Meanwhile, metals such as Hg, As, and Sb are also transported into the atmosphere close to the ground together with aerosols. These metals easily change the electrical properties of the atmosphere close to the ground. On the other hand, such as temperature effects and wind can easily transport ionizing gases to higher altitudes. The area of the region, where these metals are transported to the earth is as much as the earthquake preparation area and parallel to this situation, the vertical electric field above this region also changes (Pulinets and Boyarchuk, 2005). This electric field is expected to change before earthquakes. The main source of alpha particles in air is the 222 Rn isotope (half-life 3.8 days). With the decay of 222 Rn, the energy of Eα = 6 MeV emerges and this energy generates ~ 2·10 5 electron-ion pairs in the air (Fleischer 1982). Solar activities also contribute positively to the ion production rate (Kamble et al., 2013). All artificial and natural radioactive sources contribute to ion production in the troposphere. 222 Rn is known to make changes in the electric field in the atmosphere close to the ground (Holzworth 1987;Smirnov, 2008). Its ionization power in the atmosphere is 10 times higher than cosmic rays (Martell 1985). Roffman (1972) determined that Rn is a natural ionizer in near-ground air and concentrates on brittle rock formations due to its large atomic mass. Prior to the earthquakes, Rn flow accelerates in geological regions and Rn density in the atmosphere increases.
In this study, a way is suggested for enhancing seismo-ionospheric coupling studies by involving as many atmospheric variables as possible in the seismo-ionospheric studies. So that the study can now be modified as seismo-gases-ionospheric trio coupling. A model is proposed that employs both multi regression and artificial neural network (ANN) and is powered by Monte-Carlo simulation to express valid views on this trilogy. The model could be significant in relating underground processes to surface processes, This is why metrological parameters are used in soil and in the atmosphere. Moreover, since near-surface atmospheric electrodynamic processes (ions, charged aerosols, Global electric circuit fair-weather current etc.) have a direct relationship with Rn and metrological parameters, models like this can help to better understanding of these processes.

Earthquake data
In this study, Erzincan City data are employed, which lies along the North Anatolian Fault (NAF). The seismicity of this fault is discussed in (Toksöz et al., 1979;Nuriel et al. 2019). Erzincan Rn monitoring station (see Fig. 1) is located at latitude-longitude coordinates (39.7337, 39.61722).
The earthquakes occurrences around the Erzincan region are presented in Table 1. Earthquakes are denoted with red dots while Rn monitoring station is represented with a green triangle. All earthquake data are collected from Kandili Observatory and Earthquake-Tsunami monitoring center, Bogazici University (http://www.koeri.boun.edu.tr/sismo/2/earthquake-catalog/). According to Rn anomaly criteria, the relationship between the earthquake preparation radius ( is earthquake magnitude) and the distance from the Rn monitoring station to the epicenter should satisfy Eq. (1). Both and are in kilometers, is estimated in a similar way as that in Alam et al. (2020).
(1) Meteorological Service: https://www.mgm.gov.tr/eng/forecast-cities.aspx). The metrological data are namely; soil temperature at various depths (5 cm, 10 cm, 20 cm, and 50 cm), atmospheric temperature and atmospheric pressure data. A plot of all data is presented in Fig. 2(a). Rn (blue curve) seems to visually imply good relationship with temperature, which has also been reported by some studies (Kar et al. 2019;Muhammad et al., 2020). The relationship with atmospheric pressure does not look good visually. A better representation of the data relationship can be seen when one considers the Pearson correlation matrix in Fig. 2(b). It is not surprising to see atmospheric pressure reflecting a poor relationship with both Rn and temperature (dark region) as in Fig. 2(a).

Multi-Regression model (multi-reg)
Since all soil temperature data are highly correlated, it is chosen to minimize the number of predictors by considering only one depth from the soil temperature data (5 cm depth). The distribution density and correlogram plotsis presented in Figs. 3 (a) and 3 (b), which are an improved good visual presentation of Rn relationship with the chosen predictors. Fig. 3(a), represents Rn distribution, and when compared with temperature it gives a relative linear scatter relationship, which is far better than to comparison as to how it varies with atmospheric pressure.
The correlogram in Fig. 3 can still seek to include atmospheric pressure in the proposed multi regression model. This would be important, as it will favor seismo-ionospheric models to capture the influence of metrological parameters and obtain realistic predictions. A reasonable correlation is implied with atmospheric and soil temperature, especially at lags 20 and 14, respectively, where the maximum correlation occurs as . Fig. 3(b) again shows a good relationship between soil temperature and atmospheric temperature, where the two curves are almost aligned with each other. This means that the model can enable the exploration of ground and atmospheric physical processes.
Both Rn and metrological data have a length of 1083 points, where 50% is used for training (542 points) and 50% for testing (541), which is different from normal modelling practice. The goal is to be able to achieve a long term Rn prediction using these metrological parameters. The multi regression model proposed for Rn prediction is given by; ( 2) where are regression coefficients estimated to minimize the residual sum of squares between real and the Rn data obtained by the linear approximation in Eq.
is the atmospheric temperature ( o C), is the atmospheric (hPa), is the 5 cm soil temperature, and is the residual for the multi-regression model.

multiANN
It is the main purpose of thişs study to improve the prediction quality by modelling the residual obtained from the multi regression model. For this purpose, ANN application is chosen rather than the multi-regression residuals. This requires that Eq.
(2) written as, Here is the ANN model for multi-regression residual, and is the general error of the hybrid model. In modelling the multi-regression residual, ANN multi perceptron model is adapted and it consists of input layer, two hidden layers and the output layer, which gives the predicted If is a set of residual data from multi-regression model, where is the length of residual data array, then one can represent in terms of some index count such that; (4) It is a tradition to split data into window slides in order to be able to implement the ANN model.
It is, herefore, constructed as the ANN model input and target arrays as; and (5) is the window size, is the input and target index, and are respectively the elements of both input ant target arrays . and are defined in terms of some index , such that; , , , , are further divided into training and validation phases, which are fed into the feed forward ANN architecture in Fig. 4. For data, , and . 70% of the data is used for training and 30% for validation. The network has an input layer, two hidden layers and an output layer.
The first layer is the input layer, which consist of four inputs, the second layer is a dense layer consisting of 99 neurons, the third layer is also a dense layer with 19 neurons, and finally the output layer follows. After 99 epochs of training, a reasonable prediction is achieved. Adam optimizer is chosen to improve the learning process, and the means squared error (MSE) is adapted to determine the loss for each EPOCH (Sebastian and Vahid 2017). More information on ANN and Multi regression can be found in (Özcan et al. 2020;Pérez et al. 2014).

Monte Carlo
The aim is to derive a model that would enable a long term forecasting of Rn variation. In

Rn anomaly detection
Machine learning models can be a good tool for anomaly detection. For example, Akhoondzadeh (2013) used support vector machines to detect Ionospheric anomalies. Herein, the hybrid multiANN model is employed to detect Rn anomalies related to seismic activities, and the relative change using at aset of thresholds. Any multiANN prediction error which exceeds 70% would be regarded as a possible anomaly. This is necessary to detect the anomalies, which occur during training as well as during the model forecast periods.

Multi-regression results
The multi-reg Eq.
(2) with estimated parameters is obtained as; and the modified multiANN model from Eq. (3) becomes as; Here, is the model for multi regression residual using ANN. is the general error after combining the Multi-reg+ANN to Eq. (7). Rn prediction in Eq. (7) for both training and testing periods is presented in Fig. 5. The model performed fairly during the training period, and poorly during the forecast. The poor performance can be obviously justified, since the atmospheric temperature data did not correlate well with Rn. The computed R-squared value for Eq. (7) indicates that the multi-reg model could explain 60% only of Rn variation, and the RMSE gives an overall error of 14Bqm -3 . What can be seen from Fig. 5 is that the multi-reg model is able to capture mostly the global Rn variations, but not the local one. Most Seismo-Ionospheric models would require a model, which would account for both local and global Rn variations. This would enable robust forecasting/simulation as well as better detection of seismically induced Rn anomalies.

multiANN results
As mentioned, Eq. (8) is an improved version of Eq. (7). The residual from Eq. (7) for the training period is presented in Fig. 6 (a). The training and validation loss results for the model are presented in Fig. 6 (b). As seen, reasonable outcomes are achieved after 99 EPOCHS of training on the data. The MSE for the training is relatively low and stable after a few Epochs, and even lower values for the MSE are achieved for the validation. With these results from the model, a good prediction of the future values are achieved.
The red curve in Fig. 7 (a) is an outcome of Eq (8). As seen, there has been a visually significant improvement on the multi-reg model (yellow curve) for both the training and testing periods. When one looks closer at the forecast section in Fig. 7 (b), it is possible to see that the multiANN model visually demonstrates strength in Rn forecasting. The RMSE and R-squared values have respectively decrease and increase by about 69% and 54% (see Table 2). This achievement is made despite the existing poor relationship between Rn and atmospheric pressure.
Hence, an approach similar to this gives a green light, especially to researchers willing to account for the variability of uncorrelated/poorly related parameters into a given atmospheric/Seismo-Ionospheric model.

Monte-Carlo results
Since the Monte-Carlo is simulated within bounds of the multiANN prediction curve, there is the need to ensure that the error is normally distributed. Comparing multiANN forecast residual with its theoretical quantile Fig. 8 (a), one can see that the residual is almost normally distributed with a few deviations from its left tail. The autocorrelation plot in Fig. 8 (b), shows that most of the residuals are uncorrelated except for the first four lags, which implie a sense of randomness in the data. To further support this, the two tail test is conducted, and the results suggest normality.
The rolling Monte-Carlo forecasts for 99 paths of the multiANN model are presented in Fig. 9 (a).
As expected, all the forecast simulation curves have shadowed the real Rn (blue curve) data itself.
The mean path (black curve) for all 99 simulations is also shadowed by these curves. According to Fig. 9 (b), the performance of Monte-Carlo simulation increases with an increase in number of simulations. The Pearson correlation between the Monte-Carlo mean forecast and Rn data increases from zero to twenty simulations, and then stabilizes. The same goes with RMSE for mean Monte-Carlo prediction, it decreases with wiggles, and then stabilizes after about 80 simulations. The R-squared values for each simulation mean is presented in Fig. 9 (c). The behavior is obviously similar to that of the Pearson correlation, and the collective mean of all 99 simulation paths is able to explain about 89% of Rn variation (see Table 2).
From Table 2 it is quite obvious that both multiANN and Monte-Carlo simulations outperform the multi-reg model in explaining Rn variation from metrological data. This suggest the significance of the modelling approach in atmospheric and seismo-Ionospheric studies. A limitation of this approach is that one may not be able to define Eq. (8) using pure mathematical expressions. There is the need to use as an object model, which is implemented only by using a designed program. Another limitation is with the rolling Monte Carlo mean estimations. When implemented, the model needs to sequentially compute mean of means, and this is expensive especially when a low processing power is employed. Starting with Fig. 10 (a), there are two regions ehere anomalies are detected. The first anomaly between April 2007 and June 2007 is preceded by the two Bingol earthquakes (4.7 and 4.8), which occurred respectively on Fri-9-Mar-2007 23:24 andThu-8-Mar-2007 12:35. There is no Rn data prior to these sister earthquakes, which would have been helpful in explaining the observed anomaly. Moreover, the anomaly is post seismic, it is most pronounced a month later and with abrupt spikes. In an attempt to relate this to seismicity, one would see it as a consequence of crustal recovery from alleged deformation caused by the two Bingol sister Earthquakes.

multiANN and Seismic anomalies
The second anomaly region in Fig. 10 (a) lies between January and July 2008. The two Erzincan earthquakes with magnitudes 4.1 and 4.0 (see Table 1), which occurred respectively in December 2007 and January 2008 are the existing earthquakes, when the anomaly spiked to its highest peak. These two earthquakes could be attributed to this anomaly since they satisfy Eq. (1).
Moreover, the anomaly is very pronounced and lingered near the time of occurrence. Since studies like (Miklavčić et al. 2008;Ghosh et al. 2007;Walia et al. 2003;Sahoo et al. 2018) have reported to Rn anomalies related to earthquakes of similar magnitudes, there exist a possibility that the observed anomaly is seismically related.
The observed three abrupt spikes, which occurred between March and June 2008 Fig. 10 (a), could be best explained as a possible consequence of the following points.
(a) The recovery from stress, which accumulates during the preceding 4.1 and 4.0 Erzincan earthquakes.
(b) A cluster of low magnitude earthquakes, which were ignored by the study, or some underground processes that did not involve noticeable deformation, but involves sudden ejection of Rn.
(c) Spurious anomaly detected by the multiANN model.
Anomalies related to earthquakes having similar magnitudes with the ones in Fig. 10 (b) have been reported by different studies (Walia et al. 2003;Külahcı and Şen 2014;Külahcı and Şen 2015), and therefore, the suggested scenario (d) could be logical. The last earthquake (5.0) has the highest magnitude in Table 1, and it occurred on Thu-30-Jul-2009 7:37. It is obvious from Fig. 10 (b) that Rn anomalies have existed in both prior and post times of its occurrence. The evidence that would support the observed prior and post anomalies is by considering the sudden spike, which occurred during the event and the second more pronounced spike in the vicinity of occurrence time. Another reason to relate this earthquake with observed anomaly is due to it occurrence at a nearest location to the Rn monitoring site (see Fig. 1). This means, the pre seismic anomalies are a consequence of the 5.0 Erzincan earthquake preparation, and the post seismic are a consequence of recovery from deformation, or other geological underground processes. To validate the occurrence of this anomalies, other anomaly detection methods are applied like the ones in (https://adtk.readthedocs.io/en/stable/examples.html). The major detected anomalies are discussed in other methods that are similar to the ones in this study. Other methods like the Inter-Quartile and moving average methods are not effective when used to filter anomalies the data. After all one can state that the multiANN model can also be helpful in seismic studies.

Conclusions
A hybrid model is presented that can enable the long term prediction of Rn variations using metrological parameters. The model is a combination of multi-regression and ANN models. This is why it is referred to as multiANN model. The model is able to predict Rn two years ahead of its training periods, and with a reasonable precision. Monte-Carlo forecasting is implemented using the multiANN model, and the future possible Rn variations are predicted not far from the model curve. The model can be significant in the following ways: (1) The model could help in relating underground processes to surface processes (this is why we used metrological parameters in soil and in the atmosphere), especially during seismic periods, or when other perturbations occur.
(2) Since near surface atmospheric electrodynamic processes (ions, charged aerosols, Global electric circuit fair weather current etc.) can have direct relationship with Rn and metrological parameters. A model like this can help in better understanding of these processes. Moreover, the model could enable Seismo-Ionospheric models to take into account many other variables. 15 (3) The study emphasizes the power of applying ANN to model the residuals from multi variate regression models. Even if there is a very poor relationship between them, there will still be a good chance of prediction of a variable of interest from other related parameters (4) The model could be used also in anomaly detection, especially in seismo-Ionospheric studies where anomaly detection is very important.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments
We would like to thank Boğaziçi University (http://www.koeri.boun.edu.tr/sismo/2/earthquake-  Study region and earthquakes which occurred, Rn monitoring station is located in Erzincan Figure 2 (a) All the metrological parameters as well as Rn during the study time, (b) Correlation between Rn, atmospheric temperature, atmospheric pressure and soil temperature Multi-regression prediction curve (the red curve is Rn test data)   GraphicalAbstract.docx