A mobile simulation and ARIMA modeling for prediction of air radiation dose rates

A spatial simulation method in.mp4 format was proposed to determine Fukushima radioactive fallout transport and the Absorbed Dose Rate, Annual Effective Dose Equivalent, and Excess Lifetime Cancer Risk were determined for 10 months after the accident (March 11 2011). The findings of this study demonstrate that an appropriate ARIMA model can be applied for radiation dose time-series in the case of nuclear reactor accidents like Chernobyl and Fukushima to predict the future air dose rates, which can provide valuable information in determining the evacuation zones, decontamination processes, and radiation protection progresses. The model forecasted results and the actual observation data in the same period shows a gradual decrease in the air dose rates during the prediction period. Moreover, there is a good agreement between them as the prediction and observation scatter plot follows each other with small variations. These results provide important insights into the predictability of ARIMA models; thus, the models were utilized to forecast the air dose rates for the period (January 2020–October 2020).


Introduction
Fukushima Daiichi Nuclear Power Plant (FDNPP) with approximately 3.5 km 2 area is located at 37°25′23″N and 141°01′59″E in Futaba, Fukushima Prefecture-Japan [1], Tokyo Electric Power Company (TEPCO) constructed and operated the power plant, there were six Boiling Water Reactors (BWR) at the site. The Units 1-5 were built as Mark I type (light bulb torus) containment structures. Unit 6 is a Mark II type (over/under) containment structure [2][3][4]. On 11 March 2011 at 2:46 pm Japanese time one of the most powerful earthquake (Great East Japan Earthquake) around the world since 1900 with M W 9 hit the eastern coast of Honshu Island exactly at 130 km offshore of the Sendai city. The epicenter of the earthquake in the sea at around 200 km, and the earthquake resulted in huge tsunami waves with heights up to 40 m in Miyako-Iwate Prefecture [5][6][7]. Besides, the massive earthquake and its resultant gigantic tsunami caused to 15,891 deaths and 2579 missing people [8]. Furthermore, the tsunami resulted in a level 7 nuclear accident according to International Nuclear Events Scale, the accident led to release a huge amount of radioactive materials into the environment [9].
The accident released an enormous quantity of radionuclides from exploded reactors into the atmosphere, Pacific Ocean and ground [10,11]. The concentration of released 137 Cs and 134 Cs similarly reached the peak in the mid of March 2011 with the estimated amount as 15.2-20.4 PBq Namie Town in Fukushima Prefecture, Japan [12]. The distribution of gamma-emitting radionuclides resulted different air dose rates around the FDNPP regions [13]. Therefore, on March 11, 2011, an evacuation zone within a 3 km radius from FDNPP and indoor zone within a 10 km radius were ordered by the Japanese government. One day later March 12, 2011, the evacuation zone was widened to include areas within a 10 km radius, on the same day, the evacuation zone was expanded to areas within a 20 km radius [14]. In the early stage of the accident, soil samples were collected from 15 to 30 March 2011 at 15 various positions around the FDNPP, in which many fission products were detected including 129m Te, 129 Te, 131 I, 132 Te, 134 Cs, 136 Cs, 137 Cs, 140 Ba, and 140 La [15]. Another crucial radiation monitoring step was the mapping project in between June 2011 and December 2012. The project showed that the radionuclides particularly radiocesium were densely deposited into the northwest of the FDNPP site, because of the plume released at 12 to 15 JST (Japanese Standard Time) on 15 March 2011 flowed northwestward, and wet deposition occurred with precipitation in the nighttime of the same day [16]. In contrast, the ratios of 131 I and 129m Te to 137 Cs was higher in the regions of the south of the FDNPP.
After the accident in Fukushima prefecture and neighboring areas the ambient air dose rates have been continuously measured by various methods through car and air-borne surveys, and in-situ techniques [17]. Long term measurements can assure the public, monitor hazard levels reduction, and give insights about future preparations like decontamination [18]. Fortunately, the observations show that the amount of air doses has been reduced significantly in residential areas compared to the forests and remote areas, owing to the radioactive decay of radiocesium, decontamination processes, the activity of the inhabitants, and the penetration into the deeper depths of the soil [19,20]. Nevertheless, this situation can be distinct for residential houses, as penetration and deposition are probable due to radionuclides through vents, doors and windows [21,22].
In some cases, the ambient air dose rate measurements using the aforementioned methods are not easily carried out for example, when the roads are closed and the surface of the earth is covered with vegetation. For these reasons, researchers have taken advantage of the use of spatial-temporal simulation models to study the effects of radioactive fallout and isotope distributions [23,24]. Moreover, the autoregressive integrated moving average (ARIMA) model has already been used in the field of nuclear physics for forecasting the 226 Ra, 232 Th and 40 K concentrations in four regions of Istanbul [25], nuclear fuel cycle price estimation [26], and radon gas concentrations time-series estimation for earthquake prediction purposes [27,28]. It seems that, this model has not been used by scholars for estimating the Fukushima air dose rates. There are many types of time series analysis such as stochastic, neural networks, and SVM approaches. We preferred for the ARIMA analysis in this study because the use of this method is new for quantities in this study. This study can make a major contribution to the applicability of ARIMA model in the case of nuclear accidents.
The main objective of the current study is to predict future air dose rates by using ARIMA model in a return zone, where the inhabitants entry is forbidden [29] and the annual cumulative dose is greater than 20 mSv [30]. Evaluating the radiological hazard indices relying on the output data. In addition, air radiation transport will be shown through the simulation animations based on this model. These simulations estimate and give a visual indication of the radioactivity change in 10 months after the accident.

Research area
There are eight air dose rate monitoring points (MP1-MP8) around the FDNPP. Detectors were positioned at a height of about (1 m) from the ground surface because in the case of an adult majority of sensitive human organs are located at this height [31].
This study selected time-series data of air dose rates for four monitoring points (MP1, MP2, MP4, and MP5) [32,33], which are located in between the north to west directions of the FDNNP site as shown in Fig. 1. As, it has been observed that the majority of radionuclides detected in these directions during the accident due to the weather conditions [34]. The details of each monitoring point were obtained from the International Atomic Energy Agency [35]. TEPCO has been running measurements, in normal working conditions each monitoring point measures the air dose rate in every 10 min time intervals, 144 measurements in a day, about 4,320 measurements in a month and a total 51,840 measurements in a year.
For this study, the daily average data were retrieved from (https:// www. tepco. co. jp/ en/ nu/ fukus hima-np/ f1/ index-e. html) for the first day of each month as a training dataset (e.g. 1st January 2013, 1st February 2013 to 1st December 2018), totally 72 months of data were prepared. Figure 2 presents the graphical summary of the train dataset of air dose rates. The data from 1 January 2019 to 1 October 2019 (there is no particular reason for such selection of data) are used as a test dataset to assess the model's predictability by calculating both root mean square (RMSE) and mean absolute errors (MAPE) between the test and the predicted values from the fitted models. Also, the radiological hazard indices for the training period as well as the test period were computed. Finally, the best identified ARIMA models were applied to predict the air dose rate of monitoring points from (1st January-1st October 2020). In this work, MATLAB® software program was used to select, estimate and forecast the air dose rates for all monitoring points.

The autoregressive integrated moving average (ARIMA) model
A time-series is a collection of x(t) observations, each reported at a specific time (t), the observations can be displayed as a function of (t) in a time-series plot [36,37]. Time-series data forecasting can be done using several models such as stochastic, neural networks, and SVM, and ARIMA can be said one of the most popular models [38,39]. The ARIMA model was popularized by Box and Jenkins in the 1970s [40] and the model can be applied when the observations are stationary [41,42]. In many fields such as health, economic, financial, engineering and environmental applications, and physics this model has been utilized. The general form of Autoregressive (AR), Moving average (MA) and ARIMA models are given by the equations below from [26,43]:

Autoregressive model
The past values affect the current time-series data as, where µ is a constant, α i (1 ≤ i ≤ q) is a parameter of the model, x t−i is a value that observed at (t−i) and t is the error at time t.
(1) where µ is a constant, βi (1 ≤ i ≤ q) is a parameter of the model, ε t−i is an error value that observed at (t−i) and t is the error at time t.
The non-stationary time-series can be converted to stationary by differencing the (X) time-series data as, Finally, the ARIMA model can be expressed as, In the case of ARIMA (p, d, q), p means the order of the autoregressive term (AR), d means the order of the differencing and q means the order of the moving average term (MA).
The fundamental steps for the ARIMA model implementation are shown in Fig. 3, [44], in which each information is described in the result section.
Finally, the selected model is applied for its relevant timeseries data to predict future cases. The reliability of each model was examined by finding the errors like below [45]: Here, n is the total number of predicted values, X (k) is actual data, PE (k) is the percentage error; and X (k) is the predicted data.

Radiological parameters calculation
Exposure levels and the hazard indices for the monitoring points are calculated by using the following equations [46][47][48][49]. We converted the units to suit the literature:

Absorbed dose rate (D)
Outdoor gamma dose rates (OGDR) in the air are measured in μSv h −1 unit. The data obtained for the air dose rate in μSv h −1 are converted into μR h −1 using the conversion factor. The external absorbed doses in nGyh −1 include both the cosmic rays and terrestrial component of the gamma radiation. The data obtained for the external exposure rate in R h −1 are also converted into absorbed dose rates nGy h −1 using convenient conversion factor The annual effective dose equivalent (AEDE) The computed absorbed dose rates are used to calculate the annual effective dose equivalent (AEDE) received by residents in the study area. The AEDEs are calculated using the following formula: In this equation, for the calculation of the AEDE, the dose conversion factor of 0.7 is recommended by UNSCEAR for the conversion coefficient from the absorbed dose in air to the effective dose received by adults and an occupancy factor of 0.2 for outdoor exposure.

Excess lifetime cancer risk (ELCR)
The excess lifetime cancer risk (ELCR) is estimated based on calculated values of AEDE through the following equation: where AEDE is the annual effective dose equivalent, DL is the duration of life (70 years) and RF (Sv −1 ) is the fatal cancer risk factor per Sievert, which is 0.05 Sv −1 according to the ICRP-60.

ARIMA model establishment
Original data's autocorrelation function (ACF) graphs of (MP1, MP2, MP4, and MP5) showed a very slow decay and this was considered as a sign of all data nonstationarity property. Moreover, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test p-values displays that data was not stationary, because (p-value = 0.01) for all monitoring points. Then, the data were transformed into stationary by using the 1st order differencing. The differenced time-series ACF for all the monitoring points are shown in Fig. 4. Besides, the stationarity of the differenced data was examined by the Augmented-Dickey-Fuller (ADF) and KPSS tests, which confirmed the stationarity of the differenced data, because the ADF test p-values for all data sets were lower than (0.05), while for the KPSS test p-values for all monitoring points were higher than (0.05), thus both tests satisfy the stationarity criteria.
Following that, the ARIMA model build up phase started in order to identify the order of p and q terms. The merely possible way that has been in use by scholars to guess their orders depend on the partial autocorrelations function (PACF) and ACF graphs of the differenced time series. The p and q values are equal to the number of the lag where spikes are coming into the confidence level in PACF and ACF graphs, respectively [50][51][52][53][54]. However, it has been criticized by some researchers that this estimation method could not be sufficient at every time [55]. In spite of (ACF) and (PACF) suggestions, there are a number of information criteria that provide selection the best ARIMA model, namely Bayesian information criterion (BIC), and Akaike's Information Criteria (AIC). The best fit ARIMA model was selected with the minimum AIC and BIC values, and they are defined as follows [45]: where k is the number of estimated parameters, n is the number of recorded measurements, and l is the value of the likelihood.
This study attempted (144) various (AR), (I = 1), (MA) combinations for each monitoring point data to start from (0,1,0) to (11,1,11) models. As the initial step, the tentative models were selected relying on the lowest AIC and BIC values. If two models have a close AIC and BIC values, the model with the smallest values of RMSE and MAPE errors was chosen as the tentative model. After model identification and parameter estimations, the best model was selected as (MP1) data ARIMA (9,1,3) among all other (143) trained models, because this model passed all the criteria.
Another crucial phase in the ARIMA model building is the residual diagnostics which were obtained by using MAT-LAB® software. Literally, an ARIMA model is adequate when its residuals are normally distributed and random [56]. The standardized residual and residual histogram between the predicted and true values of the air dose rate of MP1 were calculated to test the model goodness. Figure 5a shows that most of the standardized residuals for air dose rate forecasts have ±0.2 values; thus, the standardized residual can be considered as normally distributed, which reveals the goodness of the proposed ARIMA models. The autocorrelation and partial autocorrelation residual plots of MP1 show that for the first 20 lags, all sample autocorrelations fall inside the confidence levels, indicating that the residuals appear to be random. The residuals randomness was confirmed by applying the Ljung-Box Q-Test [57]. In this case, the Ljung-Box Q-Test shows that the first 20 lag autocorrelations among the residuals are zero (p-value = 0.96), indicating that the residuals are random and that the model provides an adequate fit to the data [26,58]. The above set of residual diagnostics are tested on all other monitoring points and the same kind of results are achieved. Here only residual diagnostics graphs of MP1 are shown in Fig. 5. Eventually, these models in Table 1 were selected as the most suitable ARIMA models.

ARIMA model air dose rate forecasting
In the previous part of the study the best fit models were established, then they were applied to the air dose rate timeseries to estimate the future 10 months period (1st Janu-ary2019 to 1st October 2019). The results of the forecasting are illustrated graphically in Fig. 6, where each graph consists of the training, test and predicted data. The magnified part of each graph at the top-right corner shows the actual, forecasted, upper confidence and lower confidence levels.
The results indicate that the ARIMA model prediction dose rates follow the actual dose data by monitoring points with small variability and they fall within the confidence band.
The spatial distribution of the forecasted air dose rates was mapped by using the Kriging Interpolation Technique (KIT) [37]. KIT defined the semivariogram among the air dose rate monitoring points. KIT results also show that the air dose rates will remain higher around the MP4 as the radionuclides were mostly released in this direction due to the weather conditions after the accident. Afterward, the model's accuracy was tested again by calculating the MAPE and RMSE among the actual and the predicted values. The small error values show the accuracy of the model. For instance, the MAPE takes values 2.2%, 2.7%, 3.1% and 5.1%; and RMSE value are 0.029, 0.026, 0.049 and 0.067 for MP2, MP1, MP4 and MP5, respectively. We think that some discrepancies between actual and predicted data in MP2, MP4 and MP5 will be resolved by increasing the number of data (Fig. 6).
Eventually, the fitted models were applied for their relevant measured air dose rates time-series from (1st Janu-ary2013-1st December 2019) to predict the air dose rates for the period (1st January 2020-1st October 2020). The spatial distribution of the forecast air dose rates at 1 m height are illustrated by an animation simulation in mp. 4 format.

Model forecasted air dose rate hazard indices
The results of the absorbed dose rate in the monitoring points around FDNPP are varied from 662.1 nGy h −1 (MP1 points) to 1231 nGy h −1 (MP4 points) for four monitoring points ( Table 2). The training data period covers the period from 1st January 2013 to 1st December 2018.
In the monitoring points calculated average of the annual effective dose equivalent was found as 2845.5 μSvy −1 , which is also above the world average. It is 70 μSvy −1 for the world average. The values obtained in this study are well above the world average annual effective dose level for outdoor environments, which is an indication of radiological contamination.
The average of cancer risk for adults in the region of monitoring points around the FDNPP was exposed to the outdoor gamma dose rate, which was measured as 9.96 × 10 -3 . This value was found to be higher than the average world standard of 0.29 × 10 -3 , suggesting that individuals exposed to this radiation may develop cancer during their lifetime due to the ionization of tissues [46], regarding this point, most recently studies also reported the thyroid cancer relation with the air doses in the Fukushima prefecture [59,60]. In Tables 3  and 4, the hazard indices are shown according to the test and ARIMA model data for (1st January 2019-1st October 2019) for all monitoring points, respectively. The results for the hazard indices for all monitoring points of the test data and ARIMA models compared, as a result it can be noticed that all the results are in good agreement with each other.

Conclusions
We conclude the main findings of the study as follows: 1. ARIMA model forecasted data recommend that all the stationarity tests should be examined for the case of the time-series stationarity check, and support the idea that the BIC and AIC are the most valuable paths to deter- . These changes demonstrate that the air dose rates had reduced significantly during the period studied, to approximately 75%, 78%, 76%, and 82% of their initial respective values. 4. On the contrary, the forecasted air dose rates suggest that the radiation level can be remained higher at these areas in the northwest direction from the FDNPP compared with north and south regions.

5.
The results obtained for the absorbed dose rate, outdoor annual effective dose equivalent, and excess lifetime cancer risk are found to be above the globally permissible limits. As the average lifetime cancer risk for adults is considerably higher than the world average as 9.96 × 10 -3 , this shows that in the future there is a need for wider risk analysis related to environmental radioactivity in the study region.