This study plotted an IRC map for Chungcheongbuk-do using the FR and machine learning approaches, including the group method of data handling (GMDH), convolutional neural network (CNN), and long short-term memory (LSTM). The model prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUROC). Figure 6 shows the methodological steps used in this study.
2.2.1 Identifying influencing factors of IRC variation using the FR
The FR is used to identify potential statistical relationships between a given phenomenon and its associated variables (Lee and Talib 2005). This study determined the FR of each variable related to IRC and categorized the variables by class. In the correlation analysis, FR expresses the proportion of the study area represented by IRC. The presence or absence of high radon must first be defined: areas were assigned 1 or 0 depending on whether the IRC exceeded a threshold of 148 Bq/m3, indicating high and low radon concentrations, respectively. FR was then calculated using the areas assigned 1 for the indoor radon index in Chungcheongbuk-do. The FR approach was used to evaluate the correlation between the indoor radon index and various factors, and was calculated by dividing the proportion of radon occurrences within each subclass by the proportion of radon occurrences across the entire study area, as shown in Eq. 1 (Huang et al. 2020).
$$FR=\frac{{N}_{\left({I}_{i}\right)}/{N}_{\left({F}_{i}\right)}}{{N}_{\left(I\right)}/{N}_{\left(A\right)}}$$
1
where \({N}_{\left({I}_{i}\right)}\) is the number of IRC pixels in subclass i of the factor; \({N}_{\left({F}_{i}\right)}\) is the total number of pixels in subclass i; \({N}_{\left(I\right)}\) is the total number of IRC pixels for the factor; and \({N}_{\left(A\right)}\) is the total number of pixels in the study area.
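As an illustrative sketch (not the authors' implementation), Eq. 1 can be computed per subclass from a grid of cells, each carrying a hypothetical factor subclass label and a binary radon flag (1 if IRC ≥ 148 Bq/m3, 0 otherwise):

```python
from collections import Counter

def frequency_ratio(subclasses, radon_flags):
    """Return FR per subclass: (N_Ii / N_Fi) / (N_I / N_A)."""
    n_a = len(subclasses)                  # N_(A): total cells in study area
    n_i = sum(radon_flags)                 # N_(I): total high-radon cells
    n_fi = Counter(subclasses)             # N_(Fi): cells per subclass
    n_ii = Counter(s for s, r in zip(subclasses, radon_flags) if r == 1)
    overall = n_i / n_a
    return {s: (n_ii.get(s, 0) / n_fi[s]) / overall for s in n_fi}

# Hypothetical 10-cell grid with two lithology subclasses
subs = ["granite"] * 4 + ["shale"] * 6
flags = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
fr = frequency_ratio(subs, flags)  # granite is radon-prone here, FR > 1
```

An FR above 1 indicates that a subclass is over-represented among high-radon cells relative to its share of the study area.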
2.2.2 Model description
Frequency ratio
The FR model is a statistical model used to analyze the frequency of events or patterns in a dataset. The IRC map was developed using the FR approach, with the weighted sum tool applied in an integrated analysis to generate the FR map (Jana et al. 2019). The weighted sum tool allows strategic weighting and amalgamation of various factors to produce an IRC map.
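A minimal sketch of the weighted-sum overlay: each factor contributes an FR value per cell, and the IRC index is the weighted sum of those layers. The layer names and weights below are illustrative only, not the values used in the study.

```python
def weighted_sum(layers, weights):
    """layers: dict of factor name -> per-cell FR values (equal length)."""
    n = len(next(iter(layers.values())))
    index = [0.0] * n
    for name, values in layers.items():
        w = weights[name]
        for i, v in enumerate(values):
            index[i] += w * v  # accumulate weighted FR per cell
    return index

# Hypothetical three-cell map with two factor layers
layers = {
    "lithology": [1.9, 0.4, 1.2],
    "soil":      [0.8, 1.5, 1.0],
}
weights = {"lithology": 0.6, "soil": 0.4}
irc_index = weighted_sum(layers, weights)
```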
Group method of data handling
The GMDH is a robust approach to mathematical modeling and data analysis developed by Alexey G. Ivakhnenko in the 1970s that has been applied in various fields (Ivakhnenko 1970). The GMDH algorithm uses a self-organization principle to identify the optimal model complexity by systematically evaluating numerous candidate models against specified criteria (Ivakhnenko 1978). The algorithm supports multiple reference functions, including linear, polynomial, and ratio-polynomial variants, which allow it to handle a range of problems and improve prediction accuracy (Ivakhnenko and Ivakhnenko 2000). The relationship between input and output variables can be described by a complex discrete form of the Volterra functional series, commonly referred to as the Kolmogorov-Gabor polynomial (Farlow 1984); Eq. 2 gives this relationship between the input and output variables of the model. GMDH starts with a set of input data in which each variable represents a feature or attribute of the dataset. The variables are grouped into sets or layers, and within each layer, mathematical models are developed to represent the relationships among variables. The process continues until a predefined stopping criterion is met or the best possible model is achieved, and the resulting model can be used to make predictions on new data.
$$y={a}_{0}+\sum _{i=1}^{n}{a}_{i}{x}_{i}+\sum _{i=1}^{n}\sum _{j=1}^{n}{a}_{ij}{x}_{i}{x}_{j}+\sum _{i=1}^{n}\sum _{j=1}^{n}\sum _{k=1}^{n}{a}_{ijk}{x}_{i}{x}_{j}{x}_{k}+\dots$$
2
where \(y\) is the prediction result; \(x\) is the vector of input variables; \(a\) denotes the coefficients calculated using the least-squares error approach; and \(n\) is the number of input variables.
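As an illustrative sketch (not the authors' implementation), a single GMDH "neuron" can be modeled as a second-order Kolmogorov-Gabor polynomial in two inputs, fitted by least squares via the normal equations; the full GMDH would build many such neurons layer by layer and retain the best by an external validation criterion. The data below are synthetic.

```python
def design_row(x1, x2):
    # Second-order Kolmogorov-Gabor terms for two inputs
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(A, b):
    """Gaussian elimination with partial pivoting (A is n x n)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_neuron(X, y):
    """Least squares: solve (Phi^T Phi) a = Phi^T y for the coefficients a."""
    Phi = [design_row(x1, x2) for x1, x2 in X]
    k = 6
    AtA = [[sum(row[r] * row[c] for row in Phi) for c in range(k)] for r in range(k)]
    Atb = [sum(Phi[i][r] * y[i] for i in range(len(Phi))) for r in range(k)]
    return solve(AtA, Atb)

def predict(a, x1, x2):
    return sum(ai * pi for ai, pi in zip(a, design_row(x1, x2)))

# Synthetic data generated from a known polynomial; the fit recovers it
X = [(float(i), float(j)) for i in range(4) for j in range(4)]
y = [2 + x1 - 0.5 * x2 + 0.1 * x1 * x1 for x1, x2 in X]
a = fit_neuron(X, y)
```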
Convolutional neural network
A CNN is a deep-learning algorithm that belongs to the broader category of machine learning approaches. Deep learning is a specialized area of machine learning that emphasizes the use of multi-layered artificial neural networks; a CNN is distinguished by layers dedicated to the convolution operation. The fundamental architecture of the CNN model consists of convolutional, pooling, and fully connected layers (Lecun et al. 1998; Yamashita et al. 2018).
The input layer serves as the initial interface for raw data, converting it into numerical form, often expressed as a tensor (a multi-dimensional data array). The core part of a CNN is the convolutional layer, where convolution operations are performed on the input data using a set of adaptable filters, also known as kernels (Thi Ngo et al. 2021). After the convolution operations, an activation function is applied element-wise to the output of the convolutional layer. This function injects non-linearity into the model, which is essential for the network's ability to capture and represent complex data relationships (Berkani et al. 2023).
Pooling operations reduce computational demand and bolster the model's invariance to shifts in position. By downsampling the feature maps from the convolutional layers, the pooling layer condenses the dimensionality of the data (Barata et al. 2019). Common techniques include max pooling, which isolates the highest value within a designated subsection, and average pooling, which computes the sectional average (Chawshin et al. 2022).
The fully connected layer plays a crucial role in rendering predictions by integrating the high-level features derived from antecedent layers (Ma et al. 2021). Meanwhile, the output layer in a CNN, responsible for producing the ultimate output, generally comprises neurons that align with each of the defined categories.
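A minimal sketch, in pure Python, of the core CNN operations described above: a valid 2-D convolution (implemented as cross-correlation, as in most deep-learning libraries), a ReLU activation applied element-wise, and 2x2 max pooling. This is illustrative only, not the network used in the study.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(fmap):
    # Element-wise non-linearity: negative activations become zero
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    # Keep the maximum of each non-overlapping 2x2 window
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Toy 4x4 input; this kernel simply picks out the center pixel
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ker = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fm = relu(conv2d(img, ker))     # 2x2 feature map
pooled = max_pool2x2(fm)        # 1x1 after pooling
```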
Long short-term memory
LSTM is another type of deep neural network algorithm in which the network output is fed back into the network as subsequent input (Kong et al. 2019). The structural framework of LSTM networks excels in identifying intricate spatial configurations and temporal trends across diverse settings. LSTM networks are structured around unique memory units that incorporate gating mechanisms, which enable the model to preserve and modify data over extended sequences, ensuring crucial information is not discarded.
The key components of an LSTM cell are the input gate, forget gate, cell state, output gate, and cell output (Graves 2012). The input gate manages how new data are assimilated into the cell state, effectively serving as a filter for incoming information. The forget gate assesses which parts of the stored information are no longer pertinent and should be removed, taking into account both the new input and the preceding hidden state. The candidate cell state, which holds potential information for addition to the cell state, depends on the decisions made by both the input and forget gates. The cell state is then updated with this vetted information, encapsulating the essence of the input sequence at that time step. The output gate dictates how much of the updated cell state is conveyed to the hidden state, modulating the balance between memory retention and forgetting across time steps. This selective information management within the LSTM's memory cells maintains a continuous and relevant data stream across prolonged sequences, mitigating information degradation over extended periods.
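The gate mechanics described above can be sketched as a single scalar LSTM step (an illustration of the standard formulation, not the study's trained model; all weights are illustrative placeholders):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step; W maps gate name -> (w_x, w_h, b)."""
    def gate(name, squash):
        w_x, w_h, b = W[name]
        return squash(w_x * x + w_h * h_prev + b)
    i = gate("input", sigmoid)    # input gate: how much new info to admit
    f = gate("forget", sigmoid)   # forget gate: how much old state to keep
    g = gate("cand", math.tanh)   # candidate cell state
    o = gate("output", sigmoid)   # output gate: how much state to expose
    c = f * c_prev + i * g        # updated cell state
    h = o * math.tanh(c)          # cell output / hidden state
    return h, c

# With all-zero weights each sigmoid gate opens halfway, so the cell
# retains exactly half of its previous state
W = {g: (0.0, 0.0, 0.0) for g in ("input", "forget", "cand", "output")}
h, c = lstm_step(1.0, 0.0, 2.0, W)
```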
2.2.3 Model performance assessment using AUROC
The AUROC is a powerful metric for evaluating the accuracy of predictions made by machine learning models, and it was used here to validate the indoor radon index potential maps. For validation, the radon data were split randomly into training (70%) and testing (30%) sets, and the radon index maps were generated using the training data. In the training and testing phases, the AUROC serves as a performance indicator for evaluating the efficacy of the FR and machine learning algorithms (Bradley 1997). The ROC curve expresses the predictive index values from the prediction map as the ratio of radon data locations to the total area: the x-axis shows the cumulative percentage of areas ranked from the highest indoor radon index, and the y-axis shows the cumulative percentage of training or testing radon data captured (Pencina et al. 2008). The model's predictive performance is categorized into five ranges based on the AUC: fail (0.5–0.6), poor (0.6–0.7), fair (0.7–0.8), good (0.8–0.9), and excellent (0.9–1.0) (Carter et al. 2016). The AUROC is used to assess model performance in many fields, such as flood susceptibility maps (Dodangeh et al. 2020), groundwater potential (Panahi et al. 2020), habitat potential maps (Widya et al. 2023), and radon distribution maps (Rezaie et al. 2022).
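A minimal sketch of computing the AUROC via its rank (Mann-Whitney) formulation: the probability that a randomly chosen positive location scores higher than a randomly chosen negative one, with ties counted as half. The scores and labels below are illustrative.

```python
def auroc(scores, labels):
    """AUROC for binary labels (1 = radon present, 0 = absent)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # Pairwise comparisons: a full win counts 1, a tie counts 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0; a model no better than chance gives about 0.5, matching the "fail" band of the classification above.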