Application of GIS and feedforward back-propagated ANN models for predicting the ecological and health risk of potentially toxic elements in soils in Northwestern Nigeria

Potentially toxic elements (PTEs) occur naturally in most geologic materials. However, recent anthropogenic disturbances such as ore mining have contributed significantly to their enrichment in soils. Their occurrence in soil may portend a myriad of related risks to the environment and biota. Most traditional soil quality evaluation methods involve comparing the background values of the elements to the established guideline values, which is often time-consuming and fraught with computational errors. As a result, to conduct a comprehensive and unbiased evaluation of soil quality and its effects on the ecosystem and human health, this research combined geochemical, numerical, and GIS data for a composite health risk zonation of the entire study area. Furthermore, the multilayer perceptron artificial neural network (MLP-NN) was used to forecast the most important toxic elements influencing soil quality. Geochemical, statistical, and quantitative soil pollution evaluation (pollution index and ecological risk index) showed that apart from mining, the spread and association of trace elements and oxides occur as a consequence of surface environmental conditions (e.g., leaching, weathering, and organo-metallic complexation). The hazard quotients (HQs) and hazard index (HI) of all PTEs were greater than one. This indicates that residents (particularly children) are more susceptible to risks from toxic element ingestion than dermal exposure and inhalation. Ingestion of As and Cr resulted in higher cancer risks and lifetime cancer risk levels (> 1.0E−04), with risk levels increasing toward the northeastern, western, and southeastern directions of the study area. The low modeling errors observed from the sum of square errors, relative errors, and coefficient of determination confirmed the efficiency of the MLP-NN in pollution load prediction. Based on the sensitivity analysis, Hg, Sr, Zn, Ba, As, and Zr showed the greatest influence on soil quality. Focus on remediation should therefore be placed on the removal of these elements from the soil.


Introduction
The soil is considered one of the most important environmental components because it supports plant growth. However, due to constant natural environmental shifts and varying anthropogenic activities (especially during the recent Anthropocene age), it has become a primary repository of contaminants that reach other environmental components such as water, air, and plants (Chiang et al., 2011; Omeka et al., 2023; Kolawole et al., 2023a, 2023b). All geologic materials contain potentially toxic elements (PTEs), and their concentration in the soil depends on their source. PTEs can originate either naturally (through pedogenesis or lithogenesis) or through anthropogenic processes (such as unregulated agricultural activities, mining, and poor disposal of its effluents). PTEs are metals and metalloids that are persistent in the environment and are either physiologically essential [e.g., chromium (Cr), cobalt (Co), manganese (Mn), zinc (Zn), and copper (Cu)] or non-essential [e.g., cadmium (Cd), arsenic (As), mercury (Hg), and lead (Pb)]. The essential elements are needed in low concentrations for animal, human, and plant nutrition and hence are known as 'trace elements' or 'micronutrients'. Non-essential metals are zootoxic or phytotoxic and are commonly referred to as 'toxic elements'. At exceedingly high amounts, both groups are toxic to vegetation, animals, and/or people (Omeka & Igwe, 2021; Omeka et al., 2022a; Pourret & Hursthouse, 2019; Pourret et al., 2021). Due to their toxicity and persistence in the environment, PTEs have the propensity to maintain a high residence time in the geo-environment and serve as major contamination threats to valued environmental components (e.g., soil, water, and plants). As a result, unconventional methods that guarantee accurate evaluation, monitoring, and prediction of soil quality while providing the most effective remediation/mitigation measures for both human and natural protection are required (Chiang et al., 2011).
Among the hazards arising from environmental exposure to PTEs, hazards to human health appear to have the most direct and significant impacts. The impact on human health largely depends on the route of exposure to the contaminant and the body weight of the exposed population group (Omeka & Egbueri, 2022). The exposure routes of a particular contaminant include inhalation (of dust within a mine area) and ingestion (for example, directly from the consumption of unwashed fruits from the vicinity of the mine site). While PTE exposure through inhalation of dust may be more significant for mine workers, residents living within the vicinity of the mine are also at considerable risk (US EPA, 2017; Neris et al., 2019; Kolawole et al., 2022; Igwe & Omeka, 2022). Children living around the mine can also be exposed to direct PTE ingestion through their playing habits.
Studies have reported health challenges arising from long- and short-term exposure of humans to PTEs through the different exposure routes. Heart diseases, lung and breast cancer, distortion of human fetal development, memory impairment in children, and neural disorders have been associated with the long-term intake of toxic elements (e.g., Cu, Pb, and Cd) (Khan et al., 2013; US-EPA, 2017). Both long-term and short-term ingestion of PTEs (e.g., As, Cr, Sr, and Hg) have been linked to illnesses such as kidney stones, anemia, and psychosis (Aghamelu et al., 2023; Okamkpa et al., 2022; Tchounwou et al., 2012; US-EPA, 2017). How these toxic elements occur in the soil phase and their contribution to environmental and health hazards depends mainly on a myriad of complex and varying geochemical processes (such as bioavailability and chemical speciation) that are operational in such an environment. Hence, for regular monitoring and robust assessment of the level of PTEs in terms of their ecological and health risks, it is necessary to carry out an integrated high-precision modeling approach that combines numerical indices and GIS-based spatiotemporal models. This ensures a visual appraisal of the variation in ecological and health risks across contaminated sites. For adequate mitigation of contaminated sites, it is also important to predict the most influential toxic elements impacting the soil quality. This can be achieved through the use of data-driven machine learning algorithms.
Although studies on soil quality assessment in the past have been based on legislative guidelines set up internationally and locally (CCME, 1999; DPR, 2002; EA, 2009; US-EPA, 2005), these statutory standards are founded on estimates of soil physical and chemical properties (for example, soil organic matter content, pH, carbonates, and cation exchange capacity) as well as total metal content. The use of only the total metal content in soil quality assessment cannot provide a holistic assessment of the soil quality in terms of its effect on human health, ecology, and remediation of contaminated sites (Jamshidi-Zanjani et al., 2015; Azeez et al., 2023). This is because the assessment is usually associated with error and may lead to either underestimation or overestimation of the exposure to contamination. This results in continuous soil assessment and monitoring, which may be expensive and require a high level of expert knowledge (Jahromi et al., 2020; Jamshidi-Zanjani et al., 2015; Omeka et al., 2022b). Therefore, in this study, we integrated several composite indexical and GIS-based spatiotemporal models to generate a composite ecological and health risk zonation map of the Tashan-Jatau metallogenic province in Northwestern Nigeria. Additionally, the most influential toxic elements impacting the soil quality were predicted using the multilayer perceptron artificial neural network (MLP-NN). Two numerical environmental pollution indices [pollution load index (PLI) and ecological risk index (ERI)] and a composite health risk assessment model (for inhalation, ingestion, and dermal contact) were combined to give a visual appraisal of the ecological and health risk of the soils within the mine area by generating composite health risk zonation maps.
Humans are known to be the first receptors of PTE exposure. These PTEs tend to pose both carcinogenic and non-carcinogenic effects on humans (Jahromi et al., 2020; Jamshidi-Zanjani et al., 2015; Kolawole et al., 2022; Olajide-Kayode et al., 2023; Unigwe et al., 2022). It is therefore integral to assess both the carcinogenic and non-carcinogenic effects of these elements on human health with respect to their various exposure routes. To this end, multiple health risk indices, including average daily dose by inhalation (ADD_inh), average daily dose by dermal contact (ADD_derm), average daily dose by ingestion (ADD_ing), hazard index (HI), hazard quotient (HQ), and lifetime cancer risk (LCR), were computed in this study to understand the potential effects of PTE exposure on the inhabitants within the mine area. Moreover, it has been established that soil pollution from PTEs involves a series of complex and dynamic geochemical processes. Adequate characterization and identification of these complexities in terms of contaminant source therefore serves as the first and basic step toward remediation of the contaminated site. Accordingly, our study integrated multivariate statistical models (such as correlation analysis and principal component analysis) to understand the intricate relationships between the elements and their oxides in the soil system.
Previous literature has not addressed the research gap of whether it is possible to develop a combined soil quality zonation map from GIS using a joint multi-criteria method of environmental pollution indices and health risk indices. Furthermore, prior research has not recorded the prediction and identification of the most influential toxic elements affecting the quality of the soil through the application of data-intelligent machine learning algorithms. As a result, it is believed that creating a combined soil quality map by merging multiple health risk indices and predicting the most influential PTEs affecting soil quality is novel. Machine learning algorithms have, however, been used in some studies in other parts of the world to forecast different soil physicochemical metrics such as soil organic carbon, moisture content, water content, nitrogen content, phosphorus content, and soil aggregate stability (Acharya et al., 2021; Bouslihim et al., 2021; Dessureault-Rompre et al., 2015; Emadi et al., 2020; Keskin et al., 2019; Santra et al., 2018; Valadares et al., 2017). For instance, Bouslihim et al. (2021) used random forest (RF) machine learning algorithms to forecast soil aggregate stability in soils from a watershed in Morocco by predicting the mean weight diameter as an indicator of soil aggregate stability. Emadi et al. (2020) predicted the soil organic carbon in northern Iran by integrating multiple machine learning models such as RF, extreme gradient boosting (XGBoost), support vector machine (SVM), conventional deep neural network (DNN), multilayer perceptron neural networks (MLP-NNs), and regression trees. In the United States, Acharya et al. (2021) predicted the soil moisture content of the Red River valley in North Dakota and Minnesota by combining various machine learning algorithms such as CART regression trees, RF, boosted regression trees (BRT), SVM, MLP-NN, and multiple linear regression (MLR). In southeastern Nigeria, the detachability and liquefaction potential index (DLPI) has been predicted by integrating MLR and MLP-NN to forecast the erodibility potential of tropical soils (Egbueri et al., 2023). The MLP-NN model has also been successfully applied in combination with the analytical hierarchy process (AHP) for irrigation water quality prioritization and prediction in southeastern Nigeria (Omeka et al., 2023).
From our literature review, no study has been carried out on the prediction of potentially toxic elements (PTEs) in soil by predicting the pollution load index (PLI). Additionally, soil quality zonation based on the combination of numerical pollution indices and health risk assessment indices has not been documented in the literature. Moreover, no study in Nigeria has attempted to integrate both health risk and numerical indices for a composite soil quality zonation assessment. Therefore, this study aimed to carry out a holistic assessment of the distribution of PTEs and their associated ecological and health risks, as well as to predict the most influential elements impacting the soil quality in the Tashan-Jatau metallogenic province in northwestern Nigeria. The objectives of the study are (1) to assess the distribution and ecological risks of PTEs in the mine province using GIS and multiple numerical pollution indices; (2) to evaluate the potential health risk implications of toxic metal exposure for different population groups using several composite health risk assessment models (ADD_derm, ADD_inh, ADD_ing, HQ, HI, and LCR); and (3) to identify the most influential toxic elements impacting the soil quality in the area using multilayer perceptron artificial neural network (MLP-NN) sensitivity analysis. It is anticipated that the findings of this research will serve as a benchmark for geochemical and health risk evaluation in Nigeria and other areas of the globe, allowing for better environmental management and sustainability.

Study area
Location, climate, topography, and relief
The area under study sits within the middle belt region of Nigeria and lies between longitude 6°0′0″E and 6°12′0″E and latitude 10°0′0″N and 10°8′0″N. The area is defined topographically by undulating highlands and occasional lowlands, with the maximum elevation reaching 300.4 m and the lowlands falling to an elevation of 177.9 m (Fig. 1a). Two major seasons characterize the study area. The dry season lasts from November to April and is associated with the northeast trade winds and harmattan spells. Maximum temperatures in the dry season range between 30 and 36 °C and are recorded between December and mid-January. The rainy season lasts from April to November and is marked by extreme humidity and high temperatures. The typical annual rainfall is 150-250 mm, with maximum temperature occurring during July and September. The area is characterized by vegetation typical of the guinea savannah, which is distinguished by shrubs and very tall grasses, as well as scattered economic trees of various species. Aside from artisanal mining, the inhabitants of this area also engage in animal husbandry (e.g., nomadic cattle rearing) and subsistence agricultural activities (e.g., cultivation of perennial crops such as yam, rice, millet, guinea corn, cowpea, and groundnuts).

Geology of study area
Geologically, the region under investigation is located at the center of central Nigeria's Basement Complex. The main lithostratigraphic framework of the region is composed of metasediments and meta-igneous rocks that have recently experienced polyphase deformation and metamorphism. Granitic rocks of Pan-African age are known to have intruded the rocks of the study area (Oluwakayode et al., 2021; Omang et al., 2022). Porphyritic granites, undifferentiated schist with phyllites and gneiss, migmatites, granitic gneiss, and Masonite are the five lithostratigraphic units identified in this area (Fig. 1b). Within the central portion of the study area, the undifferentiated schist (with phyllites and gneiss) occurs as a flat-lying, narrow, southwest-northeast trending belt, with the gneiss appearing as small suites on the northern and southern flanks and in contact with the granite (Okon et al., 2022; Omang et al., 2022). The feldspathic-rich pegmatite has an average width of 65 m and a length of 100 m to the east. This pegmatite is the main rock unit in the study area that hosts tourmaline, whereas schists and phyllites, which are mostly found in the southwest-northeast parts of the area, are the main hosts for gold mineralization (Omang et al., 2023). The area is dominated by granitic rocks of different textures and compositions.

Sample collection and analyses
With the aid of a hand trowel, fifty (50) homogenized surface soil samples were systematically collected in duplicate from trenches, hand-dug pits, and excavated areas where artisanal and semi-mechanized mining activities are currently taking place. A sampling density of one sample per 50 m–1 km away from the mining site was used. The sampling was designed with the understanding that local artisanal miners may be exposed to toxic elements by inhalation, ingestion, or skin contact. Moreover, children in the surrounding region may be exposed to potentially toxic elements (PTEs) through ingestion and dermal contact as a result of their playing habits. Additionally, crop plants growing in the surroundings may absorb these toxic elements through translocation and bioaccumulation processes within the soil-root system.
Each sample weighed approximately 3 kg. The samples were dried using a two-stage drying technique. The samples were first air-dried at room temperature to prepare them for sieving. Thereafter, the samples were air-dried in the laboratory to remove any remaining moisture before being oven-dried at a temperature of about 50 °C overnight. The dried samples were disaggregated and sieved through 2-mm and 63-µm stainless steel sieves. The fines from the receiving pan were then scooped into clean, self-seal polythene bags and appropriately labeled for geochemical analysis. Before the analysis, the sieved material was dried overnight in an oven at a temperature of 100 °C to remove any residual dampness.

Geochemical analysis
Geochemical analysis was carried out using inductively coupled plasma mass spectrometry (ICP-MS) to determine the major oxides and the total or near-total concentrations of trace elements in the soils. Soil samples were sieved through 10 mesh (2 mm) to 200 mesh (80 µm), quartered, and crushed in a porcelain mortar. They were then re-homogenized and placed in sealed plastic bags for chemical analysis. A 0.25 g portion of each powdered sample was digested by the open vessel method using HClO₄, HNO₃, and HF in a Teflon beaker placed on a hotplate. The mixture was refluxed at a temperature of 90-100 °C for 1 h and then evaporated to dryness at a temperature between 180 and 190 °C. The residue was leached with 5 mL of HCl. ICP-MS was then used to determine trace element concentrations in the soil samples at an approved laboratory (Nigerian Geological Survey Agency, Kaduna). The accuracy and precision of the analyses were evaluated by introducing two blank samples and two sample duplicates. The quality assurance and quality control (QA/QC) protocol was followed, and duplicate samples were evaluated at random. The instrument was calibrated both before and during the analysis to ensure accuracy. All laboratory analyses were conducted following the American Public Health Association (APHA, 2017) standard procedures.

Quantitative assessment of the pollution level of trace elements in soil

Pollution load index
The pollution load index (PLI), introduced by Tomlinson et al. (1980), was used to evaluate the degree of multi-element contamination (enrichment) in soil. Thirteen trace elements (Hg, Cu, Cr, As, Zn, Sr, Nb, Mo, Ba, La, Ce, Pb, and Zr) were selected for the computation of PLI using the relation expressed in Eq. 1:

$$PLI = \left(CF_1 \times CF_2 \times CF_3 \times \cdots \times CF_n\right)^{1/n} \tag{1}$$

where CF represents the contamination factor of each examined element and n represents the total number of analyzed elements in the soil.
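The PLI described above is the n-th root of the product of the contamination factors, that is, their geometric mean. The following is a minimal sketch of that calculation, assuming each contamination factor is the measured concentration divided by a background value; the function names and the concentrations are hypothetical and are not taken from the study's dataset.

```python
import math


def contamination_factor(concentration, background):
    """CF = measured concentration / background concentration (same units)."""
    if background <= 0:
        raise ValueError("background concentration must be positive")
    return concentration / background


def pollution_load_index(cfs):
    """PLI = n-th root of the product of the n contamination factors."""
    cfs = list(cfs)
    if not cfs:
        raise ValueError("at least one contamination factor is required")
    # Averaging logarithms is equivalent to the n-th root of the product
    # and avoids overflow when many large factors are multiplied.
    return math.exp(sum(math.log(cf) for cf in cfs) / len(cfs))


# Hypothetical concentrations and background values (mg/kg), illustration only.
sample = {"Cu": 40.0, "Zn": 180.0, "Pb": 45.0}
background = {"Cu": 20.0, "Zn": 90.0, "Pb": 15.0}

cfs = [contamination_factor(sample[e], background[e]) for e in sample]
pli = pollution_load_index(cfs)
print(f"PLI = {pli:.3f}")
```

Because the index is a geometric mean, a single strongly enriched element raises the PLI less than it would raise an arithmetic mean of the same factors.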

Ecological risk assessment
The ecological risk index (ERI) can be used to evaluate the probability that toxic elements in the ecosystem pose environmental hazards as a result of long-term exposure to contaminants. The heavy metal contamination factor (CF) and the toxic response factor (Tr_i) are combined to calculate ERI (Adimalla et al., 2018). This method not only assesses heavy metal levels in the soil, but also takes into consideration their ecological and biological impacts from toxicity and assesses contamination through the use of comparable and equivalent property index grading methods. The potential ecological risk coefficient (Er_i) of a particular element and the ecological risk index (ERI) of each sample based on heavy metal concentration were evaluated in this study. As shown in Eq. 2, this was accomplished by first calculating the contamination index (Cf_i) of each element by dividing its concentration in the environment (Cn_i) by its background value (Bb_i):

$$Cf_i = \frac{Cn_i}{Bb_i} \tag{2}$$

In this study, the background value of each element served as the control. The ecological risk index was then calculated by multiplying the contamination index (Cf_i) of each element by the toxic response factor (Tr_i), as shown in Eq. 3 (Hakanson, 1980):

$$ERI = Cf_i \times Tr_i \tag{3}$$

where Tr_i indicates the toxic response factor, Cf_i is the contamination index of a particular element, Cn_i represents the concentration of a particular element in the environment, and Bb_i is the background concentration (Hakanson, 1980; Adimalla et al., 2018).
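The two steps above (a contamination index per element, then weighting by the toxic response factor) can be sketched as follows. This is an illustration under stated assumptions: the toxic response factors are the values commonly quoted from Hakanson (1980) and are not listed in this section, summing the per-element coefficients into a sample-level index follows Hakanson's convention, and the concentrations and backgrounds are hypothetical.

```python
# Toxic response factors commonly quoted from Hakanson (1980).
# Assumption: this section does not list the values used in the study.
TOXIC_RESPONSE = {"Hg": 40, "As": 10, "Pb": 5, "Cu": 5, "Cr": 2, "Zn": 1}


def ecological_risk(concentrations, backgrounds, toxic_response=TOXIC_RESPONSE):
    """Return (per-element risk coefficients, their sum) for one soil sample."""
    coefficients = {}
    for element, cn in concentrations.items():
        cf = cn / backgrounds[element]                      # Eq. 2: Cn_i / Bb_i
        coefficients[element] = cf * toxic_response[element]  # Eq. 3: Cf_i * Tr_i
    # Assumption: the sample-level index is the sum over all elements.
    return coefficients, sum(coefficients.values())


# Hypothetical concentrations and background values (mg/kg), illustration only.
sample = {"As": 20.0, "Pb": 30.0, "Zn": 200.0}
background = {"As": 10.0, "Pb": 15.0, "Zn": 100.0}

er, eri = ecological_risk(sample, background)
print(er, eri)
```

In this hypothetical sample all three elements are enriched twofold, yet As dominates the index because of its larger toxic response factor.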

Evaluation of human health risks
The primary occupations of the residents of the study region are animal husbandry and subsistence agriculture (e.g., cultivation of perennial crops including yam, rice, millet, guinea corn, cowpea, and groundnuts). The mine spoils are disposed of downslope, adjacent to surrounding farmlands, rivers, and other water bodies. Additionally, the local miners are at risk of hazardous element ingestion, inhalation (from dust), and dermal contact during mining activities. Children are not exempt either, because they might be exposed to harmful elements directly through ingestion or indirectly through skin contact and inhalation while playing. This may expose the inhabitants to several health issues. Exposure can also occur through the food chain, as consumers of vegetables, fruits, and crops grown in farmlands close to mine sites may be in danger of metal contamination. Animals such as cattle can also ingest hazardous metals through their feeding habits, thereby creating a plant-animal-human chain of toxic element poisoning (Nganje et al., 2010; Omeka & Igwe, 2021). Hence, to ensure proper environmental protection and public health safety, it is critical to conduct an urgent assessment of the potential health risks to humans from contaminant exposure through different exposure pathways (Jahromi et al., 2020; Lü et al., 2018; Pavilonis et al., 2017). This entails determining the possible exposure routes, the extent of human exposure to hazards, the likelihood of the hazard affecting the health of inhabitants, and the corresponding risk levels from exposure (US-EPA, 2017). In the present study, two key probabilistic risk criteria, the non-carcinogenic and carcinogenic risk indices, were taken into account in the health risk assessment. The non-carcinogenic risk assessment takes into account the likelihood of human exposure to non-carcinogenic elements, while the carcinogenic risk assessment involves the human exposure risks to carcinogenic elements (US-EPA, 2017; Omeka et al., 2022a, 2022b). For this study, the US-EPA (2017) risk assessment criteria were applied, considering two population groups (children and adults) and three exposure pathways (ingestion, dermal contact, and inhalation).

Evaluation of daily dosage exposure
The average daily dose (ADD) of exposure was evaluated using three sub-indices, ADD_derm, ADD_ing, and ADD_inh, which represent the dermal contact, ingestion, and inhalation routes of exposure, respectively, for both the adult and child populations and for all trace metals examined. This was done following the US-EPA (2017) approved standard (Eqs. 4–6; Table 1).
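The three daily doses can be sketched with the general form of the US-EPA soil exposure equations. This is a hedged illustration, not the study's implementation: the equations and parameter table of the original are not reproduced in this section, so every exposure parameter below (body weight, intake rates, skin area, adherence factor, particle emission factor, absorption fraction) is a hypothetical placeholder, as are the `Receptor` class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receptor:
    """Exposure parameters for one population group (placeholder values)."""
    body_weight: float        # BW, kg
    ingestion_rate: float     # IngR, mg soil per day
    inhalation_rate: float    # InhR, m^3 air per day
    skin_area: float          # SA, cm^2
    adherence: float          # AF, mg soil per cm^2
    exposure_duration: float  # ED, years
    exposure_frequency: float = 350.0  # EF, days per year


PEF = 1.36e9  # particle emission factor, m^3/kg (assumed)
ABS = 0.001   # dermal absorption fraction (assumed)


def average_daily_doses(c, r):
    """Return ADD (mg/kg/day) per route for soil concentration c in mg/kg."""
    averaging_time = r.exposure_duration * 365.0  # days, non-carcinogenic case
    common = (r.exposure_frequency * r.exposure_duration
              / (r.body_weight * averaging_time))
    return {
        # 1e-6 converts mg of soil to kg of soil.
        "ing": c * r.ingestion_rate * 1e-6 * common,
        "inh": c * r.inhalation_rate / PEF * common,
        "derm": c * r.skin_area * r.adherence * ABS * 1e-6 * common,
    }


# Hypothetical receptors, for illustration only.
ADULT = Receptor(70.0, 100.0, 20.0, 5700.0, 0.07, 24.0)
CHILD = Receptor(15.0, 200.0, 7.6, 2800.0, 0.2, 6.0)

adult_doses = average_daily_doses(100.0, ADULT)
child_doses = average_daily_doses(100.0, CHILD)
print(adult_doses)
print(child_doses)
```

With these placeholder parameters the child ingestion dose exceeds the adult one for the same soil concentration, because a higher soil intake is divided by a lower body weight.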

Evaluation of non-carcinogenic health risk exposure
The health hazard index (HI) and health hazard quotients (HQ) were computed for each of the analyzed heavy metals to evaluate the chronic non-carcinogenic risks associated with inhalation, ingestion, and dermal contact with soil particulates in the mine area for the different age groups (children and adults), as shown in Eqs. 7-8:

$$HQ = \frac{ADD_{(inh,\,derm,\,ing)}}{RfD} \tag{7}$$

$$HI = \sum HQ = HQ_{inh} + HQ_{derm} + HQ_{ing} \tag{8}$$

where ADD_(inh, derm, ing) refers to the average daily dose for inhalation, dermal contact, and ingestion, RfD is the reference dose, and HQ and HI represent the hazard quotient and hazard index, respectively (Jia et al., 2018; US-EPA, 2017).
The hazard quotient (HQ) for each exposure route was calculated by dividing the ADD for inhalation, dermal contact, or ingestion by the reference dose (RfD) of the selected heavy metal, for both children and adults. The hazard index (HI) was calculated as the sum of the HQs over the three routes, which is equivalent to dividing the sum of the ADDs by the RfD. The reference dose (RfD) of each selected heavy metal was given as Zn (0.3), Sr (0.0015), Cr (0.003), Cu (0.04), Ba (0.07), and Pb (0.0035). For children and adults, the HI and HQ based on inhalation (inh), ingestion (ing), and dermal contact (derm) were determined. Based on the US-EPA (2017) classification criteria, the non-carcinogenic hazard is classified as insignificant (if HI < 0.1), low (if HI = 0.1-1), medium (if HI = 1-4), and high (if HI > 4).
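The non-carcinogenic screening described above can be sketched as follows, assuming the conventional US-EPA form in which each route-specific dose is divided by the reference dose and the quotients are summed into the hazard index. The RfD values and class limits are those quoted in the text; the doses, the function names, and the handling of values that fall exactly on a class boundary are assumptions made for illustration.

```python
# Reference doses as quoted in the text.
RFD = {"Zn": 0.3, "Sr": 0.0015, "Cr": 0.003, "Cu": 0.04, "Ba": 0.07, "Pb": 0.0035}


def hazard_quotients(element, add_by_route):
    """HQ per route = average daily dose / reference dose of the element."""
    rfd = RFD[element]
    return {route: dose / rfd for route, dose in add_by_route.items()}


def hazard_index(hqs):
    """HI = sum of the route-specific hazard quotients."""
    return sum(hqs.values())


def classify(hi):
    """Map HI onto the four classes quoted in the text.

    Assumption: a value exactly on a boundary goes to the lower class.
    """
    if hi < 0.1:
        return "insignificant"
    if hi <= 1:
        return "low"
    if hi <= 4:
        return "medium"
    return "high"


# Hypothetical daily doses of Cr (mg/kg/day), for illustration only.
cr_doses = {"ing": 6e-3, "derm": 3e-4, "inh": 3e-6}
hqs = hazard_quotients("Cr", cr_doses)
hi = hazard_index(hqs)
print(hqs, hi, classify(hi))
```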

Evaluation of carcinogenic human health risk exposure
Four carcinogens (Pb, As, Sr, and Cr) were selected for the present study. These elements were chosen because of their significant carcinogenic potential in humans. The carcinogenic risk evaluation was carried out for each element by calculating the cancer risk (CR) presented by each exposure pathway for a given sample point. For each carcinogen, the CR was calculated by multiplying the average daily dose (ADD) of the pathway by the cancer slope factor (CSF) (Eq. 9). The CSF values for As, Cr, Pb, and Sr were given as 1.5, 0.5, 0.0085, and 0.0015, respectively, following the classification criteria of the International Agency for Research on Cancer (IARC, 2011) and the Integrated Risk Information System (US-EPA IRIS, 2011).
The lifetime cancer risk (LCR) was also calculated. LCR is defined as the probability that an individual from a specific group (adult or child) will develop cancer as a result of lifelong exposure (52 years) to a specific carcinogen. As shown in Eq. 10, the LCR was calculated by summing the CR values for the three exposure routes (dermal contact, ingestion, and inhalation). According to the US-EPA (2017), the acceptable or tolerable lifetime cancer risk lies within the range of 1 × 10⁻⁶ to 1 × 10⁻⁴, implying that, on average, between 1 in 1,000,000 and 1 in 10,000 of the population will develop cancer as a consequence of contact with a specific carcinogen.

$$CR = ADD_{(inh,\,derm,\,ing)} \times CSF \tag{9}$$

$$LCR = \sum CR = CR_{inh} + CR_{derm} + CR_{ing} \tag{10}$$

where CSF represents the cancer slope factor and LCR indicates the lifetime cancer risk for each carcinogen. For each of the carcinogens, the CSF was given as follows: 0.0085 (Pb), 6.3 (Cd). CR was determined for children and adults for inhalation (inh), ingestion (ing), and dermal exposure (derm) (IARC, 2011; IRIS, 2011; US-EPA, 2017).
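The carcinogenic screening can be sketched in the same way. This illustration assumes the usual US-EPA convention in which the route-specific average daily dose is multiplied by the cancer slope factor and the route risks are summed into the lifetime cancer risk; the slope factors and the tolerable range are those quoted in the text, while the doses and function names are hypothetical.

```python
# Cancer slope factors as quoted in the text.
CSF = {"As": 1.5, "Cr": 0.5, "Pb": 0.0085, "Sr": 0.0015}

# Tolerable lifetime cancer risk range quoted from US-EPA (2017).
TOLERABLE_RANGE = (1e-6, 1e-4)


def cancer_risks(element, add_by_route):
    """CR per route = average daily dose x cancer slope factor (assumed form)."""
    return {route: dose * CSF[element] for route, dose in add_by_route.items()}


def lifetime_cancer_risk(crs):
    """LCR = sum of the route-specific cancer risks."""
    return sum(crs.values())


def exceeds_tolerable(lcr):
    """True when the risk is above the upper end of the tolerable range."""
    return lcr > TOLERABLE_RANGE[1]


# Hypothetical daily doses of As (mg/kg/day), for illustration only.
as_doses = {"ing": 2e-4, "derm": 1e-6, "inh": 1e-8}
crs = cancer_risks("As", as_doses)
lcr = lifetime_cancer_risk(crs)
print(crs, lcr, exceeds_tolerable(lcr))
```

In this hypothetical case the ingestion route contributes almost all of the lifetime risk, which is the pattern the abstract reports for As and Cr.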

Modeling of artificial neural networks
Artificial neural network (ANN) sensitivity analysis was also integrated into this study to predict the pollution load of toxic elements in soils within and away from the mine area. Given the soil system's intrinsic vulnerability to pollution and the complexities of contaminant movement in the soil phase in terms of fate, speciation, and bioavailability, using ANNs can yield more distinct and precise results in the assessment, prediction, and management of contaminated sites. Unlike other multivariate statistical models, the ANN is a more potent machine learning model that can capture both linear and nonlinear associations among variables. Its architecture is made up of linked networks that mimic the human neural system. These distinguishing features allow the ANN to learn from, process, and produce relevant results from a dataset. An ANN model's architectural framework is made up of three fundamental layers: input, hidden, and output. Using complex mathematical functions and knowledge of the different underlying patterns associated with a dataset, the layers work together to process the input data and produce useful responses (Kouadri & Samir, 2021; Omeka et al., 2023). Errors may occur during a normal data input process; however, ANNs may mitigate this flaw due to their high accuracy in quantitative analysis.
In the present study, before using the ANN to forecast the pollution load index (PLI), Pearson's correlation and unrotated principal component analysis (PCA) were used to quantify the relationship between the analyzed soil parameters and the PLI. This was done to establish the linearity and nonlinearity between the analyzed variables. Additionally, the statistical assessment was integral to the monitoring of datasets for ANN prediction during the training phase.
Hence, in this study, four feedforward backpropagation ANN models (ANN1, ANN2, ANN3, and ANN4) were generated to predict the pollution load index (PLI) of contaminants in soils within and away from the mine area. Thirteen analyzed trace elements from the soil (Hg, Cr, As, Sr, Nb, La, Ba, Y, Cu, Zn, Mo, Ce, Pb) were used as input variables (Table 2). The multilayer perceptron artificial neural network (MLP-NN) with the hyperbolic tangent as the input layer activation function was used in the ANN modeling. The normalized method was used to rescale the variables, the number of hidden layers was set to one (1), and the hyperbolic tangent was used as the activation function of the hidden layer, with the number of units derived automatically. The dependent variables were rescaled with a 0.02 correction value using the modified normalization criteria. Batch training was used to train the ANNs.
For ANN1 and ANN3, optimization was done using the scaled conjugate gradient (SCG) algorithm, while the gradient descent optimization algorithm was used for ANN2 and ANN4. For the input layer, the hyperbolic tangent activation function was used for ANN1 and ANN2, while the identity activation function was used for ANN3 and ANN4. To develop an ideal model, the training dataset partition was set to 70% while the testing dataset was set to 30%. The best models for each prediction scenario were then picked by taking into account those with high coefficient of determination (R²) values and very low modeling errors.
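The workflow described above (rescaling, a 70/30 partition, one hyperbolic tangent hidden layer, batch training, and selection by error and R²) can be sketched in plain Python. This is an illustration under stated assumptions, not the study's software or configuration: plain batch gradient descent stands in for the scaled conjugate gradient algorithm, the number of hidden units, learning rate, and epoch count are hypothetical, and the data are synthetic (three random inputs with a geometric-mean target).

```python
import math
import random


def minmax(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def forward(x, w1, b1, w2, b2):
    """One tanh hidden layer followed by a linear (identity) output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def sum_sq_error(data, params):
    """Sum of squared differences between predictions and targets."""
    return sum((forward(x, *params)[1] - y) ** 2 for x, y in data)


def train(data, n_hidden=4, epochs=2000, lr=0.05, seed=0):
    """Batch gradient descent with backpropagated gradients."""
    rng = random.Random(seed)
    n_in, n = len(data[0][0]), len(data)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1, gw2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, y in data:  # accumulate gradients over the whole batch
            hidden, out = forward(x, w1, b1, w2, b2)
            err = out - y
            gb2 += err
            for j in range(n_hidden):
                gw2[j] += err * hidden[j]
                delta = err * w2[j] * (1.0 - hidden[j] ** 2)  # tanh derivative
                gb1[j] += delta
                for i in range(n_in):
                    gw1[j][i] += delta * x[i]
        for j in range(n_hidden):  # one update per epoch
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i] / n
        b2 -= lr * gb2 / n
    return w1, b1, w2, b2


# Synthetic stand-in data: 50 samples, 3 inputs, geometric-mean target.
rng = random.Random(42)
raw_x = [[rng.uniform(0.5, 3.0) for _ in range(3)] for _ in range(50)]
raw_y = [math.prod(row) ** (1 / 3) for row in raw_x]
columns = [minmax([row[i] for row in raw_x]) for i in range(3)]
data = list(zip([list(v) for v in zip(*columns)], minmax(raw_y)))
rng.shuffle(data)
split = len(data) * 7 // 10  # 70% training, 30% testing
train_set, test_set = data[:split], data[split:]

untrained = train(train_set, epochs=0)
model = train(train_set)
sse_test = sum_sq_error(test_set, model)
mean_y = sum(y for _, y in test_set) / len(test_set)
sst_test = sum((y - mean_y) ** 2 for _, y in test_set)
r2_test = 1.0 - sse_test / sst_test
print(f"test SSE = {sse_test:.4f}, test R2 = {r2_test:.3f}")
```

Comparing the sum of squared errors before and after training, and the R² on the held-out 30%, mirrors the selection criterion stated above; swapping the identity output for a tanh output or changing the optimizer would give the analogue of the other model variants.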

ANN model validation using statistical methods
Because of the usual difference between the predicted values and the actual dataset (raw scores), evaluating the ANN model's performance and reliability is critical (Ray et al., 2020). The performance of the model is crucial in determining and choosing the most precise activation function and optimization algorithm (Ray et al., 2020). As a consequence, as shown in Eqs. 11-14, statistical metrics such as relative error (RE), coefficient of determination (R²), sum of square error (SOSE), and residual error were applied to validate the ANN model.
where n is the number of observations. The residual errors are commonly used in the ANN model to represent the differences between the real or measured variables and the predicted variables (presented graphically as residual plots). As a consequence, the x and y axes of the plot are indicative of the model accuracy, with values closer to zero showing greater model accuracy. On the other hand, the accuracy of the model is considered poor if there are large positive or negative values on the y-axis. The sum of squared errors (SOSE) measures the magnitude of model prediction errors and is calculated as the sum of the squared differences between the actual and predicted variables (always a non-negative value). In general, lower SOSE values suggest greater efficiency of the model (Egbueri et al., 2023; Kalantar et al., 2018). The RE,

Results and discussion
Geochemical and quantitative evaluation of soil pollution level
Results of the geochemical evaluation of trace elements and major oxides are presented in Tables 3 and 4, while the pollution load index (PLI) results are presented in Table 5. The mean concentration of trace elements varied in the order of Zr > Ba > Cr > Ce > Sr > Zn > Y > Nb > La > Cu > Pb > Au > As > Hg > Mo, while the concentration of major oxides occurred in the order of SiO2 > Fe2O3 > Al2O3 > K2O > TiO2 > CaO > SO2 > MgO > Cl > P2O5 > NaO > MnO. The extremely high concentration of Cr, Sr, Ba, and Zr in the surface soil reflects their relatively low mobility potential within the soil system (Olade 1976). These groups of metals are also known to form metallo-organic complexes with soil organic matter within the soil system (Nganje et al. 2023). Cr is sourced from the chemical weathering of ultramafic and igneous rocks (Nganje et al. 2023). The major lithostratigraphic framework of the study area is made up of undifferentiated schist, phyllites, gneiss, and porphyritic granite. The exposure of these rock units through mining can lead to their weathering and possible oxidation, thereby increasing the concentration of these elements on the soil surface. The elevated concentration of Ba is due to the presence of high-pressure acidic gneiss complexes, which make up the lithostratigraphic framework of the study area. The occurrence of some of the trace elements (e.g., Cu and Zn) in the surface soils can be linked to their peculiarities in (Tyler, 2004). Surface environmental conditions such as oxidation, leaching, and weathering of mine spoils may have given rise to the occurrence of the oxides (such as K2O). Generally, the occurrence of trace elements in the soils can be traced to both anthropogenic and geogenic sources.
Environmental pollution tools such as the pollution load index (PLI) and ecological risk index (ERI) were used to quantitatively assess the pollution levels of PTEs in soil and the associated ecological impacts. Results of PLI and ERI are presented in Tables 5 and 6, respectively. The degree of variation in the contamination level of PTEs in the soil can be effectively assessed using the PLI. From the results in Table 5, the PLI varied between 2.760 and 15.416 with a mean value of 7.333. Based on the PLI classification criteria, all the soil samples fall between the moderately polluted and extremely contaminated categories; except for one sample (TSJ 46), all samples fall within the extremely contaminated category.
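For readers unfamiliar with the index, the PLI is commonly computed as the geometric mean of the contamination factors (each element's measured concentration divided by its background value). The sketch below assumes that standard definition; the function names and the numbers in the example are hypothetical and are not values from Table 5.

```python
import math


def contamination_factor(concentration, background):
    """Ratio of the measured concentration to the background concentration."""
    return concentration / background


def pollution_load_index(contamination_factors):
    """Geometric mean of the contamination factors of all elements in a sample."""
    n = len(contamination_factors)
    return math.prod(contamination_factors) ** (1.0 / n)


# Hypothetical sample with two elements (concentration, background) in mg/kg.
cfs = [contamination_factor(40.0, 20.0), contamination_factor(160.0, 20.0)]
pli = pollution_load_index(cfs)
print(pli)
```

Because the index is a geometric mean, a single strongly enriched element raises the PLI less than it would raise an arithmetic mean, so high PLI values indicate enrichment across several elements.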
According to the ERI results, ecological risks from PTEs in the soil occurred in the order of Hg > Ba > Cu > Pb. Based on the Hakanson (1980) classification criteria, Hg and Ba exposure in the soil phase will have a higher ecological risk potential (significance) to the environment compared to Cu and Pb. Based on the ERI risk level criteria, low-risk levels were recorded in 13.7% of the soil samples, moderate risk levels in 50.9%, considerable risk levels in 15.6%, and very high-risk levels in 19.6% (Table 6). These results imply that the area is highly susceptible to potential ecological risks from Hg and Ba contamination from the surface soil phase. Future ecological remediation approaches must focus on providing mitigation measures for the removal of these elements from the soil phase.
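The ERI calculation can be sketched as follows, assuming the usual Hakanson-style formulation in which each element's risk is its toxic response factor multiplied by its contamination factor, and the sample's index is the sum over elements. The toxic response factors for Hg, Cu, and Pb follow Hakanson (1980); the factor for Ba is a hypothetical placeholder, and whether the study applied exactly the class boundaries shown here is an assumption.

```python
# Toxic response factors; the Ba value is a hypothetical placeholder.
TOXIC_RESPONSE = {"Hg": 40.0, "Cu": 5.0, "Pb": 5.0, "Ba": 2.0}


def risk_index(contamination_factors, toxic_response=TOXIC_RESPONSE):
    """Sum over elements of toxic response factor times contamination factor."""
    return sum(toxic_response[el] * cf for el, cf in contamination_factors.items())


def classify_risk_index(ri):
    """Assumed class boundaries for the summed index (four classes)."""
    if ri < 150.0:
        return "low"
    if ri < 300.0:
        return "moderate"
    if ri < 600.0:
        return "considerable"
    return "very high"


# Hypothetical contamination factors for one soil sample.
sample = {"Hg": 3.0, "Cu": 2.0, "Pb": 4.0, "Ba": 1.5}
ri = risk_index(sample)
risk_class = classify_risk_index(ri)
print(ri, risk_class)
```

In this example Hg contributes 120 of the 153 index units, which illustrates how an element with a large toxic response factor can dominate the ecological risk even at a modest contamination factor.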

Quantifying the relationships between geochemical parameters and PLI
The PLI values and the thirteen geochemical parameters used for the computation of the PLI were subjected to correlation analysis (CA) and principal components analysis (PCA). This was done to quantify the inter-relationships between them in terms of source or origin. The analysis was also integral to the selection and validation of the input (predictor) variables and the training dataset for the artificial neural network (ANN) modeling. The Pearson's correlation matrix and principal components analysis values are presented in Tables 7 and 8, respectively. The results of the correlation analysis showed that the dataset was valid for ANN modeling, as was evident in the observed linearity among the input variables.
Most of the trace elements (e.g., Hg, Cu, Zn, La, Ce) and oxides (e.g., CaO, NaO, and Al2O3) showed a significant positive correlation with PLI. This implies that an increase in the concentration of these trace elements in the surface soil increases the pollution load. Their occurrence within the soil phase, attributed to exposure to subsurface environmental conditions through mining, weathering, leaching, and oxidation, was responsible for the occurrence of CaO, NaO, and Al2O3, which further increased the pollution load. However, the PLI showed a significant negative correlation with Zr and Y. The negative correlation observed for the two lanthanide elements (Zr and Y) indicates that their occurrence or influence on the pollution load in the surface soil is attributed to differences in behavior at surface environmental conditions and similarity in geochemical properties. Lanthanide elements are known to be highly electronegative and have comparable ionic radii and electron valency (Migaszewski & Gałuszka, 2015).
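The correlation screening described above can be illustrated with a plain implementation of Pearson's coefficient. The 0.5 cut-off mirrors the significance note attached to Table 7; the function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def is_significant(r, cutoff=0.5):
    """Flag correlations whose absolute value reaches the cut-off."""
    return abs(r) >= cutoff


# Hypothetical element concentrations and PLI values for five samples.
element = [1.0, 2.0, 3.0, 4.0, 5.0]
pli_values = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(element, pli_values)
print(round(r, 3), is_significant(r))
```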
The relationship between the analyzed trace elements and oxides was also examined using principal components analysis (PCA), and the results are presented in Table 8. The PCA followed the Kaiser (1960) normalization criterion, where only components with eigenvalues ≥ 1 are accepted. To validate the suitability of the datasets for testing, the Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity indices were applied. The KMO index returned an acceptable value of 0.656 at a significance level of 0 (Table 8). Eight principal components were derived, accounting for 75.46% of the data variance. Generally, PC1 provided the most significant insights regarding the inter-relationships between the PLI and the geochemical parameters of the soil. PC1 accounted for a total variance of 24.5%, with significant positive loadings for PLI, Cu, Sr, Y, Nb, Ba, La, Ce, Pb, and Zr. This suggests that the concentration of these trace elements in the surface soil increased the pollution load. PC2 explained 12.178% of the variance, with a significant positive loading for Sr and significant negative loadings for Y and Mo. The significant negative loadings observed for the two transition metals (Y and Mo) imply that their occurrence in the soil was a result of similarity in their geochemical characteristics. Transition metals have a strong tendency to form organic-metallic complexes in surface soils. Additionally, due to their variable oxidation states, they have the propensity to increase the oxidation of metals as a response to surface environmental conditions (Omeka et al., 2022b). PC3, with a 9.618% variance, showed significant positive loadings for Hg, Fe2O3, Al2O3, and PLI, while significant negative loadings were observed for SiO2 and MgO. The significant positive loadings for PLI, Fe2O3, and Al2O3 suggest that the elements that make up these oxides originated from surface environmental conditions such as weathering, erosion, and leaching of rocks of ultramafic origin, which constitute the underlying geology of the area. The negative loadings observed for SiO2 and MgO suggest that weathering of quartz minerals made no significant contribution to the pollution load in the soil.
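The component-retention rule used above can be sketched in a few lines: components with eigenvalues of at least 1 are kept, and each component's share of variance is its eigenvalue divided by the sum of all eigenvalues. The function name and the eigenvalues below are hypothetical and are not those of Table 8.

```python
def kaiser_retain(eigenvalues, cutoff=1.0):
    """Return retained eigenvalues, their variance shares (%), and the cumulative share."""
    total = sum(eigenvalues)
    retained = [ev for ev in sorted(eigenvalues, reverse=True) if ev >= cutoff]
    explained = [100.0 * ev / total for ev in retained]
    return retained, explained, sum(explained)


# Hypothetical eigenvalues of a five-variable correlation matrix.
retained, explained, cumulative = kaiser_retain([3.0, 1.5, 1.0, 0.3, 0.2])
print(retained, [round(e, 2) for e in explained], round(cumulative, 2))
```

For a PCA run on a correlation matrix the eigenvalues sum to the number of variables, so an eigenvalue of 1 corresponds to a component that explains as much variance as one original variable.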

Artificial neural network modeling
The summarized results of the model validation metrics for the prediction of PLI are presented in Table 9. Various optimization algorithms and activation functions were applied in the estimation of the PLI from the analyzed trace elements (Table 9). This was carried out to identify the suitability of each selected algorithm for predicting the PLI. In the present study, ANN1 and ANN3 were optimized using the scaled conjugate gradient algorithm, while ANN2 and ANN4 were optimized using the gradient descent algorithm. The output layers of ANN1 and ANN2 were activated using the hyperbolic tangent function, while those of ANN3 and ANN4 were activated using the identity activation function. Both algorithms showed good performance accuracy in the prediction of PLI. Figure 2a-d presents the residual error plots and R-squared (R²) values for ANN1 and ANN2, while Fig. 3a-d presents those for ANN3 and ANN4. The R² values for ANN1 and ANN2 were 0.813 and 0.848, respectively, indicating that ANN2 performed marginally better than ANN1. The residual error plots (Fig. 2b, d) also indicate that ANN2 outperformed ANN1. Additionally, lower modeling errors were observed in the RE and SOSE results, further affirming the better performance of ANN2 over ANN1. This variation in performance between the two models could be attributed to variations in output activation functions and optimization algorithms (Egbueri et al., 2023; Omeka et al., 2023).
Different sets of ANN algorithms were used to predict PLI using ANN3 and ANN4 (Table 9). Both ANN3 and ANN4 were generated using the identity activation function as the output activation function. However, ANN3 was optimized using the scaled conjugate gradient optimizer, while ANN4 was optimized using the gradient descent optimizer. Based on their R² values (0.810 and 0.787, respectively), ANN3 had a better performance accuracy than ANN4. The residual parity plots (Fig. 3b, d) agree with the R² values in Fig. 3a, c. Moreover, the SOSE and RE results affirm that ANN3 performed marginally better than ANN4, as lower modeling errors were observed for ANN3.
It was observed that the gradient descent optimization algorithm (used for generating ANN2 and ANN4) demonstrated a better performance accuracy in predicting PLI compared to the scaled conjugate gradient optimizer (used for generating ANN1 and ANN3). Conversely, the identity activation function (used in producing ANN3 and ANN4) appears to demonstrate lower modeling errors compared to the hyperbolic tangent activation function (used in generating ANN1 and ANN2), based on the results of the RE and SOSE. This variation in performance accuracy can be attributed to differences in the architecture of the different models.
The sensitivity analysis of all input variables was performed to determine the contribution of each predictor variable to the architectural framework of the ANN model. In the present study, the sensitivity analysis was carried out to determine the percentage influence or contribution of each trace element to the overall ANN model performance in predicting PLI. Accordingly, only predictor variables with a normalized importance of > 50% were considered sensitive in the ANN model. Results of the sensitivity analysis are presented as bar charts (Fig. 4). Based on the sensitivity analysis, four input parameters (Ce, Sr, Hg, and Zn) were significant to the model sensitivity for ANN1. For ANN2, seven input parameters (Sr, Hg, Nb, As, Ba, Cr, and Cu) were significant to the development of the model. For ANN3, only three input variables (Sr, Ba, and Zr) showed significant importance to the model sensitivity. Five predictor variables (Zn, As, Hg, Sr, and Mo) showed a significant influence on the model sensitivity for ANN4.
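The > 50% screening rule can be sketched as follows, assuming that normalized importance is each variable's raw importance expressed as a percentage of the largest raw importance. That definition, the function names, and the importance values are assumptions for illustration; they are not the values plotted in Fig. 4.

```python
def normalized_importance(importance):
    """Express each raw importance as a percentage of the largest one."""
    top = max(importance.values())
    return {name: 100.0 * value / top for name, value in importance.items()}


def sensitive_inputs(importance, threshold=50.0):
    """Names of predictors whose normalized importance exceeds the threshold."""
    normalized = normalized_importance(importance)
    return sorted(name for name, pct in normalized.items() if pct > threshold)


# Hypothetical raw importances for four predictors.
raw = {"Sr": 0.20, "Hg": 0.15, "Zn": 0.11, "Mo": 0.04}
selected = sensitive_inputs(raw)
print(selected)
```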
Generally, the results of the sensitivity analysis show that Hg, Sr, As, and Zn had the highest contribution to the model sensitivity. It can therefore be deduced that they are the major contributors to the pollution load in the soils within the Tejan-Jatau mine area. Hence, it is recommended that remediation techniques geared toward PTE removal from the soil within the area should focus on these elements, as they appear to be the major contributors to the soil pollution and ecological risk status. This would substantially reduce the costs that may otherwise be incurred through continuous soil pollution remediation. The modeling approach proved efficient and can be applied in other parts of the world for soil quality management and the remediation of contaminated sites. It can also serve as a robust tool for environmental pollution assessment, especially in developing countries saddled with limited datasets due to inadequate funding.

Daily dose exposure assessment
Daily dose exposure assessment was carried out for three exposure pathways: ingestion (ADD_ing), dermal contact (ADD_der), and inhalation (ADD_inh). Seven potentially toxic elements (PTEs) (Zn, As, Sr, Cr, Cu, Ba, and Pb) were taken into account for the daily dose exposure assessment. Table 10 shows that the risk ranking for exposure pathways declined in the following order: ADD_der > ADD_ing > ADD_inh for the adult population, and ADD_ing > ADD_der > ADD_inh for the children population. These results imply that there is a very significant risk of skin exposure to these pollutants for local artisanal miners. Results also show that children within the mine area can be more exposed to the risk of toxic element ingestion through their playing habits. Moreover, given their smaller body weights compared to adults, the children population is generally at greater risk of toxic element exposure.
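The three pathway doses can be sketched with the commonly used US-EPA-style exposure equations. The exact forms used in the study and the parameter values in Table 1 are not reproduced here, so the equations below are an assumed standard formulation and all numeric inputs are hypothetical placeholders.

```python
def add_ingestion(conc, ing_rate, ef, ed, bw, at):
    """Ingestion dose (mg/kg/day); conc in mg/kg, ing_rate in mg/day, 1e-6 converts mg to kg."""
    return conc * ing_rate * ef * ed / (bw * at) * 1e-6


def add_dermal(conc, skin_area, adherence, absorption, ef, ed, bw, at):
    """Dermal dose (mg/kg/day); skin_area in cm2, adherence in mg/cm2, absorption unitless."""
    return conc * skin_area * adherence * absorption * ef * ed / (bw * at) * 1e-6


def add_inhalation(conc, inh_rate, ef, ed, pef, bw, at):
    """Inhalation dose (mg/kg/day); inh_rate in m3/day, pef (particle emission factor) in m3/kg."""
    return conc * inh_rate * ef * ed / (pef * bw * at)


# Placeholder exposure parameters (not the values of Table 1).
child = add_ingestion(conc=100.0, ing_rate=200.0, ef=350.0, ed=6.0, bw=15.0, at=6.0 * 365.0)
adult = add_ingestion(conc=100.0, ing_rate=100.0, ef=350.0, ed=24.0, bw=70.0, at=24.0 * 365.0)
print(child, adult)
```

With these placeholder inputs the child ingestion dose is roughly nine times the adult dose, which reflects the combined effect of a higher assumed ingestion rate and a lower body weight.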

Non-carcinogenic health risk assessment
The non-carcinogenic risk assessment was performed using the hazard quotient (HQ) and hazard index (HI) coefficients. The overall HQ results are shown in Tables S1-S6 (supplementary material), whereas the HI results are shown in Table 11. For both children and adults, the element ranking for the three exposure pathways occurred in the following order: ingestion pathway (As > Sr > Cr > Pb > Ba > Cu > Zn); dermal contact pathway (As > Sr > Cr > Pb > Cu > Ba > Zn); inhalation pathway (As > Sr > Cr > Pb > Ba > Cu > Zn). Meanwhile, the HQs for children and adults occurred as HQ_ing > HQ_der > HQ_inh, based on the different exposure routes (Tables S1-S6). Based on the HI risk classification criteria (US-EPA 2017), all of the soil samples in the area expose people near the mine to very high chronic risks (HI ≥ 4) through ingestion and dermal contact, with higher risk levels found in children than in adults. Based on inhalation, there was a very low chronic risk (0.1 ≤ HI < 1) for both populations. These findings appear to be consistent with other research on the health risks of toxic elements in soils at mining sites in the southeastern and southwestern parts of Nigeria (Omeka et al., 2022a). The findings of this study suggest that chronic risk levels for residents living near the Tashan-Jatau mine are higher through skin contact and ingestion than through inhalation of soil particles from the mining area.
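A minimal sketch of the HQ and HI calculation follows, assuming the usual definitions: HQ is the average daily dose divided by the element's reference dose (RfD), and HI for a pathway is the sum of the HQs of all elements for that pathway. The doses and reference doses below are illustrative placeholders and are not the study's inputs.

```python
def hazard_quotient(add, rfd):
    """Ratio of the average daily dose to the reference dose (both in mg/kg/day)."""
    return add / rfd


def hazard_index(hq_by_element):
    """Sum of the hazard quotients of all elements for one exposure pathway."""
    return sum(hq_by_element.values())


# Placeholder ingestion doses and reference doses for two elements.
hq_ing = {
    "As": hazard_quotient(6.0e-4, 3.0e-4),
    "Zn": hazard_quotient(3.0e-2, 3.0e-1),
}
hi_ing = hazard_index(hq_ing)
print(hq_ing, hi_ing)
```

An HQ or HI above one indicates that the estimated dose exceeds the reference dose, which is the condition the text describes as a chronic non-carcinogenic risk.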

Carcinogenic health risk assessment
The lifetime cancer risk (LCR) and cancer risk (CR) of four carcinogenic trace elements (Pb, As, Sr, and Cr) were calculated.According to the International Agency for Research on Cancer (IARC) and the Integrated Risk Information System (IRIS), these components were taken into consideration because they have   12.However, the danger of cancer from inhalation (Cr inh ) was exceedingly minimal (1.0E−04).
Based on their mean values, the cancer risks (CR) for all the carcinogens and all the exposure pathways declined in the following order for both the children and adult populations: As > Cr > Pb > Sr. The higher CR values observed in the children population suggest that children may be more susceptible to cancer hazards than adults. This has been explained by the fact that children have lower body weights than adults in terms of risk exposure (US-EPA 2017; Omeka et al., 2022a). The LCR demonstrated a pattern similar to that of the CR: arsenic (As) had the highest LCR values of all the carcinogens for both populations, with children having the highest lifetime exposure vulnerability to all carcinogens (Table 12). This implies that the two main carcinogens causing the greatest health hazards to the inhabitants of the mining district are arsenic and chromium.
The results also show that ingestion and dermal contact appear to be the most common exposure pathways to toxic components in the Tashan-Jatau mine district. Since the major occupations in the area are agriculture (subsistence farming and animal husbandry) and mining, residents may be exposed to hazardous elements through crop consumption, land cultivation, and mining activities, creating a soil-plant-animal-human toxic element exposure route. The lower body weights of children have been related to their higher risk level of toxic element exposure compared to the adult population, as observed in this location. The findings of this study are consistent with those found in other parts of the world (Jahromi et al., 2020; Jia et al., 2018).
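The carcinogenic risk calculation discussed in this section can be sketched as follows, assuming that CR for a pathway is the average daily dose multiplied by the element's cancer slope factor, and that LCR is the sum of the CR values over the exposure pathways. Both definitions are assumptions about the study's formulation, and the doses and slope factors below are hypothetical.

```python
CANCER_RISK_THRESHOLD = 1.0e-4  # risk level cited in the text


def cancer_risk(add, slope_factor):
    """Pathway cancer risk: dose (mg/kg/day) times slope factor (per mg/kg/day)."""
    return add * slope_factor


def lifetime_cancer_risk(pathway_risks):
    """Assumed definition: sum of the cancer risks over all exposure pathways."""
    return sum(pathway_risks.values())


# Hypothetical doses and slope factors for one carcinogen.
risks = {
    "ingestion": cancer_risk(2.0e-4, 1.5),
    "dermal": cancer_risk(1.0e-5, 1.5),
    "inhalation": cancer_risk(1.0e-8, 15.0),
}
lcr = lifetime_cancer_risk(risks)
exceeds = lcr > CANCER_RISK_THRESHOLD
print(lcr, exceeds)
```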

GIS-based human health risk exposure assessment
Spatial distribution risk maps were generated using GIS. This was done to provide a visual appraisal of the distribution of the carcinogenic risk levels of PTEs for different exposure routes and populations. A visual appraisal of the risk and hazard exposure levels within the area is integral to informed decision-making toward adequate remediation, environmental protection, and sustainability. To this end, spatial maps based on the cancer risk (CR) from exposure to the most toxic carcinogens (As and Cr) were generated for both children and adults based on the ingestion and dermal contact exposure pathways (Figs. 5 and 6).
For the adult population, cancer risk levels for arsenic (As) ingestion appear to increase toward the north-central parts of the study area (around Gindan Danko and Inga) and decrease toward the southwestern and northeastern fringes of the study area (around the Tegina, Gende, and Ussa communities). However, medium carcinogenic risks appear to occur around Manga, Rafin Dandoko, and Babangona (Fig. 5). For the children population, cancer risks from As ingestion increase from the central region toward the southeastern parts of the study area. Pockets of high-risk levels are also observed toward the northern fringes of the study area, around Babangona (Fig. 5). The implication is that children around these communities are at very high risk from arsenic exposure through their playing habits. Chromium (Cr) risk exposure for the adult population is observed to vary widely toward the northeastern fringes of the study area (around Gindan Danko and Tegina). Risk levels are also observed to increase from the southwestern parts (around Ussa) toward the southeastern fringes (around Rafin Dandoko). Areas such as Gende, Manga, Inga, and Babangona record a very low risk of Cr exposure to the adult population within the vicinity of the mine. However, for the children population, risk levels from Cr exposure seem to increase toward the western parts of the area (around Manga) and increase steadily toward the southeastern fringes (around Rafin Dandoko).
Results from the spatial risk maps show that risk levels from ingestion exposure to the two priority carcinogens (Cr and As) are highly variable and occur across almost all the areas in the vicinity of the mine. This calls for urgent remediation of the contaminated sites to protect the health of the inhabitants.
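Risk surfaces such as those in Figs. 5 and 6 are produced by interpolating point values between sampling locations. The text does not state which interpolation method the GIS software applied, so the sketch below uses inverse distance weighting (IDW) purely as an assumed, illustrative technique; the coordinates and risk values are hypothetical.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sampling point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical sampling points: (easting, northing, cancer risk scaled by 1e4).
points = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_estimate = idw(1.0, 0.0, points)
print(midpoint_estimate)
```

IDW estimates always lie between the minimum and maximum sampled values, so isolated high-risk pockets on such a map correspond to high measured values at nearby sampling points rather than to extrapolation.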

Conclusions
Joint geochemical, numerical, spatiotemporal, and machine learning data have been used to create a database for the Tejan-Jatau mine area in Northwestern Nigeria. In the study, numerical data were generated to assess the soil pollution load level and the ecological and health risks of potentially toxic elements (PTEs) in soils. The study also involved the use of artificial neural network machine learning algorithms to predict the pollution load of toxic elements and to forecast the most influential toxic elements impacting soil quality.
Geochemical data indicate that the mean concentration of trace elements varied in the order of Zr > Ba > Cr > Ce > Sr > Zn > Y > Nb > La > Cu > Pb > As > Hg > Mo, while the concentration of major oxides occurred in the order of SiO2 > Fe2O3 > Al2O3 > K2O > TiO2 > CaO > SO2 > MgO > Cl > P2O5 > NaO > MnO. Aside from mining, the occurrence of trace elements and oxides is attributed to their tendency to form metal-organic complexes with soil organic matter within the soil system and to surface environmental conditions such as leaching and weathering of the bedrock geology. Pollution load index (PLI) and ecological risk index (ERI) data revealed that the soil is moderately polluted to extremely contaminated. Based on the ERI criteria, low-risk levels were recorded in 13.7% of the soil samples, moderate risk levels in 50.9%, considerable risk levels in 15.6%, and very high-risk levels in 19.6%. A relationship between PLI and geochemical parameters was established using correlation and principal component analysis. PLI showed significant positive linearity (p ≥ 0.5) with the trace elements and oxides, further confirming that surface environmental conditions (e.g., leaching and weathering of the bedrock geology), geochemical variation (e.g., organo-metallic complexation), and mining were the chief precursors for the occurrence of PTEs within the soil system.
The multilayer perceptron artificial neural network (MLP-ANN) showed high efficiency in the prediction of soil quality. The coefficients of determination (R²) for ANN1 and ANN2 were 0.813 and 0.848, while those for ANN3 and ANN4 were 0.810 and 0.787, respectively. Low modeling errors were observed for the different validation metrics, such as the sum of square errors (SOSE) and relative errors (RE), further affirming the efficacy of the model in soil quality prediction. The sensitivity analysis of all input variables was performed to determine the contribution of each predictor variable to the overall ANN model performance in predicting PLI. Based on the sensitivity analysis, Hg, Sr, Zn, Ba, As, and Zr showed the greatest influence on the soil quality in the area.
The hazard quotients (HQs) and hazard index (HI) of all the PTEs were greater than one. This suggests that the inhabitants (especially children) are generally more vulnerable to risks due to toxic element ingestion than through dermal contact and inhalation. As and Cr recorded higher cancer risks and lifetime cancer risk levels (> 1.0E−04) from ingestion than from dermal contact. For the adult population, cancer risk levels for As ingestion appear to increase toward the north-central parts of the study area, while for the children population, cancer risks for As ingestion increase from the central region toward the southeastern parts. Cr risk exposure for the adult population was observed to vary widely toward the northeastern fringes of the study area, while for the children population, risk levels from Cr exposure seem to increase toward the western and southeastern parts of the study area.
This study is the first in Nigeria to combine numerical, machine learning, GIS, and geochemical data for a holistic soil quality assessment. From the findings of this study, it is recommended that remediation techniques geared toward PTE removal from the soil within the mine area should focus on the removal of Hg, Sr, Zn, Ba, As, and Zr, as they appear to be the major contributors to the soil pollution status. This would substantially reduce the costs that may otherwise be incurred through continuous site remediation. The modeling approach proved efficient and can be applied in other parts of the world for soil quality management and the remediation of contaminated sites. It can also serve as a robust tool for environmental pollution assessment, especially in developing countries saddled with limited datasets due to inadequate funding.

Fig. 1 Map showing the sample location and geology of the study area

Fig. 2 R² values and residual error plots for a-b ANN1 and c-d ANN2

Fig. 3 R² values and residual error plots for a-b ANN3 and c-d ANN4

Fig. 4 Charts showing the relative importance of the input variables for a ANN1, b ANN2, c ANN3, and d ANN4 architectural framework

Fig. 5 Spatial maps showing arsenic risk exposure for a adult and b children

Fig. 6 Spatial maps showing chromium risk exposure for a adult and b children

Table 1
Human health risk assessment parameters

Table 2
Major methodological information utilized for the ANN modeling

Table 3
Detailed results of analyzed trace elements in the soil
The high concentration of oxides such as SiO2, Fe2O3, and Al2O3 shows the prevalence of clay and quartz minerals from leaching and weathering activities from the bedrock geology.

Table 4
Results of analyzed soil oxides

Table 7
Correlation analysis showing the relationship between trace elements, oxides, and PLI (bold: correlation significant at ≥ 0.5)

Table 9
Summary of metrics used for ANN validation