Temporal and Spatial Water Quality Assessment of the Geumho River, Korea, Using Multivariate Statistics and Water Quality Indices

The Geumho River in South Korea passes through a metropolitan area with a high population density and multiple industrial complexes; therefore, the water quality of this river is of significance for human health and economic activities. This study assesses the water quality of the Geumho River to inform river water quality management and improve pollution control using multivariate statistics and the Korean Water Quality Index (KWQI). Principal component and factor analyses identified factors related to organic pollutants and metabolism (principal factor 1) and phosphorus and fecal coliform content (principal factor 2). Cluster analysis classified the water quality data into four temporal groups and three spatial groups. Six temporal variables and seven spatial variables were extracted from discriminant analysis results; the most important water quality variables were high during the spring and summer seasons and in the midstream and downstream regions. Temporally, the KWQI was the highest in winter and the lowest in spring; spatially, the KWQI was the highest in the upstream and the lowest in the midstream sections. These results indicate that to improve effectiveness, water management interventions in the Geumho River should focus on the urban midstream section and spring season.


Introduction
The water quality of a river is affected by pollution (e.g., from nonpoint sources and non-biodegradable organic matter) generated from natural processes, urbanization, and regional development, often in association with agricultural and industrial activities [1][2][3]. When the self-purification capacity of a river is exceeded due to the inflow of such pollution sources, environmental problems occur including poor water quality [4,5]. In the past, organic matter management through biochemical oxygen demand (BOD) regulation was possible because most pollutants were predominantly biodegradable. However, with advances in industry and rapid urbanization, the use and release of inorganic chemicals and non-biodegradable organic matter have increased [6,7]. As a result, the inflow of excessive nutrients to rivers can cause serious environmental problems including algal blooms, eutrophication, and large-scale fish kills. To effectively manage surface water quality, it is important to control the pollution sources affecting rivers by collecting reliable data, assessing temporal and spatial water quality dynamics, and analyzing the causes of water pollution [8][9][10].
Various statistical analysis techniques have been used to monitor water quality in rivers including multivariate statistical techniques (MSTs), such as principal component analysis (PCA), factor analysis (FA), cluster analysis (CA), discriminant analysis (DA), and analysis of variance (ANOVA) [11][12][13][14] . These methods inform water resource management by enabling the assessment of key factors affecting water quality and their temporal and spatial dynamics. Indeed, worldwide, many studies have sought to assess river water quality patterns and dynamics using various statistical techniques, and propose measures to rapidly solve pollution and associated water-quality problems [3,15−17] .
With increasing industrialization and urbanization worldwide, the management of rivers in metropolitan areas is becoming more important [18][19][20]. In the case of the Geumho River in South Korea, the upstream section is relatively clean while nonpoint pollution sources discharge into the river under certain rainfall conditions. In contrast, there are multiple pollution sources along the midstream section, which flows through a metropolitan city; yet, current water quality management in the Geumho River Basin focuses on BOD and total phosphorus (TP) concentrations only in the downstream section.
In this study, the water quality characteristics of the Geumho River are assessed using both MSTs and the Korean Water Quality Index (KWQI), the combination of which is judged to be more effective for evaluating water quality characteristics than either approach used alone. The main aim of the study is to evaluate the spatial and temporal characteristics of water quality in the river and determine the main contributing factors. The analysis process of this study is illustrated in Fig. 1. In doing this, this study provides a useful tool for identifying the causes and dynamics of pollutants found in rivers passing through other metropolitan areas so that appropriate and targeted water quality control measures can be developed.

PCA and FA
The FA results obtained by conducting PCA and then rotating the result using the varimax method identified four VFs with an eigenvalue ≥ 1.0, which together explained 72.18% of the total variance (Table 2). VF1 represented 22.17% of the total variance; COD and TOC showed "strong" loadings, and BOD, TSS, and Chl-a showed "moderate" loadings. VF2 accounted for 20.45% of the total variance; DO and TP showed "strong" loadings, and WT, PO4-P, and FC showed "moderate" loadings. VF3 represented 18.43% of the total variance; TN and NH3-N showed "strong" loadings, and EC and NO3-N showed "moderate" loadings. VF4 accounted for 11.13% of the total variance, and pH showed a "strong" loading.
The principal factors (PFs) were expressed in a scatter plot using VF1 and VF2 in Fig. 3, which are the factors that significantly affect the water quality of the river. BOD, Chl-a, TSS, TOC, and COD were extracted from PF1, while FC, PO4-P, WT, and TP were extracted from PF2. As shown in Fig. 3, the major variables were found to be organic pollutants, substances related to metabolism in the river, phosphorus, and FC. As for organic pollutants, large amounts of non-biodegradable organic matter have been discharged due to recent rapid urban growth, industrialization, and intense human activities [7,39], implying that more stringent regulations for organic pollutants are required. In addition, for urban rivers that pass through metropolitan cities with large populations and high levels of industrialization, the appropriate operation of STFs and WTFs is required given the continuous discharges from certain pollution sources [19,29].
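The variance shares quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the four percentages reported in the text and assumes that, because all 15 variables were standardized, the total variance equals 15, so each share can be mapped back to an implied amount of explained variance. The variable names are introduced here and do not come from the study.

```python
# Percent of total variance reported in the text for each varifactor (Table 2).
explained = {"VF1": 22.17, "VF2": 20.45, "VF3": 18.43, "VF4": 11.13}

# Assumption: 15 standardized variables, so the total variance is 15.
N_VARIABLES = 15

# Cumulative share explained by the four retained varifactors.
cumulative = round(sum(explained.values()), 2)

# Implied variance carried by each varifactor, in units of one standardized variable.
implied_variance = {vf: pct / 100 * N_VARIABLES for vf, pct in explained.items()}

print(cumulative)
for vf, value in implied_variance.items():
    print(vf, round(value, 2))
```

Under these assumptions, the four shares add up to the reported 72.18%, and every varifactor carries more variance than a single standardized variable, which is consistent with the eigenvalue ≥ 1.0 retention rule.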

Water quality assessment
Descriptive statistics (Table 3) for the PF1 and PF2 variables of the PCA/FA results are compared with the environmental standards for rivers in Korea in Table 4. Based on the mean values for PF1, BOD was found to be "very good" and "slightly good" at sites S1 to S3, and "moderate" at sites S4 to S6; COD was "moderate" at sites S1 to S5 and "slightly bad" at site S6; TOC was "slightly good" at sites S1 to S3, "moderate" at sites S4 to S5, and "bad" at site S6. For BOD, COD, and TOC, which represent the levels of organic pollutants, water quality grades varied between each site. This is because biodegradable organic materials are currently more strictly regulated in STFs; however, regulation of household sewage in urban areas and the non-biodegradable organic matter generated in industrial complexes remains insufficient. In particular, it is assumed that high COD and TOC concentrations at site S6 reflect the large textile-dyeing complexes along the section of the river between sites S5 and S6, from which untreated non-biodegradable organic matter is discharged.
S1 to S6, monitoring sites; R, range; M, mean; SD, standard deviation; BOD, biochemical oxygen demand; COD, chemical oxygen demand; TOC, total organic carbon; TSS, total suspended solids; Chl-a, chlorophyll-a; TP, total phosphorus; PO4-P, phosphate-phosphorus; WT, water temperature; FC, fecal coliform; CFU, colony-forming unit.
Table 4 Environmental standards for river water quality in Korea [40].
Based on PF2, TP was found to be "very good" at sites S1 and S2 and "slightly good" at sites S3 to S6. TP is strictly managed based on the Korean Government's regulations on the concentrations of STF and WTF effluents [41]. However, continuous management is required along the Namcheon tributary, which flows into the river upstream of site S3, as stables and vineyards are widespread in this area, which can act as nonpoint pollution sources during rainfall events. The PO4-P concentrations were similar to those of TP, and WT varied significantly over time owing to the strong seasonal effect of the monsoonal climate. FC concentrations were classified as "good" at sites S1 to S2 but were very high at sites S3 to S6, exceeding the current environmental standards (FC > 1,000 colony-forming units (CFU)/100 mL). The increase in FC concentrations in the downstream direction likely reflected the introduction of STF and WTF effluents from the metropolitan areas downstream of site S2.
When water quality was assessed by comparing the major variables extracted from the PCA and FA with the current environmental standards, overall water quality was poor in the downstream section of the river relative to the upstream section because of the significant influence of the point pollution sources located in the urban areas, most notably the STFs and WTFs. In addition, the water quality characteristics of the river were found to vary depending on the surrounding land-use patterns.

Temporal and spatial cluster analysis
The results of the CA and temporal and spatial classification using the water quality data of the monitoring sites (S1 to S6) are shown in Fig. 4. Based on the seasonal CA results (Fig. 4A), March to June was classified as "cluster1 (spring)," July to September as "cluster2 (summer)," October to December as "cluster3 (autumn)," and January to February as "cluster4 (winter)." These four distinct seasons reflect the mid-latitude temperate monsoonal climate of the study basin. The average total precipitation of the target watershed over the last ten years was 1,089.1 mm, 54.6% (594.8 mm) of which fell between July and September. Due to this strong temporal rainfall bias, the flow of the river varies significantly between seasons, which is also assumed to result in seasonal water quality variations.
Based on the spatial CA results (Fig. 4B), sites S1 to S2 were classified as "cluster1 (upstream: US)," sites S3 to S5 as "cluster2 (midstream: MS)," and site S6 as "cluster3 (downstream: DS)." These clusters correspond to the US section with relatively few pollution sources, the MS section with urban STFs, and the DS section with large industrial complexes, WTFs, and greenhouse facilities. These areas can be distinguished more clearly on the land-use map of the Geumho River watershed (Fig. 5). Cluster1 corresponds to the area with a high proportion of forests and farmlands; cluster2 corresponds to urban areas with residential, commercial, and industrial land-cover types; and cluster3 includes a complex mix of land uses.

Temporal and spatial DA
DA was conducted following the CA according to the pre-classified temporal and spatial water quality characteristics. Thus, temporal accuracy was maximized by conducting the DA according to the seasonal classification (spring, summer, autumn, and winter). DA was conducted in standard and stepwise modes, and the discriminant function and classification matrix constructed for each mode are shown in Tables 5 and 6. In the standard mode, all 15 water quality variables were used, and the average accuracy of the classification matrix was 80.2% (Table 6). In the stepwise mode, which reduces the number of variables relative to the standard mode, six water quality variables (WT, BOD, TOC, EC, NH3-N, and TP) were used, yielding an accuracy of 78.8% (Table 6). Notably, in the stepwise mode, spring and autumn showed relatively low classification accuracies of 75.0% and 74.0%, respectively.
To evaluate water quality according to the temporal classification, the six water quality variables identified based on the stepwise approach are shown in a box plot in Fig. 6, which confirmed the homogeneity of the groups based on the ANOVA results and the Scheffe Test (p < 0.05). WT was highest in summer and tended to fluctuate in line with seasonal changes. WT was more variable in spring and autumn than in summer and winter. BOD was highest and most variable in spring, tended to be lower in summer and autumn, and increased again during winter. In contrast, TOC concentrations were highest in summer relative to the other seasons. BOD and TOC are representative organic pollution indicators, and the fluctuations in their concentrations likely reflect the influence of increased river flows due to rainfall. In the case of EC, average values did not significantly differ between seasons, although there were seasonal differences in EC variability. NH3-N was recorded at significantly higher concentrations in winter than during the other seasons.
This may reflect high NH3-N discharges from STFs and WTFs coupled with relatively low denitrification rates in winter when water temperatures are lower. In many countries, NH3-N is managed by setting water quality standards for rivers and the effluents from water treatment facilities [42], yet in Korea, NH3-N is not regulated. This requires further consideration in the context of water quality management in Korea. For example, aquatic ammonia toxicity in rivers increases when the NH3-N concentrations rise above 1.0 and pH > 8.0 [43]; in winter, the concentration of NH3-N in the Geumho River was > 1.0 (Fig. 6) and the average pH in the MS section was 8.5 (Fig. 7). These results may be problematic for aquatic life in this river [44]. In addition, inflows containing high concentrations of NH3-N have negative impacts on BOD and eutrophication in rivers [42,45]. In the case of TP, average concentrations and ranges were much higher in summer than during the other seasons. This likely reflects the various nonpoint sources that contribute pollutants during the summer flood season as well as untreated effluents that exceed the capacity of the STFs during intense rainfall (i.e., storm overflow).
DA was also conducted based on the CA spatial classification (US, MS, and DS). The discriminant function and classification matrix of the results following the standard and stepwise modes are shown in Tables 7 and 8, respectively. In the standard mode, all 15 variables were used, and the average accuracy of the classification matrix was 97.9% (Table 8). In the stepwise mode, seven variables (pH, BOD, COD, TOC, EC, TN, and TP) were used, yielding a classification accuracy of 92.0% (Table 8). Notably, the classification accuracy for US (89.6%) was relatively low compared to other classification results in the stepwise mode, and the accuracies for MS and DS were 91.7% and 97.9%, respectively.
To evaluate the water quality characteristics according to the spatial classification, the seven water quality variables identified based on the stepwise analysis are shown in Fig. 7, which confirms the homogeneity of the groups based on the ANOVA results and the Scheffe Test (p < 0.05). pH, TN, and TP were high in the MS, which likely reflects nutrient inputs from the six STFs and WTFs located between sites S3 and S6 (Fig. 2). These inputs can stimulate phytoplankton growth, which can decrease oxygen saturation and increase carbon dioxide saturation in water, thereby increasing pH [46]. BOD, COD, TOC, and EC increased toward the DS section of the river compared to the US section, suggesting the inflow of organic pollutants in association with urbanization and industrialization in the DS area (see Fig. 5). TN and TP also showed similar trends, although average concentrations in the MS section were similar to the DS section, and variability was higher in the MS section than in the DS section. These patterns may reflect the high proportion of farmland in the MS section relative to the DS section, which acts as a nonpoint pollution source during certain conditions. River dredging carried out to create an urban landscape increased the volume of water in the channel.
However, the flow of the river decreased and stagnant water formed. This reduction in the self-purification capacity of the river has an adverse effect on water quality. In addition, as a result of reductions in the flow during the dry season, effluents from the STFs and WTFs typically make up the majority of instream flow [23]. The treatment facilities in the Geumho River Basin treat both domestic wastewater and industrial effluents, which can contain non-biodegradable organic matter and require ongoing regulation and management [39].
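The overall stepwise accuracy of the spatial DA can be related to the per-group accuracies with a short calculation. The sketch below is a consistency check rather than the authors' procedure. It assumes 48 monthly samples per site (2017 to 2020) with no missing data, and two, three, and one sites in the US, MS, and DS groups, respectively, following the spatial clusters. The variable names are introduced here for illustration.

```python
# Assumption: 48 monthly samples per site (2017-2020) and no missing data.
SAMPLES_PER_SITE = 48

# Number of monitoring sites in each spatial cluster (S1-S2, S3-S5, S6).
sites_per_group = {"US": 2, "MS": 3, "DS": 1}

# Stepwise-mode classification accuracies reported in the text (percent).
reported_accuracy = {"US": 89.6, "MS": 91.7, "DS": 97.9}

# Total samples per group and the implied number of correctly classified samples.
totals = {g: n * SAMPLES_PER_SITE for g, n in sites_per_group.items()}
correct = {g: round(totals[g] * reported_accuracy[g] / 100) for g in totals}

# Overall accuracy is the sample-weighted average of the group accuracies.
overall = 100 * sum(correct.values()) / sum(totals.values())

print(correct)
print(round(overall, 1))
```

Under these assumptions the implied counts are 86 of 96 (US), 132 of 144 (MS), and 47 of 48 (DS), which gives an overall accuracy of 92.0%, matching the reported value. Because the overall figure is weighted by sample count, the MS group contributes the most to it.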

Water quality assessment using the KWQI
The water quality of the river was assessed for each temporally and spatially classified group using seven water quality variables applied by the Korean Ministry of Environment (pH, DO, EC, WT, TOC, TN, and TP). The KWQI evaluation also showed that pollution was greatest during seasons when instream flows were low and in those sections where multiple treatment facilities are located, which is consistent with the results discussed in Section 3.4.

Conclusions
Major water quality variables for the Geumho River were temporally and spatially classified and analyzed using various statistical analyses. Among the variables extracted from the PCA and FA, COD, TOC, TP, and TSS exhibited very strong loadings, indicating that non-biodegradable organic matter and nonpoint sources generated during rainfall are the main causes of water quality deterioration in the study basin. Based on the temporal (spring, summer, autumn, and winter) and spatial (US, MS, DS) classifications, water quality dynamics during the study period were evaluated and countermeasures were identified. Based on the KWQI, the overall water quality classification for the study river is "good", although this drops to "fair" during some seasons and in some sections. The poorest water quality occurs in spring (average KWQI = 59.2) and in the MS section (average KWQI = 56.4).
Given the critical importance of rivers for human life, including their roles in leisure and industrial activities, efficient water quality management is required to minimize negative impacts on human and environmental health. Based on the results obtained in this study, future research and monitoring should target those seasons and river sections identified as having the lowest water quality, i.e., the spring season and the MS and DS river sections.

Study watershed
The Geumho River originates in Pohang City and passes through Daegu Metropolitan City via Yeongcheon Dam, Yeongcheon City, and Gyeongsan City (Fig. 2). There are more than 20 main inflow tributaries to the Geumho River including the Shinryeongcheon, Bukancheon, and Cheongtongcheon in the upstream area, and the Omokcheon, Namcheon, Shincheon, Palgeocheon, Dalseocheon, and Ieoncheon in the downstream area. These combine and flow into the Nakdong River. The average annual precipitation in the Geumho River watershed is 1,089 mm, which is significantly lower than the national average of 1,159 mm (https://data.kma.go.kr). In particular, water quality deteriorates during the relatively dry season (January to April) due to a lack of instream flow, whereas rainfall is concentrated during the flood season (July to September) [22].
The upstream section of the river is relatively clean, although some agricultural and livestock nonpoint pollution sources exist. In the midstream and downstream sections, the pollution loads rapidly increase due to the presence of large industrial complexes and dense urban areas [23][24][25]. In addition, in the downstream section of the river, there is concern over the inflow of nonpoint sources during rainfall events from widespread greenhouse farming. Indeed, frequent inflows of high-concentration pollution sources have occurred in the past due to rapid industrialization and urbanization, contributing to a poor water quality status [26,27]. Although water quality has been greatly improved through continuous management, the inflow of various pollutants continues to be a problem. In particular, the industrial complexes within the basin generate large amounts of wastewater that are treated in large sewage treatment facilities (STFs) and wastewater treatment facilities (WTFs) and subsequently discharged into the river. These effluents introduce a range of pollutants (e.g., non-biodegradable organic matter, nutrients, and heavy metals) into the river as it passes through the urban areas [19,28,29]. Therefore, further water quality management interventions are urgently required to target these pollutants.

Water quality analysis
Water quality monitoring was conducted once a month from 2017 to 2020 at six sites in the Geumho River Basin (Fig. 2), data for which were obtained from the Water Environment Information System database (http://water.nier.go.kr). For each of the surface samples collected from each site, 15 water quality variables were measured in accordance with the official test methods for water pollution in Korea [30]. Water temperature methods [30] using samples transported from the field in an icebox. The analysis methods used are listed in Table 1.

Statistical analysis
Multivariate statistical analysis of water quality was conducted using CA, PCA, FA, and DA techniques. The data for the 15 water quality variables (2017 to 2020) for the six monitoring sites were used in these tests. All data were standardized using z-scores to prevent errors resulting from the different measurement units of each variable. For all statistical calculations and figures, SPSS 24.0, XL-STAT 2020, and Arc-GIS 10.5 software programs were used.
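The z-score standardization mentioned above can be illustrated with a minimal Python sketch. It is not the SPSS or XL-STAT procedure used in the study. The readings are hypothetical, and the choice of the population standard deviation is an assumption (the sample standard deviation is an equally common choice).

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one variable to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; stdev() would also be valid
    if sigma == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mu) / sigma for v in values]

# Hypothetical readings for two variables with very different units.
ec_us_cm = [310.0, 420.0, 515.0, 655.0]  # electrical conductivity, uS/cm
tp_mg_l = [0.031, 0.042, 0.058, 0.069]   # total phosphorus, mg/L

ec_z = z_scores(ec_us_cm)
tp_z = z_scores(tp_mg_l)
```

After this step both variables are dimensionless and on a comparable scale, so a variable measured in hundreds of units no longer dominates distance or variance calculations over one measured in hundredths.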

PCA and FA
PCA is a statistical technique used to reduce dimensions by extracting eigenvalues and eigenvectors from a covariance matrix using the correlations among variables [13,31]. An eigenvalue represents the magnitude of variance that can be explained by the principal component; with standardized data, an eigenvalue > 1.0 indicates that the principal component explains more variance than a single original variable. FA generates a new component (varifactor, VF) by rotating the PCA result using the maximum variance (varimax) method to reduce the influence of the variables of low importance [32,33]. Therefore, eigenvalues ≥ 1.0 were extracted and analyzed here. To assess and verify the suitability of the data for FA, Kaiser-Meyer-Olkin (KMO) and Bartlett's tests were first conducted [11,15]. In the FA results, loading values > 0.75 were classified as "strong" and those between 0.5 and 0.75 were classified as "moderate" [34].
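The retention and grading rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow. The function names, the use of the absolute value of a loading, the "weak" label for loadings below 0.5, and all numeric inputs are assumptions introduced here.

```python
def classify_loading(loading):
    """Label a factor loading using the thresholds stated in the text."""
    magnitude = abs(loading)  # assumption: the sign is ignored when grading strength
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"  # assumption: the text does not name loadings below 0.5

def retain_components(eigenvalues, threshold=1.0):
    """Return the 1-based indices of components with eigenvalue >= threshold."""
    return [i for i, ev in enumerate(eigenvalues, start=1) if ev >= threshold]

# Hypothetical loadings and eigenvalues for illustration only.
loadings = {"COD": 0.88, "BOD": 0.62, "DO": -0.81, "pH": 0.21}
labels = {name: classify_loading(value) for name, value in loadings.items()}
kept = retain_components([3.4, 3.1, 2.7, 1.6, 0.9, 0.6])
```

With these hypothetical inputs, the first four components are retained, and a negative loading such as the one shown for DO is graded by its magnitude.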

CA
CA is a statistical technique used to classify data by analyzing the differences or similarities among observed values. Hierarchical CA can significantly reduce the dimensionality of data by classifying clusters into homogeneous groups. In general, the similarity between samples is measured using the Euclidean distance, and the distance between clusters is measured using Ward's method [12,35,36]. Here, the 15 different characteristics of the study watershed were investigated based on temporal and spatial classification. In addition, the results of the CA were used as the pre-classification values for the DA.
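A small pure-Python sketch of agglomerative clustering with Ward's criterion is given below to make the procedure concrete. It is illustrative only and does not reproduce the software used in the study. The site coordinates are hypothetical standardized values in two dimensions, and the function names are introduced here. At each step the pair of clusters whose merger causes the smallest increase in within-cluster sum of squares is joined.

```python
from itertools import combinations

def centroid(points):
    """Mean position of a list of equally sized coordinate lists."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(data, n_clusters):
    """Merge labelled points step by step until n_clusters remain."""
    clusters = [([label], [point]) for label, point in data.items()]
    while len(clusters) > n_clusters:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(labels) for labels, _ in clusters)

# Hypothetical standardized site means (two variables), for illustration only.
sites = {
    "S1": [-1.2, -1.0], "S2": [-1.0, -1.1],
    "S3": [0.0, 0.3], "S4": [0.4, 0.2], "S5": [0.2, 0.6],
    "S6": [1.8, 1.4],
}
groups = ward_clusters(sites, 3)
print(groups)
```

For these hypothetical values, cutting the hierarchy at three clusters groups S1 with S2, S3 to S5 together, and leaves S6 on its own. In practice, a library routine that also returns the full dendrogram would usually be preferred.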

DA and ANOVA
DA is a statistical technique used to derive a group classification discriminant through the explanatory variables of data groups and classify the groups accordingly [10,15]. The technique minimizes classification errors when data are classified into two or more groups by calculating a linear discriminant function, and predicts and confirms the group of dependent variables using independent variables derived from quantitative data. In this study, DA was performed according to the temporal and spatial classifications based on the CA results. DA was conducted in standard and stepwise modes, and the results were compared and evaluated. ANOVA tests for differences among the means of two or more groups by comparing the variance between the groups with the variance within the groups [37]. Here, one-way ANOVA was performed to examine the homogeneity between the stepwise DA groups. A post-hoc analysis was performed using the Scheffe Test to examine the effect of the different temporal and spatial groups on each of the 15 water quality characteristics.
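The one-way ANOVA step can be illustrated by computing the F statistic directly. The sketch below is a minimal, assumption-laden example: the group values are hypothetical, the function name is introduced here, and the Scheffe post-hoc comparison is not included.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical values of one variable in three spatial groups.
upstream = [1.0, 2.0, 3.0]
midstream = [2.0, 3.0, 4.0]
downstream = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([upstream, midstream, downstream])
```

A large F value relative to the critical value for the given degrees of freedom indicates that at least one group mean differs. A post-hoc test such as the Scheffe Test is then needed to identify which groups differ.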

KWQI
The KWQI, as utilized by the Ministry of Environment of Korea [38], was calculated and evaluated for each of the temporal and spatial groups of the study watershed. The KWQI considers seven variables (pH, DO, EC, WT, TOC, TN, and TP) and was calculated as follows:

\[ \mathrm{KWQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732} \]

where $F_1$ is the fraction (as a percentage) calculated by dividing the number of water quality variables that violate standards by the number of all measured water quality variables; $F_2$ is the fraction (as a percentage) calculated by dividing the total number of standard violations by each water quality variable during the measurement period by the total number of measurements; and $F_3$ is the sum of the factors that fractionalized the degree of each water quality variable for the standards. The KWQI ranges from 0 to 100, with a higher value representing cleaner water. In this assessment, KWQI values were divided into the following five grades: excellent (80-100), good (60-79), fair (40-59), poor (20-39), and very poor (0-19).
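The index calculation and grading can be sketched as follows. This is a hedged illustration rather than the Ministry's implementation. It assumes that the three factors are combined in the widely used CCME-style form, 100 minus the length of the (F1, F2, F3) vector divided by 1.732, with each factor on a 0 to 100 scale. F3 is passed in directly because its computation is not fully specified in the text. The function names and the input counts are hypothetical.

```python
import math

def scope_and_frequency(failed_variables, total_variables, failed_tests, total_tests):
    """F1 and F2 as percentages, following the definitions in the text."""
    f1 = 100.0 * failed_variables / total_variables
    f2 = 100.0 * failed_tests / total_tests
    return f1, f2

def kwqi(f1, f2, f3):
    """Combine three factors (each 0-100) into an index from 0 to 100.

    Assumption: CCME-style aggregation; dividing by 1.732 (about sqrt(3))
    rescales the vector length so that the index stays within 0 to 100.
    """
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

def grade(score):
    """Map an index value to the five grades listed in the text."""
    if score >= 80:
        return "excellent"
    if score >= 60:
        return "good"
    if score >= 40:
        return "fair"
    if score >= 20:
        return "poor"
    return "very poor"

# Hypothetical counts: 2 of 7 variables and 30 of 336 measurements violate standards.
f1, f2 = scope_and_frequency(2, 7, 30, 336)
score = kwqi(f1, f2, f3=10.0)  # f3 is a hypothetical value
print(round(score, 1), grade(score))
```

Applied to the averages reported in this study, `grade(59.2)` and `grade(56.4)` both return "fair", which matches the grades given for the spring season and the MS section.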

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Competing interests: The authors declare that they have no conflicts of interest.

Figure 1 Flow chart of data analysis on the Geumho River
Figure 2 Locations of water monitoring sites and sewage and wastewater treatment facilities in the target watershed [21]. Red point, monitoring sites (S1 to S6); purple point, sewage and wastewater treatment facilities (T1 to T9)
Figure 4 Dendrograms that show seasonal and spatial clusters. a: seasonal cluster analysis; b: spatial cluster analysis
Figure 6 Box plots of the temporal stepwise mode discriminant function. The middle line of each box represents the median value, the lower line represents the first quartile (25%), the upper line represents the third quartile (75%), the lower bar represents the minimum (median - 1.5 × interquartile range; IQR), and the upper bar represents the maximum (median + 1.5 × IQR); the red cross mark indicates the average value. IQR is the third quartile minus the first quartile, and the letters above the box plot represent the results of the Scheffe Test (p < 0.05). a: water temperature; b: biochemical oxygen demand; c: total organic carbon; d: electrical conductivity; e: ammonia-nitrogen; f: total phosphorus. S1, spring; S2, summer; S3, autumn; S4, winter
Figure 7 Box plots of the spatial stepwise mode discriminant function. The middle line of each box represents the median value, the lower line represents the first quartile (25%), the upper line represents the third quartile (75%), the lower bar represents the minimum (median - 1.5 × interquartile range; IQR), and the upper bar represents the maximum (median + 1.5 × IQR); the red cross mark indicates the average value. IQR is the third quartile minus the first quartile, and the letters above the box plot represent the results of the Scheffe Test (p < 0.05). a: pH; b: biochemical oxygen demand; c: chemical oxygen demand; d: total organic carbon; e: electrical conductivity; f: total nitrogen; g: total phosphorus. US, upstream; MS, midstream; DS, downstream
Seasonal and spatial water quality index (WQI) scores. Four bar graphs in each seasonal or spatial group show WQI scores from 2017 to 2020 from left to right. The letters on the right represent the water quality grades of the WQI. Excellent, suitable for hydrophilic activities with clean water; Good, suitable for hydrophilic activities with good water; Fair, pollutants may sometimes be introduced and affect hydrophilic activities; Poor, attention required for hydrophilic activities due to frequent inflow of pollutants; Very poor, inadequate for hydrophilic activities with high water pollution level. WQI, water quality index; a, seasonal WQI; b, spatial WQI.