Water quality assessment and pollution source apportionment using multivariate statistical and PMF receptor modeling techniques in a sub-watershed of the upper Yangtze River, Southwest China

Rapid industrial and agricultural development as well as urbanization affect the water environment significantly, especially in sub-watersheds where the contaminants/constituents present in the pollution sources are complex, and the flow is unstable. Water quality assessment and quantitative identification of pollution sources are the primary prerequisites for improving water management and quality. In this work, 168 water samples were collected from seven stations throughout 2018–2019 along the Laixi River, a vital pollution control unit in the upper reaches of the Yangtze River. Multivariate statistics and positive matrix factorization (PMF) receptor modeling techniques were used to evaluate the characteristics of the river-water quality and reveal the pollution sources. Principal component analysis was employed to screen the crucial parameters and establish an optimized water quality assessment procedure to reduce the analysis cost and improve the assessment efficiency. Cluster analysis further illustrates the spatiotemporal distribution characteristics of river-water quality. Results indicated that high-pollution areas are concentrated in the tributaries, and the high-pollution periods are the spring and winter, which verifies the reliability of the evaluation system. The PMF model identified five and six potential pollution sources in the cold and warm seasons, respectively. Among them, pollution from agricultural activities and domestic wastewater shows the highest contributions (33.2% and 30.3%, respectively) during the cold and warm seasons, respectively. The study can provide theoretical support for pollutant control and water quality improvement in the sub-watershed, avoiding the ecological and health risks caused by the deterioration of water quality.


Introduction
The river-water quality issue has been recognized as a severe problem. Clean rivers play a vital role in human health and the ecosystem, and they prevent the environmental, economic, and social risks associated with pollution (Liu et al., 2019a; Sener et al., 2017). However, in sub-watersheds where regulation is lacking and ecological flows are normally low, water resource management faces greater challenges in reducing pollutants (Huang et al., 2010; Li et al., 2022; Yang et al., 2022). Furthermore, with increased urbanization and industrialization, industrial and domestic wastewater along with farmland runoff make water quality unstable, and the pollution sources of rivers are often neither identified nor quantified (Casillas-Garcia et al., 2021; Zhang et al., 2020a; Zhao et al., 2022). Therefore, rapid water quality assessment and accurate identification of pollution sources are beneficial for [...] 2022; Piroozfar et al., 2021). However, absolute principal component score-multiple linear regression (APCS-MLR) is relatively conservative and can only extract an unspecified number of principal components, namely those with eigenvalues greater than 1 or a cumulative explained variance higher than 75%. It cannot consider the uncertainty in the data (Liu et al., 2020; Zhang et al., 2020b). The PMF model was initially applied to the source analysis of particulate matter in the atmosphere. It was approved by the US Environmental Protection Agency (US EPA) and has been improved constantly. Since then, it has been widely adopted for groundwater, soil, and sediment studies (Agyeman et al., 2021; Chen et al., 2016a; Li et al., 2021b; Taghvaee et al., 2018). Compared with APCS-MLR, PMF exhibits superior performance in environmental pollution-source analysis. The PMF model considers the uncertainty in each sample to assess the actual quality and reliability of the variables. In addition, the number of variables and factors can also be optimized by modeling (Yang et al., 2013). Liu et al. (2019a) compared the model accuracy and fitting results of PMF and APCS-MLR in a pollution-source analysis of Taihu Lake and reported that PMF outperformed APCS-MLR. Salim et al. (2019) found that the PMF model had more robust performance across study areas with different land use types. Zanotti et al. (2019) combined PMF and geographical information system (GIS) technologies to assess the water quality characteristics of groundwater and surface water and concluded that the PMF model is an effective technique that fits various complex hydrological systems.
Although the PMF model performs well and has great potential for source apportionment, it still has limitations when applied to surface-water pollution analysis. In some poorly managed sub-watersheds, pollution varies in space and time because of differences in land use, rainfall, and point-source discharges (Gholizadeh et al., 2016; Taoufik et al., 2017; Zhang et al., 2020a). Therefore, it is difficult to explain the sources of pollution reasonably by directly processing the entire data set with the PMF model. In this study, we first explore the pollution distribution pattern using cluster analysis (CA) and then identify and quantify the contributions of pollution sources in different seasons using the PMF model. It was hypothesized that this approach would yield more accurate and complete water environment information for this river. The results can help researchers determine the priority of pollution control in different periods and provide new strategies for the environmental management of river water.
The Laixi River is the longest tributary of the Tuo River, a first-class tributary of the Yangtze River. It is also a priority unit for pollution prevention and control in the Yangtze River basin (MEEC, 2016). Because of the continuous deterioration in river-water quality, it is essential to evaluate the water quality and quantify the contribution of potential pollution sources. Therefore, the main objectives of this study are (a) to screen essential parameters to establish a highly efficient water quality assessment system so that it can evaluate the water-pollution level in the Laixi River and similar rivers; (b) to reveal the spatial and temporal characteristics of the water quality in the Laixi River, and (c) to identify potential sources of pollution and quantify their contributions.

Study area
The Laixi River is a heavily polluted transboundary river that starts in Chongqing and joins the Tuo River in Sichuan. It is just 12 km away from the Yangtze River, the longest river in China. This study focuses on the Luzhou section (105°14ʹ57ʺ-105°41ʹ51ʺ E, 28°59ʹ56ʺ-29°20ʹ3ʺ N) of the river in Sichuan province (Fig. 4). Luzhou is an important trade and logistics center in the Chengdu-Chongqing economic circle. The area has a temperate, subtropical climate, and the rainy season mainly occurs from May to September. The length of the Laixi River is 195 km, and the Jiuqu and Maxi Rivers are its longest tributaries. The study area covers 1841 km² and is dominated by hilly terrain. The land in the area is mainly cropland and forest land, which account for 58.3% and 21.6%, respectively. The main crop types are rice, corn, and rape, and the primary livestock are pigs, poultry, and rabbits. Numerous companies are distributed on both sides of the Laixi River, primarily food processing and non-metallic mineral companies (LSB, 2019). Because the Laixi River is the main river in most of the northern part of Luzhou, it has essential functions in agricultural irrigation, industrial, and landscaping applications.
In recent years, the water quality of the Laixi River has continued to deteriorate due to rapid industrial and agricultural development and an increase in urbanization (Du et al., 2021). The local government has undertaken several measures for water management, such as strictly controlling pollutant disposal in the river, adding sewage-treatment facilities, and constructing new sewage piping networks. However, the water quality of the Laixi River is still unstable and does not meet grade III of the Water Environment Quality Standards in most months (MEPC, 2002). The reason is that the watershed receives a high volume of industrial, agricultural, and domestic sewage discharges from complex pollution sources. CODCr values are often classified as grade IV (≥ 30) or poorer (Table 1). Figure 1 shows the distribution of the seven sites and the land use in the area.

Data preparation and analysis
Water quality monitoring data were collected monthly from 2018 to 2019 and grouped by season (spring, March-May; summer, June-August; autumn, September-November; winter, December-February). The original water quality monitoring program contained a total of 26 water quality indexes. However, some indexes were not valid throughout the time series: concentrations of copper, zinc, and hexavalent chromium ions were frequently below the minimum detection limits, and chlorophyll and fecal coliform concentration data were missing at some sites. Ultimately, 12 water quality parameters were selected to comprise the data matrix: total nitrogen (TN), total phosphorus (TP), water temperature (WT), pH, electrical conductivity (EC), biochemical oxygen demand (BOD₅), dissolved oxygen (DO), permanganate index (CODMn), chemical oxygen demand (CODCr), ammonia nitrogen (NH₄⁺-N), fluoride ion (F⁻), and anionic surfactant (AS). To ensure the accuracy of the measurements, each sample was split into two subsamples, and the average of the two measurements was used when they agreed within the error tolerance.
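The seasonal grouping described above can be written down as a small lookup. The following is a minimal sketch, not the authors' processing script: the function names `season_of` and `period_of` are hypothetical, and the cold/warm grouping (spring and winter as cold, summer and autumn as warm) is the one the study applies later for the PMF runs.

```python
# Month-to-season mapping as stated in the text:
# spring Mar-May, summer Jun-Aug, autumn Sep-Nov, winter Dec-Feb.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

# Cold/warm grouping used by the study for source apportionment.
PERIOD_BY_SEASON = {
    "spring": "cold", "winter": "cold",
    "summer": "warm", "autumn": "warm",
}


def season_of(month: int) -> str:
    """Return the season name for a calendar month (1-12)."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be in 1..12, got {month}")
    return SEASON_BY_MONTH[month]


def period_of(month: int) -> str:
    """Return 'cold' or 'warm' for a calendar month (1-12)."""
    return PERIOD_BY_SEASON[season_of(month)]
```

For example, `period_of(12)` returns `"cold"` and `period_of(9)` returns `"warm"`, so each monthly sample can be routed to the matching seasonal data set.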
The descriptive statistics of the water quality data set, including the mean, variance, and water quality grades, are listed in Table 1. Prior to multivariate statistical analysis, the Kolmogorov-Smirnov test was applied to check whether the water quality parameters conformed to a normal distribution. The Kruskal-Wallis test and Pearson correlation coefficient (r) were used to analyze the significance (p < 0.05) of the differences and correlations between indicators (Ding et al., 2016; Leong et al., 2016; Li et al., 2020). For principal component and cluster analyses, all water quality data were standardized to z-scores to avoid analysis errors caused by different units (Liu et al., 2019b).
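The standardization step can be illustrated with a short sketch. It assumes the usual z-score definition (subtract the mean, divide by the standard deviation) and uses the sample standard deviation; which variant the statistics package applied is not stated in the text, so treat that choice as an assumption. The function name `z_standardize` is hypothetical.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Return z-scores for one water quality parameter.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


# Illustrative values only, not data from the study.
example = z_standardize([1.0, 2.0, 3.0])
```

After this transformation each parameter has zero mean and unit standard deviation, so parameters measured in different units contribute on the same scale.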

Multivariate statistical analysis
The multivariate statistical methods used in this study include PCA, CA, and the PMF model. Microsoft Excel 2019, SPSS 26, Origin 2022, and EPA PMF 5.0 were used for data processing, statistical analysis, and graphic drawing in this study.

PCA and CA
Fig. 1 Geographic location map, land use, and water quality monitoring point distribution
PCA is a multivariate statistical method that is commonly accepted for statistical analysis. It is applied to achieve dimensionality reduction through factor analysis, which organizes complex data sets into several uncorrelated variables (Li et al., 2020). These new variables are linear combinations of the original variables, which can significantly simplify the information of the original data (Piroozfar et al., 2021). The Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity were used to determine the suitability of the data set for PCA. KMO values range from 0 to 1; KMO > 0.5 and a significance level of p < 0.05 in Bartlett's test are considered suitable for PCA (Zhang et al., 2020a). This study also uses varimax (maximum variance) rotation, which maximizes the sum of squares of the loadings for each component. The method optimizes the distribution of factor loadings so that almost every parameter shows a strong loading in one of the principal components (Chen et al., 2015). Loadings of > 0.75, 0.75-0.5, and 0.5-0.3 were defined as "strong," "medium," and "weak," respectively (Liu et al., 2003).
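The loading thresholds above translate directly into a classification rule. The sketch below is illustrative only: the function name `classify_loading` is hypothetical, taking the absolute value (so that negative loadings are graded by magnitude) is an assumption, and the label "negligible" for loadings below 0.3 is not a term used in the text.

```python
def classify_loading(loading: float) -> str:
    """Grade a rotated factor loading by the thresholds of Liu et al. (2003).

    Assumption: the magnitude of the loading is graded, regardless of sign.
    """
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.3:
        return "weak"
    return "negligible"  # label introduced here for loadings below 0.3


# Hypothetical loadings for illustration, not values from Table 2.
hypothetical_loadings = {"param_a": 0.81, "param_b": -0.62, "param_c": 0.12}
grades = {name: classify_loading(v) for name, v in hypothetical_loadings.items()}
```

Applied to a full loading table, this yields the strong/medium/weak labels used when interpreting each principal component.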
CA is an unsupervised learning process that reveals the structural information of a data set. Hierarchical cluster analysis (HCA) is the most common clustering method and is often used in conjunction with other statistical methods in environmental applications (Azhar et al., 2015). Typically, Ward's method with squared Euclidean distance is used as the similarity measure, resulting in a smaller sum of squared deviations within each cluster and a larger sum of squared deviations between clusters (Taoufik et al., 2017). The final dendrogram illustrates the data classification (Zhang et al., 2009).
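To make the Ward criterion concrete, the following is a minimal pure-Python sketch of agglomerative clustering. It is not the SPSS procedure used in the study; the function name `ward_cluster` and the toy points are hypothetical. At each step it merges the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares, which for clusters of sizes n_a and n_b with centroids c_a and c_b equals n_a·n_b/(n_a + n_b) times the squared Euclidean distance between the centroids.

```python
def ward_cluster(points, n_clusters):
    """Merge points (lists of floats) with Ward's criterion until n_clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, centroid_a = clusters[a]
                members_b, centroid_b = clusters[b]
                na, nb = len(members_a), len(members_b)
                dist2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        na, nb = len(members_a), len(members_b)
        merged_centroid = [
            (na * x + nb * y) / (na + nb) for x, y in zip(centroid_a, centroid_b)
        ]
        merged = (members_a + members_b, merged_centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]


# Toy standardized observations (hypothetical, not monitoring data).
toy_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
toy_groups = sorted(ward_cluster(toy_points, 3))
```

Recording the merge cost at each step would give the heights needed to draw a dendrogram; a production analysis would normally rely on an established statistics package instead of this quadratic-time loop.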

PMF model
PMF is an efficient multivariate statistical model based on factor analysis (Paatero & Tapper, 1994). The PMF model defines the sample concentration matrix ($X_{n \times m}$) as the residual matrix ($E_{n \times m}$) plus the product of the factor contribution matrix ($G_{n \times p}$) and the factor profile matrix ($F_{p \times m}$) (Li et al., 2021b), where p is the number of factors, and n and m are the number of samples and the number of water quality parameters, respectively. The basic equation can be expressed as Eq. (1):
$$x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij} \tag{1}$$
where $x_{ij}$ is the concentration of the jth water quality parameter in the ith sample, $g_{ik}$ is the contribution of the kth factor to the ith sample, $f_{kj}$ is the concentration of the jth water quality parameter in the kth factor, and $e_{ij}$ is the residual of the jth parameter in the ith sample.
In addition to the concentration matrix composed of the water quality testing data, the user must provide an uncertainty file when setting up the model. The model is fitted by minimizing the objective function Q, which is generally used as a measure of the model error (Eq. 2):
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^{2} \tag{2}$$
where $u_{ij}$ is the uncertainty in the concentration of the jth parameter in the ith sample. The minimization is performed under the condition that the source profiles and contributions are non-negative, which facilitates the determination of appropriate factor profiles (Agyeman et al., 2021). More details and mathematical principles on the construction of the PMF model can be found in previous studies and the software user guide (Norris et al., 2014; Sowlat et al., 2016).
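The objective function can be evaluated directly once a candidate solution is available. The sketch below only computes Q for given matrices; it does not perform the constrained minimization that EPA PMF carries out. The function name `pmf_q` and the small matrices are hypothetical, with X the concentrations, G the factor contributions, F the factor profiles, and U the uncertainties, all as nested lists.

```python
def pmf_q(X, G, F, U):
    """Sum of squared, uncertainty-scaled residuals for a PMF solution.

    X and U are n x m, G is n x p, and F is p x m (lists of lists).
    """
    if any(v < 0 for row in G for v in row) or any(v < 0 for row in F for v in row):
        raise ValueError("PMF requires non-negative contributions and profiles")
    n, m, p = len(X), len(X[0]), len(F)
    q = 0.0
    for i in range(n):
        for j in range(m):
            fitted = sum(G[i][k] * F[k][j] for k in range(p))
            residual = X[i][j] - fitted
            q += (residual / U[i][j]) ** 2
    return q


# Hypothetical two-sample, two-parameter, one-factor example.
X = [[1.0, 2.0], [3.0, 4.0]]
G = [[1.0], [2.0]]
F = [[1.0, 2.0]]
U = [[0.5, 0.5], [0.5, 0.5]]
q_value = pmf_q(X, G, F, U)
```

Here only one entry is misfit (a residual of 1.0 with an uncertainty of 0.5), so it alone determines Q; larger uncertainties down-weight the corresponding residuals, which is how sample-level data quality enters the fit.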
The results of water quality monitoring are often affected by several factors. For example, the temperature, sunlight, and rainfall at the time of sampling may bias the measured water quality, and changes during the transportation and storage of the samples may also affect the accuracy of the measurements. Therefore, in this study, an extra 10% uncertainty was considered in the model to account for errors outside the uncertainty matrix (Hansen et al., 2007; Zanotti et al., 2019).
To determine the optimal number of factors, 3-7 factors were sequentially entered into the model, and 100 iterations were performed under a random seed mode to obtain the minimum Q-value. The solution was selected by considering (1) the number of known pollution sources in the study area, (2) whether Q (Robust) is close to Q (True), (3) whether R² is in the acceptable range and the observed/predicted scatter plots show a good fit, (4) whether the proportion of residuals calculated by the model that exceed 3 in absolute terms is as small as possible, and (5) the results of the error analyses (bootstrap (BS), displacement of factor elements (DISP), and BS-DISP runs) (Demir et al., 2022; Huston et al., 2012; Yang et al., 2013; Zheng et al., 2021). Ultimately, the five- and six-factor solutions were found to be the most physically meaningful for the cold seasons (spring and winter) and warm seasons (summer and autumn), respectively.

The improved Nemerow pollution index
The Nemerow Pollution Index (NPI) is a widely used method of water quality assessment, which has the advantages of a clear physical concept and a simple calculation process (Brady et al., 2015; Chen et al., 2016b). However, the traditional NPI method ignores the differences and relative importance of each parameter, and the weights of the water quality parameters remain undetermined, which could influence the evaluation results (Zhang et al., 2020c). The equations of the traditional NPI method are given as follows:
$$N_i = \frac{C_i}{S_i} \tag{3}$$
$$\mathrm{NPI} = \sqrt{\frac{N_{\mathrm{ave}}^{2} + N_{\mathrm{max}}^{2}}{2}} \tag{4}$$
where $C_i$ is the measured value of each water quality parameter, $S_i$ is the standard concentration of each water quality parameter (Table 1), $N_i$ is the Nemerow index of each parameter, $N_{\mathrm{ave}}$ and $N_{\mathrm{max}}$ are the average and maximum values of $N_i$, respectively, and NPI is the Nemerow pollution index.
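As an illustration of the traditional index, the sketch below assumes its standard textbook form: each parameter's measured value is divided by its standard limit, and the index is the root mean square of the average and maximum ratios. The function name `nemerow_index`, the parameter names, and all numbers are hypothetical placeholders rather than values from Table 1, and the weighted (improved) variant is not reproduced here.

```python
from math import sqrt


def nemerow_index(measured, standard):
    """Traditional Nemerow pollution index from measured values and limits.

    Both arguments are dicts keyed by parameter name, in matching units.
    """
    ratios = [measured[name] / standard[name] for name in measured]
    n_ave = sum(ratios) / len(ratios)
    n_max = max(ratios)
    return sqrt((n_ave ** 2 + n_max ** 2) / 2)


# Hypothetical measurements and limits, for illustration only.
measured = {"param_a": 1.0, "param_b": 3.0}
standard = {"param_a": 1.0, "param_b": 1.0}
npi = nemerow_index(measured, standard)
```

Because the maximum ratio enters the formula on equal footing with the average, a single parameter far above its limit pulls the index up even when the other parameters comply.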
Some water quality indicators (metal ions, TP) were not compared with CODCr, CODMn, or BOD₅ in detail. A slight increase in these parameters could cause significant changes in water quality. Therefore, an improved NPI method that introduces weighting factors based on the water quality standards was used (Eqs. 5-7), where $L_i$ is the weight of each Nemerow index, $N_L$ is the Nemerow index with weights, and NPIimp is the improved Nemerow pollution index. In addition, there is a contradictory relationship between economic development and environmental protection in several regions. In the less economically developed areas, the government has a limited budget to test numerous physicochemical indicators simultaneously. In this study, the concept of NPImin aims to establish a stable, fast, comprehensive, and low-cost water quality evaluation method to help developing countries improve the efficiency of water quality monitoring and accelerate the development of management strategies for the water environment. This method uses a few critical indicators combined with the same calculation process in Eqs. (3) and (5-7). The parameters are selected based on the principal component analysis, the Pearson correlation coefficients, and the actual pollution situation of the river basin, and the redundant indicators are then eliminated.
Fig. 2 Pearson correlation coefficients between the twelve water quality parameters

Results and discussion
Water quality assessment

Selection of the essential parameters
In order to reduce the monitoring and analysis costs, representative water quality indicators can be selected to show the overall water quality of the basin, which can promote the development of environmental monitoring systems in economically underdeveloped regions (Pak et al., 2021). Pearson correlation (Fig. 2) indicated that CODMn, CODCr, TN, NH₄⁺-N, TP, and F⁻ were strongly correlated (r > 0.5), whereas WT and DO were inversely correlated with most indicators. This verifies our assumption about redundant information in the data set. PCA was used to extract the essential parameters to represent the information of the entire data set, and the three principal components (PC1, PC2, and PC3) reflected 72.2% of the total variance (Table 2).
The first principal component (PC1) explained 41.2% of the total variance. BOD₅, CODMn, CODCr, and NH₄⁺-N all loaded above 0.75, which is a strong loading, and TN, F⁻, and AS exhibited moderate loadings. BOD₅, CODCr, and CODMn are all essential indicators of organic pollution, mainly from urban domestic sewage and industrial wastewater discharges (Salim et al., 2019). CODCr is often used to detect high organic pollution in wastewater, such as industrial wastewater, whereas CODMn is commonly used in water bodies with organic pollution concentrations of 1-15 mg/L (Tian et al., 2008; Tiyasha et al., 2021). The average CODMn in the Laixi River is 6-7 mg/L; hence, CODMn is the more relevant parameter for characterizing the degree of organic pollution in the water body. Nitrogen is a good indicator of the degree of eutrophication in water bodies. Nitrogen pollution in rivers mainly originates from domestic sewage, animal manure, and nitrogen-based fertilizers (Fan et al., 2022). In the current surface-water environmental quality evaluation method followed in China, TN is not used as a daily water quality evaluation index. Thus, NH₄⁺-N was chosen to measure the degree of nitrogen pollution in the river. Nong et al. (2020) reported that agricultural cultivation was one of the primary sources of phosphorus in river water. Hence, TP was also chosen as a critical parameter to measure the effect of agricultural activities on the Laixi River. F⁻ shows a moderate loading in the first principal component. F⁻ is commonly found in wastewater discharged from electroplating, glass processing, metal smelting, and semiconductor manufacturing (Shen et al., 2003). There are several companies on both sides of the Laixi River, and the concentration of fluoride ions can reflect the industrial pollution affecting it reasonably well. Thus, F⁻ was also selected as an essential parameter.
PC2 explained 17.3% of the total variance, and pH and DO show positive loadings. The mean pH values for all stations and seasons in the Laixi River basin ranged from 7 to 8, and the coefficients of variation were all below 7%, showing good stability. In [...] (Tiyasha et al., 2021). DO has been chosen in several water quality evaluations (Kannel et al., 2007; Simoes et al., 2008; Wu et al., 2021). However, the concentration of DO in a river is related to many factors, such as water temperature, water depth, flow rate, atmospheric pressure, and algae (Chen & Liu, 2014; Hu et al., 2021; Kang et al., 2018). Errors in the measurement of DO may be introduced by differences in sampling time and sampling personnel. Thus, DO may distort the water quality scores, as was also reported by Sun et al. (2016). Therefore, DO was also not selected. PC3 explains a smaller share of the variance (13.7%), in which EC and WT show stronger loadings than in PC1. In addition, EC and WT are mainly affected by meteorological and geographical factors (Sener et al., 2017). Finally, CODMn, TP, NH₄⁺-N, and F⁻ were the critical parameters selected to portray the effect of multiple factors on river pollution.

Establishment of NPI min
The selected CODMn, TP, NH₄⁺-N, and F⁻ can all be found in the water quality standards with corresponding limits, allowing us to calculate each parameter's weight with the NPI method. Compared to the Water Quality Index (WQI) method and the single-factor evaluation method, the improved NPI method can provide more objective and comprehensive evaluation results (Chen et al., 2016b; Yang et al., 2021). The WQI method uses expert judgment or literature review to determine the parameters and weights, which renders the evaluation model overly complex and subjective and thus increases uncertainty during model implementation (Iwar et al., 2021; Uddin et al., 2022). Furthermore, because river self-purification capacity and environmental conditions vary from place to place, the weights of water quality parameters in one place may not be applicable to other places (Sudhakaran et al., 2020). The single-factor evaluation method can quickly determine the gap between water quality and environmental standards. However, it is too strict: it only considers the impact of the most polluting parameter and therefore cannot accurately reflect the overall water quality (Haque et al., 2020).
Based on these considerations, an improved NPI method was chosen for this study. By adding the concept of weights to the model and selecting key parameters, the value of NPImin was obtained. The final index evaluates the water-pollution level and enables the comparison of water quality across specific times and regions. However, short-term data sets fluctuate significantly with natural processes and anthropogenic activities, whereas long-term ones underestimate the impact and risk of pollutants, so the seasonal scale was chosen to evaluate the water-pollution situation (Zhong et al., 2018). The value of NPImin corresponds to the water quality standard level as follows: NPImin < 0.73, clean (I); 0.73 < NPImin < 0.83, relatively clean (II); 0.83 < NPImin < 1, slight pollution (III); 1 < NPImin < 1.59, moderate pollution (IV); NPImin > 1.59, serious pollution (V).
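The grade boundaries listed above can be applied with a simple lookup. This is a minimal sketch with a hypothetical function name, `npi_min_grade`; the text uses strict inequalities on both sides of each boundary, so assigning a value that falls exactly on a boundary to the higher grade is an assumption made here.

```python
def npi_min_grade(value: float) -> str:
    """Map an index value to the water quality grade given in the text.

    Assumption: a value exactly on a boundary is assigned to the higher grade.
    """
    if value < 0.73:
        return "I (clean)"
    if value < 0.83:
        return "II (relatively clean)"
    if value < 1.0:
        return "III (slight pollution)"
    if value < 1.59:
        return "IV (moderate pollution)"
    return "V (serious pollution)"
```

For instance, an index of 0.88 maps to grade III and an index of 1.26 maps to grade IV under these thresholds.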
The average NPImin values indicate that the water pollution in the areas and seasons is as follows: NDQ (1. [...] the tributaries in the middle of the study area, and the surrounding building land is increasing, which increases the pollution in the river. The highest pollution levels at all sites were observed in spring, which is most likely related to the frequent agricultural planting activities during that season.
Fig. 3 Heatmap of the NPImin values, covering seven sites and four seasons

Spatiotemporal analysis
The water quality assessment indicated that the water quality of the Laixi River exhibited a spatial deterioration followed by recovery as well as significant changes with the seasons. The Kruskal-Wallis test results showed that there were significant differences between locations in all indicators except WT and pH and between seasons in all indicators except TP. Therefore, CA was used to further reveal the spatial and temporal distribution patterns of water pollution and to verify the accuracy of the NPImin procedure. The two-year data (2018-2019) are divided into three spatial groups and two temporal groups (Fig. 4). Based on the clustering results of the spatial distribution, the SSY, NDQ, and DWT sites are classified as Group 1 with an average NPImin of 1.26, which indicates a highly polluted area. SSY and NDQ are tributary monitoring sites of the Jiuqu River, and DWT is a tributary monitoring site of the Maxi River. The watershed area of the Maxi River is vast, with developed agricultural production, and there are several industrial sites on both sides of the Jiuqu River. Moreover, these three sites are surrounded by buildings and cultivated land, indicating that the tributaries may receive a large amount of industrial wastewater, domestic sewage, and agricultural non-point source pollution. Among all sites, the average concentrations of TP and CODCr exceeded the standard by the largest multiples at SSY and NDQ, which can be attributed to the lack of wastewater treatment facilities in the area.
The EXJ and GDDQ sites in the middle and lower reaches are classified as Group 2 with an average NPImin of 0.88, which indicates a low-pollution area. The pollution index of the EXJ site is higher than that of the upstream TZSDQ site, reflecting the negative impact of the convergence of the tributaries on the water quality. The sub-basin of the GDDQ site is surrounded by cultivated land and forest land with fewer types of pollution sources and a lower pollution index. The inlet monitoring site (TZSDQ) and the outlet monitoring site (HSDQ) are classified as Group 3 with an average NPImin of 0.79. Most of the monitoring indices at HSDQ reached the water quality standard grade III, and the average DO concentration exceeded 8 mg/L, indicating that the long-distance dilution and self-purification enabled the water ecological system to recover.
In the temporal analysis, summer and autumn were categorized as Group 1 with a mean NPImin of 0.80, indicating light pollution. Spring and winter were categorized as Group 2 with a mean NPImin of 1.10, indicating heavy pollution. Table 1 shows that only the average concentrations of TP and NH₄⁺-N in spring exceeded the water quality standard grade III, and the average concentrations of TN and CODCr reached their maxima in spring. The pollution index was the second highest in winter, which may be attributed to the decrease in flow during the dry period, causing the pollutant concentrations to increase. However, DO levels in the river were at their highest in winter because oxygen dissolves more easily in water at low temperatures (Zhao et al., 2013). The lowest pollution indices were observed in the summer and autumn, which coincide with the rainy season (May-September), when the risk from sewage leakage as well as agricultural and urban runoff increases.
The CA results are consistent with the water quality assessment results, indicating that NPImin can efficiently evaluate water quality and help the city departments reduce analysis costs and better respond to water quality changes. The spatial patterns of water quality in the Laixi River are more readily explained by anthropogenic activities and land use changes than the temporal patterns. However, significant differences in the water quality over time require further analysis. Therefore, the PMF model was used to analyze the pollution-source apportionment during the cold and warm seasons.

Cold-season pollution-source analysis
With the selected numbers of factors, the PMF results show that most water quality indicators exhibited an excellent model fit. In the cold seasons, the R² values for all indicators, except pH, were in the range of 0.71-0.96, and in the warm seasons, the R² values were in the range of 0.50-0.99. Figure 5 shows the observed/predicted fitting diagram of some indicators. The predicted and observed values of most of the data are strongly correlated, which confirms the ability of the PMF model to perform source identification and apportionment consistently. Figure 6 shows the fingerprint of the factors resolved by the PMF model, that is, the contribution of the pollution factors to each water quality parameter. In addition, the contribution rate of each factor can be calculated based on the base file provided by the PMF model. The factors for cold and warm seasons were ranked based on their contribution rates as CF1-CF5 and WF1-WF6, respectively.
The first factor (CF1) accounts for 33.2% of the total source contribution for cold seasons. The main loading parameters for this factor are TP (55.9%), TN (41.7%), and CODMn (27.0%). The high concentrations of nitrogen, phosphorus, and permanganate index in surface water originate from various sources, including urban and rural domestic sewage, fertilizer use and livestock manure effluent, and industrial wastewater (Liu et al., 2019a). Luxian County, in the north of the study area, is a traditional agricultural county that hosts the main agricultural planting and livestock breeding activities in the region. In agricultural activities, the fertilization structure is inefficient, and the excess nutrients enter the river along with the surface runoff and farmland drainage (Chen et al., 2013). Moreover, livestock breeding lacks proper management of process wastes (animal carcasses, feathers, feed, and manure), which also increases the risk of non-point source pollution. In previous studies of surface-water-pollution sources, aquaculture was often ignored and not specifically discussed (Cho et al., 2022; Zhang et al., 2009, 2020a). However, field surveys during the study period showed that fishery workers drain and dredge fishponds in the winter before the Chinese Spring Festival. Thus, although the overall pollution of the aquatic environment by aquaculture is small, it is concentrated in short periods. In winter, the river's ecological flow is low, and the high concentration of aquaculture wastewater increases the pollution load in the aquatic environment (Saxena et al., 2022). Therefore, CF1 represents "pollution from agricultural activities," mainly attributed to agricultural runoff, livestock wastewater, and aquaculture during cold periods.
CF2 explains 22.7% of the total source contribution and consists predominantly of WT (77.6%), EC (58.2%), DO (57.7%), and pH (40.7%). Generally, WT is mainly affected by season, whereas EC is affected by the ion concentration and species in rivers (Chen et al., 2015). DO is an essential indicator of the health status of water bodies and is closely related to their self-purification ability. Very low DO can impair the metabolism and growth of fish and other aquatic organisms. DO is closely related to WT because oxygen dissolves more readily in cold water (Sun et al., 2016; Tomic et al., 2018). The water pH can affect the solubility of metals in water and the release of nitrogen and phosphorus from sediments. Thus, it is a vital parameter in the water quality monitoring of rivers, drinking water, and agricultural water (Jin et al., 2006; Osibanjo et al., 2011). Because of the strong temporal distribution of these parameters, CF2 represents "seasonal and meteorological effects." CF3 is the third factor, and the main relevant parameters are NH4+-N and AS, with contributions of 68.3% and 58.4%, respectively. Ammonia nitrogen is mainly generated by human activities, including nitrogen fertilizer use, urban and rural domestic sewage discharge, livestock breeding, and industrial activities. AS has excellent detergency, emulsification, and foaming properties and is widely used in various detergents and cleaning agents. Consequently, it is one of the common foreign compounds in municipal wastewater (Perales et al., 1999). Based on this information, CF3 pollution mainly originates from domestic sewage discharges in urban and rural areas. In recent years, Luzhou municipal departments have focused on collecting and treating wastewater and have achieved full coverage of the sewage-treatment plants in the field towns (LEEB, 2019). However, most sewage-treatment plants are old and use outdated technologies.
In addition, some remote rural sewage-treatment plants do not have professional technical personnel and cannot effectively manage sewage-treatment equipment. These issues prevent the efficient treatment of domestic sewage. Thus, CF3 represents "domestic sewage." CF4 is characterized by BOD5 (45.3%), CODCr (40.8%), and F− (32.3%). According to the second national pollution-source survey data from the census bulletin of Luzhou (LEEB, 2020), wine enterprises account for two-thirds of the water-related factories in the Laixi River basin. The wastewater from alcoholic beverage companies mainly originates from the brewing process and has exceptionally high CODCr (> 10,000 mg/L) and BOD5 values (Xin et al., 2017). The sources of F− in rivers have been discussed previously and are mainly industrial wastewater discharges (Huang et al., 2010; Shen et al., 2003). In addition, industrial production can return fluoride-containing waste gas to soil and surface water through atmospheric deposition. SSY and NDQ have the highest proportion of building land and are located near large industrial parks. Consequently, the average concentration of F− at SSY and NDQ was higher than at other sites, exceeding the water quality standard V and causing serious pollution. Therefore, CF4 represents "industrial wastewater." CF5 accounted for 8.7% of the total source contribution. F− was the only parameter whose contribution to this factor exceeded 20%; hence, CF5 represents "unidentified sources."
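The reading of the cold-season fingerprints above can be sketched as a simple threshold screen. The loadings below are the percentages quoted in the text. Applying the 20% cut-off (mentioned for CF5) uniformly to every factor is an assumption made for illustration, and `dominant_species` is a hypothetical helper name. CF5 is omitted because its F− loading is not stated.

```python
# Loadings (% of each parameter apportioned to the factor) quoted in the text.
COLD_LOADINGS = {
    "CF1": {"TP": 55.9, "TN": 41.7, "COD_Mn": 27.0},
    "CF2": {"WT": 77.6, "EC": 58.2, "DO": 57.7, "pH": 40.7},
    "CF3": {"NH4-N": 68.3, "AS": 58.4},
    "CF4": {"BOD5": 45.3, "COD_Cr": 40.8, "F-": 32.3},
}


def dominant_species(profile, threshold=20.0):
    """Return parameters whose loading meets the threshold, largest first.

    The 20% default is an assumed cut-off, not a rule stated by the authors.
    """
    kept = [(name, pct) for name, pct in profile.items() if pct >= threshold]
    return [name for name, _ in sorted(kept, key=lambda kv: kv[1], reverse=True)]


if __name__ == "__main__":
    for factor, profile in COLD_LOADINGS.items():
        print(factor, dominant_species(profile))
```

Raising the threshold narrows each factor to its strongest tracers, which can help when two factors share a parameter such as F−.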

Warm-season pollution-source analysis
In warm seasons, the factor with the largest proportion is named WF1, which accounts for 30.3% of the total source contribution. AS, NH4+-N, and TN contributed 63.5%, 53.5%, and 40.0%, respectively, to this factor. These parameters are similar to those of CF3; hence, WF1 can be attributed to domestic sewage pollution. In summer, the volume of domestic sewage exceeds the daily treatment capacity, and large quantities of sewage are discharged into the rivers without meeting the treatment standard and sometimes with no treatment. Furthermore, during heavy rains, the municipal pipe network, in which rainwater and sewage are incompletely separated, overflows. As a result, large quantities of the domestic wastewater sent to the sewage-treatment station overflow directly into the river. Thus, domestic wastewater is the primary source of pollution in the study area during the warm seasons.
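The comparison of a warm-season factor with the cold-season factors can be sketched as a set-overlap check. The Jaccard index used here is an illustrative choice, not the authors' method, and `closest_factor` is a hypothetical helper. The tracer sets are the dominant parameters named in the text for CF1-CF4.

```python
# Dominant parameters named in the text for each cold-season factor.
COLD_TRACERS = {
    "CF1": {"TP", "TN", "COD_Mn"},
    "CF2": {"WT", "EC", "DO", "pH"},
    "CF3": {"NH4-N", "AS"},
    "CF4": {"BOD5", "COD_Cr", "F-"},
}


def jaccard(a, b):
    """Share of parameters common to both sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_factor(tracers, reference):
    """Name of the reference factor whose tracer set overlaps most."""
    return max(reference, key=lambda name: jaccard(tracers, reference[name]))


if __name__ == "__main__":
    wf1 = {"AS", "NH4-N", "TN"}  # dominant parameters reported for WF1
    print(closest_factor(wf1, COLD_TRACERS), jaccard(wf1, COLD_TRACERS["CF3"]))
```

An overlap score only flags candidate matches. Assigning a source still depends on local knowledge such as land use, discharge records, and field surveys.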
The second factor (WF2) represents 21.3% of the total source contribution. BOD5, F−, and CODCr contributed 51.8%, 51.2%, and 37.6%, respectively, to this factor. This profile is similar to that of CF4 and is likely caused by industrial sources. The sewage-treatment plants built by local enterprises have limited treatment capacity. Temperature can affect the growth and behavior of microorganisms (Li et al., 2021a). High summer temperatures may lead to insufficient oxygen supply and sludge uplifting, which reduces the effectiveness of wastewater treatment. Environmental workers in Luzhou have reported that several plants have experienced mass die-off of activated sludge and that some wastewater treatment facilities are at risk of leakage (LEEB, 2019). Therefore, WF2 represents "industrial wastewater." WF3 is dominated by TP (40%) and TN (25%), which is similar to CF1. Agricultural planting activities are significantly reduced in the summer and autumn, but several crops still require fertilizers, which remain an important source of agricultural non-point source pollution during the rainy season. In addition, livestock farming and aquaculture can still affect river-water quality during this period. Therefore, WF3 represents "agricultural activity pollution." WF4 accounted for 13.2% of the total pollution, with the main contributions from WT (64.2%), EC (34.9%), pH (33.6%), and DO (30.9%). These are similar to the indicators in CF2, which mainly represent the effect of physicochemical and natural processes. WT is closely related to air temperature. Thus, WF4 represents "meteorological and seasonal effects." WF5 is characterized by a high percentage of DO (27.9%). DO is closely related to the production of organic matter and can characterize the level of organic pollution in the river (Gholizadeh et al., 2016).
Summer and autumn (June-November) largely coincide with the rainy season (May-September), when rainfall generates most of the surface runoff. Because the agricultural runoff has been classified as WF3 (i.e., pollution from agricultural activities), here we primarily discuss pollution from urban surface runoff over impervious surfaces, such as urban roads, construction sites, and building roofs. These areas accumulate large quantities of pollutants with complex properties, including various nutrients, heavy metals, and organic substances (Chen et al., 2016b; Helmreich et al., 2010). During rainy periods, these pollutants are washed into ditches and river waters and become the main source of urban runoff pollution. Large quantities of organic substances, such as household garbage, leaves, and grass, consume a large amount of oxygen in river waters as they decay, leading to the eutrophication of water bodies (Yao & Sun, 2020). Therefore, WF5 represents "urban surface runoff pollution." On the other hand, WF6 only accounted for 4.3% of the source apportionment, and no indicator exhibited a significant contribution to this factor. Thus, it represents "unidentified sources." The low percentage of unknown contributions reflects the good source allocation ability of the PMF model.
Fig. 6 Contributions of contamination sources during the (a) cold seasons and (b) warm seasons using the PMF model
Figure 7 shows a pie chart of the average contributions of different pollution sources to water quality. The figure shows a significant difference between the primary pollution sources resolved by the PMF model in the two time periods. In the cold seasons, the contribution of pollution sources is ranked as follows: agricultural activity pollution > meteorological and seasonal effects > domestic wastewater > industrial wastewater > unknown contribution.
In the warm seasons, the contribution of pollution sources is ranked as follows: domestic wastewater > industrial wastewater > agricultural activity pollution > meteorological and seasonal effects > urban surface runoff > unknown contribution. The cold seasons exhibited the highest contribution of pollution from agricultural activities (33.2%). TP and TN accounted for the largest proportions of this factor. This is similar to the results obtained by Zhong et al. (2018) for the Balihe Lake area, in which nitrogen and phosphorus in spring and winter are the main parameters affecting regional water quality and are generally delivered into the water through surface and irrigation runoff. However, in the warm seasons, the cumulative contribution of domestic sewage and industrial wastewater exceeds 50%, necessitating the construction of new wastewater treatment facilities and municipal pipe networks. A study by Gholizadeh et al. (2016) in South Florida observed that urbanization and the high density of environmental resource permits increased the domestic and industrial wastewater during the rainy season. The contribution of unidentified sources in the cold seasons is about twice that in the warm seasons, and five parameters contribute more than 10% to this factor. This indicates that there may be undetected pollution sources during the cold seasons, which requires additional field investigations. It is also possible that the PMF model developed for the cold seasons still has shortcomings and needs further improvements.
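The seasonal comparison above reduces to simple arithmetic on the source shares. The sketch below uses the percentages listed in the Conclusions of this paper (including 4.2% for warm-season unidentified sources). The helper names are hypothetical and only illustrate the ranking, the combined sewage and industrial share, and the cold-to-warm ratio of unidentified sources.

```python
# Source contributions (%) as listed in the Conclusions of this paper.
COLD = {
    "agricultural activities": 33.2,
    "meteorological and seasonal effects": 22.7,
    "domestic sewage": 20.1,
    "industrial wastewater": 15.3,
    "unidentified sources": 8.7,
}
WARM = {
    "domestic sewage": 30.3,
    "industrial wastewater": 21.3,
    "agricultural activities": 18.5,
    "meteorological and seasonal effects": 13.2,
    "urban surface runoff": 12.5,
    "unidentified sources": 4.2,
}


def ranking(shares):
    """Source names ordered from largest to smallest contribution."""
    return [name for name, _ in
            sorted(shares.items(), key=lambda kv: kv[1], reverse=True)]


def combined_share(shares, names):
    """Summed contribution (%) of the named sources."""
    return sum(shares[name] for name in names)


def seasonal_shift(cold, warm):
    """Warm minus cold share (%) for sources resolved in both seasons."""
    return {name: round(warm[name] - cold[name], 1)
            for name in cold if name in warm}


if __name__ == "__main__":
    print(ranking(COLD))
    print(ranking(WARM))
    print(combined_share(WARM, ["domestic sewage", "industrial wastewater"]))
    print(COLD["unidentified sources"] / WARM["unidentified sources"])
    print(seasonal_shift(COLD, WARM))
```

The shift table makes the seasonal reversal explicit: the agricultural share falls in the warm seasons while the domestic and industrial shares rise.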

Conclusions
In this study, multivariate statistical techniques and PMF receptor modeling were used to evaluate water quality and apportion potential pollution sources in the Laixi River basin in China. An improved NPImin method was developed using essential parameters (CODMn, NH4+-N, TP, and F−) and combined with PCA to analyze the water quality in the area. The results of the water quality assessment indicated that the two main tributaries of the Laixi River were heavily polluted, and the water quality was worse in the cold seasons (spring and winter) than in the warm seasons (summer and autumn). The CA results are in good agreement with the NPImin ranking, verifying that NPImin is efficient and stable and that it can be used in combination with CA to investigate the spatiotemporal variations in river pollution. The PMF model identified five and six potential pollution sources in the cold and warm seasons, respectively. In the cold seasons, agricultural activities are the most significant contributor (33.2%) to river pollution, followed by meteorological and seasonal effects (22.7%), domestic sewage (20.1%), industrial wastewater (15.3%), and unidentified sources (8.7%). In the warm seasons, the most significant contributor (30.3%) is domestic sewage, followed by industrial wastewater (21.3%), agricultural activities (18.5%), meteorological and seasonal effects (13.2%), urban surface runoff (12.5%), and unidentified sources (4.2%). Therefore, environmental management of water in the region should consider the seasonal characteristics and the contribution of pollution sources during different periods. In the cold seasons, optimizing the fertilizer application strategies and the management of aquaculture and livestock farming is indispensable. The old sewage-treatment plants in towns and factories should be upgraded.
New urban pipe networks are required to handle the increase in wastewater quantities in the warm seasons and improve sewage collection and treatment capacity.
The results indicated that the PMF receptor model is a capable and effective tool for river pollution-source identification and assignment. The results of this study provide a reference for policymakers and researchers to clarify priorities for pollution management and achieve precise control and management of pollutants. In addition, the study provides a new method for environmental management in similar sub-watersheds. In future work, more water quality parameters (metals, chlorophyll, and petroleum wastes) can be added to the concentration matrix in the PMF model to improve the identification accuracy and reduce the proportion of unidentified sources. Moreover, the stability of the PMF model can be verified by comparing it with other pollution-source analysis methods, such as the export coefficient method, APCS-MLR model, and stable isotopes.