Mapping the spatiotemporal diversity of precipitation in Iran using multiple statistical methods

Despite being located in a semi-arid and arid part of the world, Iran enjoys a very diverse climate. Our objective is to regionalize the country into homogeneous precipitation regions and to determine the monthly and annual precipitation water volume and depth in each region, which are required in hydro-climatological studies and applications ranging from simple water budget calculations to infrastructure design. We investigate the spatiotemporal diversity of precipitation over the country by analyzing 33-year-long monthly precipitation time series (1983–2016) at 461 rain-gauge stations. We employed cluster analysis (CA), with both hierarchical and non-hierarchical clustering approaches, and principal component analysis (PCA) to determine homogeneous precipitation zones at three scales (resolutions): macro, meso, and micro. First, CA divides the country into six macro-precipitation regions (MPRs), each showing a mean annual hyetograph of unique pattern and depth. These six precipitation regions are created by the Siberian cold continental air mass entering the country from the north, the Sudan air mass from the south and southwest, the Mediterranean air mass from the west, the North Atlantic and Black Sea cyclones from the northwest, and the Maritime air mass from the southeast. Then, the six regions were divided into ten meso-resolution zones through hierarchical clustering (HC) and k-means clustering. The occasional collision of the air masses causes this division of the six macro-regions into ten meso-zones. Finally, we subjected the precipitation time series of the ten meso-zones to PCA, HC, and k-means clustering and, for the first time, established an optimal number of 24 micro-zones, which together form a comprehensive precipitation map of the country.
The annual hyetograph of each zone shows a unique pattern and distribution, with monthly precipitation magnitudes that differ from those of other zones as a result of the varied physio-geographical characteristics prevailing in each micro-zone. The results show that hierarchical clustering (Ward's method with Pearson correlation) and PCA have the same classification performance and strength in meso- and micro-climatological zoning. The long-term (i.e., 33-year) mean annual hyetograph of each region and zone is also calculated, and the monthly and annual precipitation water volume and depth in the country are estimated. The findings provide researchers, practitioners, and decision-makers with an accurate baseline reference for future research and water resource management and will advance the understanding of precipitation dynamics in different regions of the country.


Introduction
Precipitation is the main component of the water cycle and, leaving aside polar ice, the sole source of fresh water on Earth. The general circulation of the atmosphere, latitude, longitude, topography, and the location of the land relative to large water bodies (i.e., seas and oceans) are the main factors affecting the intensity, duration, and variation of precipitation in a region. The precipitation regime of Iran is influenced by several synoptic systems entering the country from different directions (Khalili et al. 2016; Sabziparvar et al. 2015; Heydarizad et al. 2018): the Siberian cold continental air mass (cP air mass) from the north, the Sudan air mass (cT air mass) from the south and southwest, the Mediterranean air mass (MedT) from the west, the North Atlantic and Black Sea cyclones (mP) from the northwest, and the Maritime Tropical air mass (mT) from the southeast (the Indian Ocean and the Oman Sea). The collision of the Mediterranean and Sudan air masses and the occasional merging of the Sudan air mass with the Maritime air mass make the precipitation regime of Iran more complex. The Alborz and Zagros mountain ranges, in the north and west respectively, are Iran's major topographic features; their varied elevations prevent air masses from carrying moisture toward the central Iranian plateau, which contains the two large deserts of Dasht-e Lut and Dasht-e Kavir (Heydarizad et al. 2018). The air masses entering Iran, together with its varied and complex topography, large latitude (25° N to 40° N) and longitude (44° E to 64° E) ranges, extensive areal extent (1,648,195 km²), and large nearby water bodies (the Caspian Sea, the Persian Gulf, the Oman Sea, the Red Sea, the Black Sea, and the Mediterranean Sea), create a diverse climate in Iran.
The spatiotemporal diversity of precipitation in terms of duration and intensity poses many challenges worldwide (Hong et al. 2007; Kidd and Huffman 2011). Determining homogeneous precipitation zones and precipitation water volume at different scales is essential for water resource management, from simple water budget calculations to infrastructure design. Precipitation and climatological zoning have long been of interest to researchers around the world. Köppen was the first to classify the world climate, in 1900, based on vegetation, temperature, and humidity (Kottek et al. 2006). Many regional studies of climatological zoning have been performed worldwide (Mills, 1995, in Spain; Baeriswyl and Rebetez, 1997, in Switzerland; Guhathakurta and Rajeevan 2008, in India; Gocic and Trajkovic, 2014, in Serbia; Uddin et al. 2019, in Bangladesh; Srivastava et al. 2019, in Florida and California; Knerr et al. 2020, in Corsica; Ribes et al. 2020; Dinpashoh et al. 2004, in Iran; Modarres and Sarhadi, 2011, in Iran; Raziei, 2018, and Masoodian, 2003, in Iran; Alizadeh et al. 2019, in Iran; Sengupta et al. 2021, in India).
In Iran, Domroes et al. (1998) analyzed 31-year monthly rainfall records at 71 rainfall stations and identified five precipitation zones using two methods, principal component analysis (PCA) and cluster analysis (CA). Modarres (2006) applied hierarchical cluster analysis to the annual and monthly precipitation of 28 main towns of Iran and identified eight homogeneous precipitation regions. Soltani et al. (2007) fitted ARIMA (Auto-Regressive Integrated Moving Average) models to the monthly precipitation time series of 28 cities and, using PCA and CA, identified three climate regions in Iran, classified as simple, moderate, and complex. Karimi and Samani (2009) applied PCA to 29 years of monthly rainfall from 44 synoptic stations in Iran, showed that the first ten components explained about 63% of the total variance of the rainfall data, and classified Iran into nine climatological zones. Shirvani and Nazemosadat (2012) applied PCA and CA to 33 years of monthly precipitation at 42 stations and regionalized Iran's precipitation into six homogeneous regions. Maryanaji (2012) studied the variability of the precipitation regime and classified Iran's climate into arid, semi-arid, Mediterranean, humid, and sub-humid groups. Sarmadi and Shokoohi (2015) obtained eight precipitation regions over Iran from standardized precipitation data for 1951 to 2007 using two multivariate methods, factor analysis (FA) and CA. Darand and Daneshvar (2014) applied PCA and CA to eight seasonal rainfall-based variables (1951 to 2007), identified ten rainfall regimes in Iran, and reported winter as the main rainfall period. Dinpashoh et al. (2004) investigated the precipitation climate of Iran by applying FA and CA to 12 precipitation-related variables at 77 weather stations for 1956 to 1998. They found seven climatological regions, six of which were homogeneous with respect to the H-statistic. Raziei (2017) used the Köppen-Geiger method for the climatic zoning of Iran.
Raziei (2018) studied the precipitation regime of Iran using monthly precipitation at 155 synoptic stations from 1990 to 2014 and obtained five and seven sub-regions with S-mode and T-mode PCA, respectively. Climatological zoning depends on the number and distribution of stations, the period of the precipitation data, and the analysis procedure. For this reason, the numbers and extents of the precipitation sub-regions identified in Iran by the above regionalization studies do not match one another.
In this paper, we regionalize the spatiotemporal diversity of precipitation over the country by analyzing 33 years of monthly rainfall data recorded at 461 stations at three scales (macro, meso, and micro), using PCA and hierarchical and non-hierarchical clustering. At the three resolutions, the country was divided into six, ten, and twenty-four precipitation zones, respectively, each distinguished by a unique annual hyetograph. The calculated monthly and annual precipitation water volume and depth in the homogeneous precipitation zones provide useful information on water availability in each zone for water resources management in the country. In addition, the relative potential of the statistical approaches for differentiating the precipitation zones is highlighted. The paper is organized as follows: Sect. 2 describes the study area and the data used. The methodology is explained in Sect. 3. Sect. 4 presents the results and discussion, and Sect. 5 presents the conclusions.

Study area
Iran is a country in West Asia with an approximate area of 1,648,195 km², located within latitudes 25-40° N and longitudes 44-64° E and bordered by the Caspian Sea to the north and the Persian Gulf and the Oman Sea to the south. Iran's neighbors are Afghanistan and Pakistan in the east; Turkmenistan, Azerbaijan, and Armenia in the north; Turkey and Iraq in the west; and the Arab States of the Persian Gulf in the south (Fig. 1). Iran has a varied topography. Mount Damavand (5671 m above m.s.l.) and the southern coast of the Caspian Sea (28 m below m.s.l.) are the highest and lowest points in the country, respectively (Madani, 2014). Dasht-e Lut and Dasht-e Kavir are two major deserts covering the central part of the country. Iran's two high mountain ranges, the Alborz in the north and the Zagros in the west, cause the low rainfall in the country's interior (Alijani et al. 2008; Balling et al. 2016; Vaghefi et al. 2019). Iran has a varied climate, mostly arid or semi-arid, characterized by high potential evapotranspiration and low rainfall. Annual precipitation is lower in the eastern half of Iran than in the western half (Nazemosadat et al. 2006). The annual precipitation varies from over 1000 mm on the western coast of the Caspian Sea and in the western highlands to less than 50 mm in the uninhabitable eastern deserts. The average annual rainfall across the country is estimated at 250 mm, which is less than one-third of the global average annual rainfall. Winter is the season of heaviest rainfall, and only a few regions of the country (the Caspian Sea coast, the northwest, and the southeast) receive rain in summer. Temperatures range from −20 to +50 °C across the country. The hottest and coldest months of the year are July and January, with average temperatures of 19 to 39 °C and 6 to 21 °C, respectively, in most parts of the country.
Significant spatial and temporal variability of rainfall in Iran has been the motivation for the construction of large dams and reservoirs to regulate water flow (Madani, 2014).

Data
The daily precipitation records of 404 rain gauges and 57 synoptic stations (461 stations in total) for 1983 through 2016 were collected from the archives of the Regional Water Organization and the Meteorological Organization of Iran. The daily data in each month were aggregated (summed) to form the monthly precipitation time series. The locations of the stations are shown in Fig. 1. Although there are over 1450 rain gauges in Iran, we selected the 461 stations with the longest continuous records (i.e., 33 years).
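The aggregation and record-length screening described above can be sketched in Python as follows. This is a minimal illustration, not the authors' processing code: the record layout (a list of `(date, mm)` pairs per station) and the function names `daily_to_monthly` and `is_continuous` are hypothetical, and the continuity check assumes that zero-rain days are stored explicitly so that every observed month appears.

```python
# Illustrative sketch; function names and record layout are hypothetical.
from collections import defaultdict
from datetime import date


def daily_to_monthly(daily):
    """Sum daily precipitation (mm) into monthly totals keyed by (year, month).

    `daily` is an iterable of (datetime.date, precip_mm) pairs for one station.
    Absent days are treated as unrecorded; no gap filling is done.
    """
    monthly = defaultdict(float)
    for day, precip_mm in daily:
        monthly[(day.year, day.month)] += precip_mm
    return dict(sorted(monthly.items()))


def is_continuous(monthly, first_year, last_year):
    """True if every month from first_year to last_year has a monthly total."""
    return all(
        (year, month) in monthly
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    )


# Hypothetical daily records for one station (values in mm)
records = [
    (date(1983, 1, 1), 2.5),
    (date(1983, 1, 15), 10.0),
    (date(1983, 2, 3), 4.0),
]
monthly = daily_to_monthly(records)
print(monthly)                             # {(1983, 1): 12.5, (1983, 2): 4.0}
print(is_continuous(monthly, 1983, 1983))  # False: March-December are missing
```

Applied to each station over 1983-2016, only stations for which `is_continuous` holds would be kept in this sketch.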

Methodology
We employed CA and PCA to classify the spatiotemporal regimes of precipitation in Iran. CA is a statistical method that groups sample data into clusters such that members of each cluster are highly similar and the clusters are sharply distinct from one another. CA includes hierarchical and non-hierarchical clustering approaches (Fovell and Fovell, 1993). Hierarchical cluster analysis (HCA) builds a hierarchy of clusters (a cluster tree, or dendrogram), starting with each point (record) as a single cluster and repeatedly merging the most similar pair of clusters until a single all-encompassing cluster is reached (Everitt et al. 2001; Rencher and Schimek 1997). HCA was applied to the precipitation data with Euclidean distance and Pearson correlation as similarity measures and Ward's method as the linkage rule. K-means clustering (MacQueen, 1967) is a non-hierarchical (partitioning) clustering procedure and one of the most commonly used unsupervised machine learning algorithms for dividing a data set into K clusters (Kassambara, 2017). CA was performed using IBM SPSS Statistics and R.
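To illustrate Ward's linkage with a correlation-based dissimilarity, the sketch below clusters station series in plain Python. It is not the SPSS or R routine used in this study: it assumes the dissimilarity 1 − r (r is the Pearson correlation), and the function names are hypothetical. On the full 461-station matrix, a library routine such as R's `hclust` or SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead.

```python
# Illustrative sketch; function names are hypothetical.
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ward_correlation_clusters(series, k):
    """Agglomerative Ward clustering of station series, cut at k clusters.

    The dissimilarity 1 - r is proportional to the squared Euclidean distance
    between z-scored series, so the Lance-Williams update for Ward's method
    (which operates on squared distances) is applied to it directly.
    """
    n = len(series)
    members = {i: [i] for i in range(n)}
    d2 = {(i, j): 1.0 - pearson(series[i], series[j])
          for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > k:
        a, b = min(d2, key=d2.get)            # most similar pair (a < b)
        na, nb, dab = len(members[a]), len(members[b]), d2[(a, b)]
        merged = members.pop(a) + members.pop(b)
        updated = {}
        for c, mc in members.items():
            nc = len(mc)
            dac = d2[(min(a, c), max(a, c))]
            dbc = d2[(min(b, c), max(b, c))]
            # Lance-Williams recurrence for Ward's linkage
            updated[c] = ((na + nc) * dac + (nb + nc) * dbc - nc * dab) / (na + nb + nc)
        d2 = {pair: v for pair, v in d2.items() if a not in pair and b not in pair}
        for c, v in updated.items():
            d2[(c, next_id)] = v              # c < next_id, so keys stay ordered
        members[next_id] = merged
        next_id += 1
    labels = [0] * n
    for label, stations_in_cluster in enumerate(members.values()):
        for s in stations_in_cluster:
            labels[s] = label
    return labels


# Hypothetical monthly regimes: two winter-peak and two summer-peak stations
winter = [90, 80, 70, 40, 15, 5, 2, 2, 5, 20, 50, 85]
summer = [5, 5, 10, 20, 40, 70, 90, 85, 50, 20, 10, 5]
stations = [winter, [1.3 * v + 1 for v in winter],
            summer, [0.7 * v + 2 for v in summer]]
print(ward_correlation_clusters(stations, 2))  # stations 0-1 and 2-3 share labels
```

Because 1 − r compares only the shape of the series and ignores their magnitude, stations with proportional regimes (such as the two winter-peak series above) are treated as identical; this is one reason correlation-based and Euclidean-based clusterings can differ.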
PCA is one of the simplest and most widely used methods of data analysis in climatology (Ehrendorfer, 1987; White et al. 1991). PCA reduces a large data matrix to a few principal components (PCs). The first PC explains the largest part of the total variance of a data set, and each subsequent component explains a successively smaller portion of the remaining variance. The PCs are the eigenvectors of the variance-covariance matrix, and PCA attempts to reveal the latent structure underlying a data set. Only PCs that have eigenvalues greater than one or that explain a specified percentage of the total variance, usually above 65%, are considered significant and retained for interpretation (Candeias et al. 2011). PCA was performed on the monthly precipitation data using IBM SPSS Statistics, and the number of components was determined from the eigenvalues, the percentage of total variance, and score plots. The Varimax orthogonal rotation method was then used to simplify the interpretation of the unrotated principal components (Richman, 1986; Domroes et al. 1998).
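A minimal NumPy sketch of the component-selection rule described above is given below. It assumes S-mode PCA on the correlation matrix (stations as variables, months as observations), applies the eigenvalue-greater-than-one rule, reports the cumulative explained variance for comparison with the 65% threshold, and rotates the retained loadings with the standard varimax algorithm. Assigning each station to the component with its largest absolute rotated loading is an assumption made here for illustration; the study itself used IBM SPSS Statistics, the function names are hypothetical, and the data are synthetic.

```python
# Illustrative sketch; function names are hypothetical, data are synthetic.
import numpy as np


def pca_components(data, min_eigenvalue=1.0):
    """S-mode PCA: rows are months, columns are stations.

    Returns eigenvalues (descending), cumulative explained-variance fractions,
    the number of components with eigenvalue > min_eigenvalue (Kaiser rule),
    and the loadings of those retained components.
    """
    corr = np.corrcoef(data, rowvar=False)      # station-by-station correlations
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.sum(eigvals > min_eigenvalue))
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
    return eigvals, cumulative, n_keep, loadings


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (stations x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break                               # criterion stopped improving
        criterion = s.sum()
    return loadings @ rotation


# Synthetic example: two hypothetical regimes, three stations each, 200 months
rng = np.random.default_rng(0)
signals = rng.standard_normal((200, 2))
data = np.column_stack(
    [signals[:, 0] + 0.1 * rng.standard_normal(200) for _ in range(3)]
    + [signals[:, 1] + 0.1 * rng.standard_normal(200) for _ in range(3)]
)
eigvals, cumulative, n_keep, loadings = pca_components(data)
print(n_keep, round(float(cumulative[n_keep - 1]), 2))
zones = np.argmax(np.abs(varimax(loadings)), axis=1)   # station -> component
print(zones)
```

The rotation matters when two retained eigenvalues are close: the unrotated components can then mix both regimes, whereas varimax pushes each station's loading toward a single component.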

Results and discussion
To assess the spatiotemporal variations of precipitation and map the country into homogeneous precipitation provinces, we analyzed the precipitation time series introduced in Sect. 2.2 at three resolution levels: macro, meso, and micro. Macro-resolution defines the largest homogeneous regions, each having unique precipitation characteristics; that is, each precipitation province has a long-term mean hyetograph whose annual distribution and monthly precipitation magnitude (depth) differ from those of the others.
The meso-resolution level subdivides each macro-region into an optimum number of zones whose annual hyetographs have the same distribution as at macro-resolution but different monthly precipitation depths.
The micro-resolution level subdivides each meso-zone into an optimum number of homogeneous sub-zones with the smallest areal extent and different monthly precipitation magnitudes. Note that we use the terms macro-, meso-, and micro-scale here in the context of Iran's geographical and political boundaries.
In the following, we present the analysis results at each scale, outline the statistical significance of the results, and discuss the possible causes responsible for forming the precipitation zones over the country. We then compare the resulting homogeneous zones with previous works and highlight the novelty of our research.
We also calculate the total precipitation water volume and depth in each zone and the country as a whole, providing an accurate baseline reference for the practitioners and decision-makers in the water management sectors.

Precipitation diversity in macro-scale
The precipitation time series were subjected to CA, and the results are presented in Fig. 2. Based on Ward's method with Pearson correlation, the precipitation regimes in Iran are spatially classified into six distinct regions, which we call the macro-precipitation regions (MPRs) of Iran. The mean annual hyetographs of the six regions illustrated in Fig. 2 show that precipitation in each region has a unique temporal distribution, sharply different from those of the others. The long-term mean annual precipitation depth (P̄) in the six regions is plotted in Fig. 3. MPR1 along the Caspian Sea, with a mean annual precipitation of 885 mm, and the southeastern desert provinces (MPR6), with 158 mm, are the wettest and driest regions, respectively. Considering the moist air masses entering the country, the spatial diversity of precipitation appears to be largely controlled by these air masses. The Siberian cold continental air mass (known as the cP air mass) enters from the north, carries moisture from the Caspian Sea, and, being blocked by the Alborz Mountain range, provides heavy precipitation in MPR1. The Sudan air mass enters from the south and southwest of the country carrying moisture from the Arabian Sea, the Red Sea, the Persian Gulf, and the Oman Sea (Alijani, 2000; Khalili et al. 2016; Heydarizad et al. 2018) and influences the precipitation regime in MPR2. The Mediterranean air mass (MedT) affects the precipitation regime in MPR3. North Atlantic and Black Sea cyclones (Sabziparvar et al. 2015), which are rich in humidity and enter from the northwest, provide precipitation in MPR4.
Fig. 3 The long-term mean annual precipitation in the six macro-precipitation zones of Iran
The precipitation regime of MPR6 is controlled by the Maritime Tropical air mass (mT) entering the country from the southeast (the Indian Ocean and the Oman Sea). Although Mediterranean moisture also influences the precipitation regime of MPR5, the desert nature of this region makes its precipitation considerably lower than that of MPR3. Table 1 presents the similarity between our classification and previous ones. For example, MPR4 corresponds to zone 1, G3, and D of Domroes et al. (1998), Modarres (2006), and Karimi and Samani (2009), respectively. Since the data lengths, numbers of rainfall stations, and methodologies of these studies fall statistically short of our data and procedures (Table 1, the last two rows), our classification is more rigorous, more comprehensive, and climatologically sound.
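The regional mean annual hyetograph and the mean annual depth (P̄) discussed above can be computed from station series and cluster labels roughly as follows. This is an illustrative sketch with hypothetical function names; weighting all stations in a region equally is an assumption of the sketch, not a procedure stated in the text.

```python
# Illustrative sketch; function names are hypothetical.
def mean_hyetograph(monthly_series):
    """Long-term mean depth (mm) for each calendar month.

    `monthly_series` is a flat list of consecutive monthly totals starting in
    January; an incomplete final year is ignored.
    """
    n_years = len(monthly_series) // 12
    return [
        sum(monthly_series[12 * y + m] for y in range(n_years)) / n_years
        for m in range(12)
    ]


def regional_hyetographs(stations, labels):
    """Average station hyetographs within each region.

    Returns {label: (12 monthly mean depths, mean annual depth)}. Stations are
    weighted equally (an assumption; no areal weighting is applied).
    """
    groups = {}
    for series, label in zip(stations, labels):
        groups.setdefault(label, []).append(mean_hyetograph(series))
    result = {}
    for label, hyetographs in groups.items():
        monthly = [sum(h[m] for h in hyetographs) / len(hyetographs) for m in range(12)]
        result[label] = (monthly, sum(monthly))
    return result


# Two hypothetical stations in region 0 with two years of record each
station_a = [10.0] * 12 + [20.0] * 12
station_b = [5.0] * 24
print(regional_hyetographs([station_a, station_b], [0, 0])[0][1])  # 120.0
```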

Precipitation diversity in meso-scale
In the previous section, we demonstrated that Iran's precipitation regime can be divided into six regions, mainly as a result of the moist air masses entering the country. However, owing to the country's vast area, complex topography, large latitude variations, and extensive nearby water bodies, its precipitation variability is more complex, so we examined whether the six regions could be clustered into more sub-regions. For this purpose, the Hopkins statistic (Lawson and Jurs, 1990) was used. The Hopkins values for the 461 rain gauge stations fall in the range of 0.65-0.80, indicating that the data can be grouped further. The Gap Statistic method was also used to determine the optimal number of clusters for both hierarchical and k-means clustering. As presented in Fig. S1 (S denotes the supplementary material), several indices from the NbClust package (Charrad et al. 2012) were also applied to determine the optimal number of clusters for hierarchical and k-means clustering. All tests showed that the optimal number of groups is ten. Figure S2 depicts the cluster dendrogram of the 461 stations. The meso-precipitation zones of Iran based on Ward's method with Euclidean distance and on k-means clustering are shown in Fig. 4. The ten clusters (A to J) resulting from Ward's method with Pearson correlation, mapped in Fig. 5, present Iran's precipitation diversity at the meso-scale. We chose the meso-precipitation zones from Ward's method with Pearson correlation because it gives the most distinct classification (compare Fig. 5 with Fig. 4). The long-term (1983-2016) mean annual hyetographs of zones A to J are also plotted in Fig. 5, and the mean annual precipitation in each zone is shown in Fig. 6. Zones A and J have the maximum (1293 mm) and minimum (123 mm) mean annual precipitation, respectively. Comparing Fig. 4 with Fig. 5, it is observed that (a) MPR1 (the southern shore of the Caspian Sea) is divided into zones A and B because the eastern part of this region is mainly affected by the cP air mass, whereas the western part receives moisture from both cP and mP fronts; (b) MPR2 (the southwestern region) is divided into zones C and E, most probably owing to the collision of the Mediterranean and Sudan air masses; and (c) MPR6 (the southeastern region) is divided into zones H and J as a result of the occasional merging of the Sudan air mass with the Maritime air mass entering the country from the southeast (the Indian Ocean and the Oman Sea). The similarity between our meso-scale precipitation zoning and previous zonings by other researchers is tabulated in Table 1.
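The Hopkins statistic mentioned above can be sketched as follows. Conventions differ between sources and software (some raise distances to the power of the data dimension, and some report 1 − H), so the plain form used here, the default sample size m = n/10, the station features, and the function name are all assumptions for illustration.

```python
# Illustrative sketch; function name, defaults, and features are hypothetical.
import math
import random


def hopkins(points, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)).

    u: distances from m uniform random points (within the data's bounding box)
       to their nearest data point.
    w: distances from m sampled data points to their nearest other data point.
    H near 0.5 suggests spatial randomness; values toward 1 suggest clustering.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    m = m or max(1, n // 10)
    lows = [min(p[j] for p in points) for j in range(dim)]
    highs = [max(p[j] for p in points) for j in range(dim)]

    def nearest(q, skip=None):
        return min(math.dist(q, p) for i, p in enumerate(points) if i != skip)

    w = sum(nearest(points[i], skip=i) for i in rng.sample(range(n), m))
    u = sum(
        nearest([rng.uniform(lows[j], highs[j]) for j in range(dim)])
        for _ in range(m)
    )
    return u / (u + w)


# Two tight, well-separated groups of hypothetical station features
jitter = random.Random(1)
blobs = [[cx + jitter.uniform(-0.01, 0.01), cy + jitter.uniform(-0.01, 0.01)]
         for cx, cy in [(0, 0), (10, 10)] for _ in range(50)]
print(round(hopkins(blobs), 2))   # close to 1 for strongly clustered data
```

Under this convention, values such as the 0.65-0.80 reported above lie between the random (about 0.5) and strongly clustered (near 1) extremes.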

Precipitation diversity in micro-scale
Next, we examined whether each of the ten meso-scale zones (i.e., A to J) could be divided into smaller, rigorously homogeneous sub-zones. For this purpose, several statistical indices from the NbClust package (Charrad et al. 2012) were tested to define the optimum number of sub-zones (micro-zones) in each of the ten meso-scale zones. For k-means clustering and HCA, the Kaiser-Meyer-Olkin (KMO) criterion and Bartlett's test were used; for PCA, the total variance, eigenvalues, and score plots were used. Table S1 presents the statistical tests, which indicated the same optimal number of micro-zones in each zone. All tests were significant at the 0.05 level (95% confidence). As the last column of Table S1 shows, zone A is not divided further; zones B, F, H, I, and J are divisible into two sub-zones; zones C, D, and E into three sub-zones; and zone G into four sub-zones. In other words, the spatiotemporal regime of Iran's precipitation is classified into 24 micro-zones. K-means clustering and HCA resulted in the 24 micro-zones shown in Figs. 7a, b, and c, and their dendrograms are plotted in Fig. S3.
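Within each meso-zone, the k-means step can be illustrated with Lloyd's algorithm. The sketch below uses a single random initialization and hypothetical two-dimensional station features, and its function name is hypothetical. Library implementations such as R's `kmeans` (whose default algorithm is Hartigan-Wong, with an `nstart` argument for multiple random starts) are more robust, so results on real data may differ.

```python
# Illustrative sketch of Lloyd's k-means; function name is hypothetical.
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centers) for a list of feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break                                       # assignments stable
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                                 # keep old center if empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical stations described by two features
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(features, 2)
print(labels)
```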
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a standard test procedure for determining whether data are suitable for factor analysis (Kaiser, 1970). The KMO values obtained are listed in Table S2 and are above the generally recommended value of 0.60.
Bartlett's test of sphericity (Bartlett, 1950) tests the null hypothesis that the variables in the correlation matrix are uncorrelated. The results of Bartlett's test of sphericity for the nine meso-zones (B to J) were significant (p < 0.0001) (Table S2). Table S3 reports the eigenvalues, percentage of variance, and cumulative percentage of the factors in the unrotated and rotated states. The number of components was determined from the eigenvalues and the total variance of the components: only PCs with eigenvalues greater than 1.0 that together explain more than 65% of the total variance of the data were retained. In zones B, F, H, I, and J, the first two PCs together accounted for 65, 73, 78, 80, and 67% of the total variance, respectively. In zones C, D, and E, the first three PCs together accounted for 85, 79, and 92% of the total variance, respectively. In zone G, the first four PCs explain 80% of the total variance. Although the retained PCs explain less than 70% of the total variance in zones B and J, their eigenvalues are greater than 1.0. Therefore, PCA distinguishes the same 24 precipitation micro-zones as k-means clustering and HCA.
Fig. 6 The long-term mean annual precipitation in the ten meso-precipitation zones of Iran. The mean annual precipitation decreases from zone A to J; each zone is highlighted with a particular color (for example, zone F is shown with red solid circles), together with the long-term mean annual hyetograph of each zone (the precipitation depth in each month of the year is shown with a bar of a particular color; for example, the red bar shows July precipitation). Note the variability of precipitation magnitude and distribution in each zone.
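The two suitability checks can be sketched with NumPy as below, using the textbook formulas: Bartlett's statistic χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO as the ratio of summed squared correlations to summed squared correlations plus squared partial correlations. The function names and the synthetic one-signal zone are hypothetical; the study computed these in its own software. A p-value could be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(chi2, dof)`.

```python
# Illustrative sketch; function names are hypothetical, data are synthetic.
import numpy as np


def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom (rows = observations)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)                 # ln|R|, numerically stable
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                              # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)            # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


# Synthetic zone: six hypothetical stations driven by one common signal
rng = np.random.default_rng(42)
signal = rng.standard_normal(200)
zone = np.column_stack([signal + 0.5 * rng.standard_normal(200) for _ in range(6)])
chi2, dof = bartlett_sphericity(zone)
print(round(kmo(zone), 2), round(chi2, 1), dof)
```

Strongly intercorrelated stations give a KMO well above the 0.60 threshold and a large χ² relative to its degrees of freedom, which is the pattern required before PCA is applied to a zone.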
Score plots of PC1 versus PC2 for zones B to J are presented in Fig. S4 and also show the division of zones B to J into the same numbers of sub-zones.
The 24 micro-zones are mapped in Fig. 7a, b, c, and d based on k-means clustering, HCA (Ward's method with Euclidean distance), HCA (Ward's method with Pearson correlation), and PCA, respectively. The results of the four methods are closely similar, although PCA gives more distinct zoning (Fig. 7d). The formation of 24 micro-zones is attributed to the vast area, complex topography, and large latitude variations of the country, as well as its extensive nearby water bodies. The mean annual hyetographs of the 24 micro-zones plotted in Fig. 8 show that, while the micro-zones within each meso-zone share the same pattern of long-term mean annual hyetograph, their monthly precipitation magnitudes vary. The mean annual precipitation in each micro-zone is shown in Fig. 9. Micro-zones A and J2 have the maximum (1293 mm) and minimum (99 mm) mean annual precipitation, respectively. To differentiate the 24 micro-zones visually and to calculate the precipitation volume, Fig. 7d is plotted on a 0.25°-gridded map of Iran (Fig. 10). The boundary of each micro-zone was drawn approximately with the help of the mean annual hyetographs of rainfall stations with shorter records (i.e., less than 33 years) and their similarity to the hyetographs in Fig. 8. Figure 10 provides the most comprehensive map of Iran's precipitation, consisting of the 24 precipitation provinces introduced in this research for the first time.
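Because a 0.25° grid cell shrinks in area with latitude, zone areas on such a grid are best obtained by summing true cell areas rather than counting cells. A sketch of the spherical cell-area formula A = R² Δλ (sin φ_N − sin φ_S), and of summing cells per micro-zone, is given below; the spherical-Earth radius, the cell indexing by south-west corner, and the function names are assumptions, and the study's own area computation may differ.

```python
# Illustrative sketch; function names and cell indexing are hypothetical.
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; spherical-Earth assumption


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude cell on a sphere: R^2 * dlon * (sin phiN - sin phiS)."""
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))


def zone_areas(cells, step=0.25):
    """Total area per zone; `cells` maps each cell's (south lat, west lon) to a zone label."""
    areas = {}
    for (lat_south, _lon_west), zone in cells.items():
        areas[zone] = areas.get(zone, 0.0) + cell_area_km2(lat_south, lat_south + step, step)
    return areas


# A 0.25-degree cell at 30 N covers about 668 km^2; cells shrink toward the pole
print(round(cell_area_km2(30.0, 30.25, 0.25)))
print(zone_areas({(30.0, 50.0): "A", (30.0, 50.25): "A", (35.0, 50.0): "B"}))
```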

Annual precipitation volume
The estimated long-term (33-year) mean monthly and annual precipitation in the micro-zones is presented in Table 2. The annual precipitation of each micro-zone was multiplied by its area (2nd column of Table 2) to obtain the annual precipitation water volume, tabulated in the last column of Table 2. The sum of the last column, 406 BCM, is the precipitation water volume that Iran receives annually. Note that the area of each micro-zone was calculated from Fig. 10. Since the total land area of the country is 1,632,211 km² (i.e., the sum of the 2nd column of Table 2), the long-term average annual precipitation over the country is 249 mm, which is less than one-third of the world average (i.e., 800 mm). This value is comparable to the 252 mm, 250 mm, and 260 mm previously reported by Heydarizad et al. (2018), Madani (2014), and Modarres (2006), respectively. The differences are due to the length and period of the data used to determine the mean annual rainfall.
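The volume bookkeeping above reduces to a unit conversion: 1 mm of precipitation over 1 km² is 10³ m³, i.e., 10⁻⁶ BCM. The sketch below applies it and checks the reported country-wide figures (406 BCM over 1,632,211 km²). The function names are hypothetical, and the 10,000 km² zone is an illustrative value, not a Table 2 entry.

```python
# Illustrative sketch; function names are hypothetical.
MM_KM2_TO_BCM = 1e-6   # 1 mm over 1 km^2 = 1e3 m^3 = 1e-6 BCM (1 BCM = 1e9 m^3)


def volume_bcm(depth_mm, area_km2):
    """Precipitation water volume in billion cubic metres (BCM)."""
    return depth_mm * area_km2 * MM_KM2_TO_BCM


def areal_mean_depth_mm(total_bcm, total_area_km2):
    """Area-weighted mean depth (mm) implied by a total volume and area."""
    return total_bcm / (total_area_km2 * MM_KM2_TO_BCM)


# Totals reported in the text: 406 BCM over 1,632,211 km^2
print(round(areal_mean_depth_mm(406, 1_632_211)))   # 249
# A hypothetical 10,000 km^2 zone receiving 1293 mm per year
print(volume_bcm(1293, 10_000))
```

Summing `volume_bcm` over all micro-zones and dividing by the total area with `areal_mean_depth_mm` gives the area-weighted country mean, which is why it can differ from a simple average of station values.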

Conclusions
Although Iran, with a long-term mean annual precipitation of 250 mm, is known as a semi-arid to arid country, it enjoys a very diverse climate. In this paper, we investigated this climatic variability by mapping the precipitation diversity over the country at three scales: macro, meso, and micro. Thirty-three years of precipitation records from 461 measuring stations, aggregated from daily to monthly totals, were analyzed using CA (both hierarchical and non-hierarchical clustering) and PCA. The results demonstrate that at the macro-scale, Iran's precipitation regime is mapped into six regions. Macro-resolution defines the largest homogeneous regions, each having a long-term mean hyetograph with a unique annual distribution and a monthly precipitation magnitude (depth) different from those of the others. The moist air masses entering the country are the main controlling factor. At the meso-scale, the precipitation regime is further mapped into ten zones. The collision of the Mediterranean and Sudan air masses and the occasional merging of the Sudan air mass with the Maritime air mass most probably cause the division of the macro-regions into ten zones at meso-resolution. The six regions and the ten zones show distinctly different magnitudes and annual patterns of precipitation and are comparable to those of previous studies. However, since the data lengths, numbers of rainfall stations, and methodologies of previous studies fall statistically short of our data and methods (Table 1, the last two rows), our classification is more rigorous, comprehensive, and climatologically sound.
Fig. 8 The long-term mean annual hyetographs of the 24 micro-zones; P̄33 is the 33-year mean annual rainfall in each zone. Each month of the year is shown with a unique color.
Fig. 9 The long-term mean annual precipitation in the 24 micro-precipitation zones of Iran
At the micro-scale, 24 precipitation micro-zones are distinguished and introduced in this research for the first time. The vast area, complex topography, and latitude variations of the country, together with its extensive nearby water bodies, control this precipitation diversity. Although some micro-zones share the same pattern of mean annual hyetograph, their monthly precipitation magnitudes vary. The southeastern province of Sistan-Baluchestan (zone J2), with a long-term mean annual precipitation of 99 mm, and the southwestern coast of the Caspian Sea (zone A), with 1293 mm, are the driest and wettest zones in the country, respectively. The results provide an accurate insight into the amount of precipitation expected to fall in each zone during each month of the year, offer a guideline for managing and allocating harvested precipitation water among consumptive sectors, and will advance the understanding of precipitation dynamics in different regions of the country.