Improved teleconnective predictability of monthly precipitation amounts using canonical correlation analysis

This study was aimed at evaluating the application of the canonical correlation analysis (CCA) to predict monthly precipitation amounts (predictands) by benefitting from 17 large-scale climate indices (predictors) in Iran. Monthly precipitation data, covering the period of 1987–2017, were collected from 100 weather stations across the country. Monthly precipitations were predicted using the multiple linear regression (MLR) models, based on the 1- to 6-month lead times of the original and canonical predictors. The cross-validation was conducted to compare the prediction skills of the two sets of MLR models constructed on the basis of the original predictors (MLOrigPr) and the canonical predictors (MLCCAPr). The analyses revealed the dominant teleconnections and that there are the interannual variations in responses of precipitation to them suggesting that a signal only is not sufficient to achieve a robust understanding of the associations. At the 1-month lead time, the MLR models based on the canonical predictors outperformed those based on the original predictors. However, the skill of both models was reduced by increasing the lead times up to 6 months. Averaging on all stations, around 61.4% and 26.3% of the observed values, falls into the cross-validated 95% prediction intervals of the MLCCAPr and MLOrigPr models, respectively. Furthermore, the MLCCAPr models were found to be more spatially universal than the MLOrigPr ones and decrease multicollinearity symptoms strengthening the predictions. These findings corroborated the advantage of using the CCA in improving the teleconnective predictability of precipitation in Iran.


Introduction
Precipitation is one of the most complicated hydroclimatic phenomena (Araghi et al. 2017; because of its ever-increasing uncertainty in the global and regional water cycles (Ghasemi and Khalili 2008). Therefore, successful prediction of precipitation provides substantial information when managing water resources and agriculture systems (Kim et al. 2018;Shuttleworth 2012).
The large-scale pressure anomalies (climate signals/ teleconnections) are recurring modes, controlling the spatiotemporal variability of the precipitation systems happening over a given region during a specific period (Irannezhad et al. 2017). Several studies have investigated the correlation of climate signals and the precipitation variabilities at far distances (e.g., Araghi et al. 2017;Casanueva et al. 2014;Duzenli et al. 2018;Oldenborgh et al. 2000;Yuan et al. 2016) in order to improve our understanding of the current and future conditions of the local/regional bio-hydroclimatic variables (Wise et al. 2015). The possible influences of teleconnections on the seasonal and monthly precipitation amounts in Iran have been reported in several studies. Nazemosadat and Cordery (2000) found that total autumn precipitation over various parts of the country during El Niño episodes was several times more than those during La Niña periods. Ghasemi and Khalili (2008) resulted in that the large-scale signals, such as North Atlantic Oscillations (NAO), control spatial patterns of winter precipitation in parts of Iran. Molavi-Arabshahi et al. (2016) analyzed the possible connections between several large-scale climate indices and precipitation and air temperature on the southwest coast of the Caspian Sea. They found that while NAO influenced only air temperature, the precipitation variations may be connected mostly with ENSO and partially with NAO. Araghi et al. (2017), using a wavelet coherence, assessed the linkages between precipitation and each of three major climatic indices NAO, Arctic Oscillation (AO), and Southern Oscillation Index (SOI). They reported SOI as the most effective climate index that can be related to precipitation in Iran. In a recent analysis, Alizadeh-Choobari et al. (2018) confirmed that the ENSO cycle contributed to the interannual climate variability over Iran.
Most of the studies on teleconnective responses of precipitations have often focused on the precipitation variability and trend and the associated impacts of precipitation on hydrological systems (e.g., Hosseinzadeh Talaee et al. 2012). However, these analyses are useful as they are used to investigate the possible predictability of precipitation over a region at different time scales (Barlow et al. 2016;Rana et al. 2018). Nonetheless, little is known to explore the teleconnective predictability of the precipitation amounts over Iran and how this application can be improved for providing more accurate results. In other words, the linkages between teleconnections patterns and rainfall characteristics in the last decades have been successfully analyzed. However, it is inadequate to account for only a few teleconnection patterns when predicting precipitation, because large-scale patterns do not occur every year, each of which, in turn, impact on the precipitation predictability in some years (Landman and Mason 1999).
The current study was aimed at evaluating the predictability of monthly precipitation amount in Iran in response to several candidate large-scale climate indices, using the multiple linear regression (MLR) models. Many researchers such as Swain et al. (2017) and Sharifi et al. (2019) used MLRs for modeling the precipitation. These models are the most common numerical machine learning methods used in various sciences like hydrology and agriculture representing the relationship between two variables of predictor and predictive (Sanford and Selnick 2013). Applying these methods mostly is because of their simplicity (Khoshravesh et al. 2015). The stepwise algorithm (Draper and Smith 1981) is commonly used to construct MLRs and to decrease input predictors. However, the accuracy of these methods needs to be improved, especially, when these are supposed to be estimators of precipitation. Hence, the contribution of the canonical predictors, derived from the largescale climate indices, in the predictability of precipitation in the framework of MLR models is compared to the original ones. The canonical variables are extracted using the canonical correlation analysis (CCA) in a way that the correlation between precipitation and climate indices is maximized (e.g., Assani and Guerfi 2017;Benestad 2002a, b;Dan et al. 2016;Landman and Mason 1999;Nicholls 1986;Tomozeiu et al. 2014;Wilks 2013). The CCA approach was previously utilized to predict and analyze Amazonia seasonal droughts by Lima and AghaKouchak (2017). It is noteworthy that CCA produces linear combinations while trying to maximize the correlation of two sets (Hotelling 1936) and to minimize the effect of the noises by shrinking their canonical coefficients (Lima and AghaKouchak 2017).

Data and study area
Iran is situated in the Middle East, with climatic diversity ranging from arid to per-humid. Most of the arid and semiarid lands are located in the central to eastern parts of Iran, and the humid lands are located in the northern and western parts of Iran. Monthly precipitation data of 100 synoptic stations for the 31 years period of 1987-2017 were collected from IR of Iran Meteorological Organization. Table 5 in Appendix provides the attributes of the stations used in the study. From July to December of the year 1987, it was used to provide the required information for January 1988. Hence, the 30-year period of 1988-2017 was considered to calibrate and validate the models. These stations were chosen based on the completeness of the dataset and the length of the climate record. The geographical locations of the chosen stations are shown in Fig. 1. This figure also shows the geographical distribution of the aridity index (UNEP 1992) over Iran (Ghamghami and Irannejad 2019) interpolated by the inverse distance weighting (IDW) method, ranging from 0 (hyper arid) to 2.2 (perhumid). Moreover, the monthly standardized time series of seventeen large-scale climate indices (NCAR 2015), as listed and briefly described in Table 1, were employed as predictors to construct the models over the study area. The standardized monthly values of these indices from 1950 onwards are available from the website http:// www. cpc. ncep. noaa. gov/ data/ teled oc/ telec onten ts. shtml. Several climate indices were used in this study because a single climate index is not enough to describe the entire climate system perfectly (Kim et al. 2018). Note that monthly precipitation data were also standardized because the standardization procedure guarantees that all data have identical opportunities to participate in the prediction process (Barnston 1994;Landman and Mason 1999).

Multiple linear regression model
Supposing that P i j is the standardized precipitation amounts at the station i (i = 1:100), for the month j and Y j− is a vector  Atmospheric circulations North-south dipole of anomalies between Kamchatka and southeastern Asia of teleconnection indices at the ( = 1 − 6) months before the month j, the MLR predictive model of precipitation as a function of teleconnections can be expressed as below: where, is the intercept term, is the regression coefficients vector, and is the residual and should follow a zeromean normal distribution. Precipitation variability in a given month is eventually related to a few indices, but certainly not to all indices. The regressive coefficients of indices in a MLR model should therefore differ by month of the year (Asong et al. 2016). Equation 1 was then applied to the data for each station at each month of the year. The stepwise approach was used for selecting from several candidate predictors affecting the accuracy of the estimated predictands (Draper and Smith 1981). This approach also reduces biases arising from over-fitting, the issue that generally leads to very good results at the calibration stage and very poor results at the validation stage.

Application of canonical correlation analysis
This research used CCA to achieve a highly correlated teleconnective response of predictands to predictors, as CCA does not keep incoherent variabilities that disrupt correlations induced by local forcings, such as noises and outliers. This advantage is specifically perceived when evaluating the cross-validated skills of the statistical models. Furthermore, CCA defines the objective patterns as being highly correlated to both predictand and predictors datasets (Rana et al. 2018).
CCA is a multivariate statistical method that computes linear combinations existing between two sets of variables (Barnston and Ropelewski 1992) and finds the most highly correlated patterns (or modes). In fact, CCA finds two sets of the canonical coefficients' vectors (V and U) such that the correlations between the pairs of projected predictands and predictors are maximized (Lima and AghaKouchak 2017). Canonical variates are then obtained by multiplying the original variables vector ( Y ) by the corresponding canonical coefficients vector ( U ). The correlation between the pairs of canonical variates is referred to the canonical correlation. More details on CCA can be found in Barnett and Preisendorfer (1987), Yu et al. (1997), and Marzban et al. (2014). Considering the canonical variates, Eq. 1 can be modified as Note that the MLR model in Eq. 2 relates the precipitation amounts (as predictands) to the canonical variates calculated from teleconnection indices (as predictors). Application of CCA to the original predictor variables likely guarantees (1) the improved predictability in most cases (Lima and Agha-Kouchak 2017). Predictions using Eqs. 1 and 2 are denoted by MLOrigPr and MLCCAPr, respectively, in the following.

Model evaluation and metrics
A well-fitting regression model should produce predicted values close to the observed data values. In this study, both models (MLOrigPr and MLCCAPr) are calibrated for three 20-year periods (RUN1:199820-year periods (RUN1: -2017RUN2: 1988RUN2: -1997RUN2: plus 2008RUN2: -2017RUN3: 1988RUN3: -2007 and, correspondingly, are cross-validated for three 10-year periods (1988-1997; 1998-2007; 2008-2017) for each station at each month of the year. The model cross-validation helps to account for uncertainty resulting from unrepeatable observations (Asong et al. 2016) so that the projected results are indeed more reliable than those of individual validations. Different criteria were used to compare models, i.e., the adjusted coefficient of determination (CD or R 2 ), the root mean square error (RMSE), and the number of stations with invalid regression (NIR). In the context of the latter, three below conditions produce invalid regressions: (1) the residuals do not follow a normal distribution according to a goodness-of-fit test such as Shapiro-Wilk test (Shapiro and Wilk 1965); (2) none of the regressive coefficients are statistically significant at the 5% significance level; and (3) the adjusted CD of a model is not statistically significant at the 5% significance level. Based on these conditions, the MLR models that do not satisfy at least one of the three above-mentioned conditions are considered as the invalid models. Furthermore, variation inflation factor (VIF) was measured to analyze the effects of multicollinearity as an event that can occur when running a multiple regression model (Chan et al. 2022). Multicollinearity takes place when independent variables in a regression model are severely correlated. Often a VIF value less than 10 represents a weak multicollinearity, but one greater than 10 indicates a robust multicollinearity (Weisberg 2005). It is expected that multivariate methods like the CCA reduce symptoms of this event.

Simultaneous cross-correlations
The simultaneous (Pearson's) correlation coefficients between precipitation and each of the seventeen teleconnections were calculated and are shown as the box and whisker plots (Hubert and Vandervieren 2008) in Fig. 2, for each month of the year. Each box-whisker plot in the figure involves hundred correlation values corresponding to association of each of hundred chosen stations with one teleconnection. Each box involves the 50% limit of correlation values, with the upper and lower blue dash lines indicating the 95% confidence interval of correlation coefficients across the stations of interest. Although the correlations include both negative and positive values, some teleconnections have correlation values that their distribution is negatively/ positively skewed. As shown in the figure, for all months of the year, some teleconnection indices have significant associations with monthly precipitation in a number of stations over Iran. The linkage between the climate of the Middle East and large-scale climate patterns has been reported by many researchers (e.g., Dezfuli et al. 2010;Hosseinzadeh Talaee et al. 2012;Hurrell 1996;Marcella and Eltahir 2008;Roghani et al. 2016;Sabziparvar et al. 2011;Setoodeh et al. 2004;Türkeş and Erlat 2003). Precipitations of January were negatively associated with EA and PNA indices over most of the stations, but only a few stations had significant correlations. EA pattern also influenced precipitations of February higher than January, as the number of statistically significant correlations increased. In addition, precipitations of February in several stations happened during the negative phase of SOI pattern. Moreover, ENSO was found to be more positively correlated with precipitation in February than in January. The teleconnections with March precipitation were stronger than with the other two months of winter, with nine out of seventeen indices significantly correlated with precipitations in March. In several stations, March precipitations often occurred in the negative phases of AMO, EA/WR, PNA, and SOI, as well as in the positive phases of ENSO, EP/NP, and PDO. As shown in the figure, the strongest teleconnection pattern influencing March precipitation was SOI, consistent with Nazemosadat and Cordery (2000), which reported a negative association between SOI and precipitation occurrence in most parts of Iran.
More than 40% of the studied stations indicated the significant negative correlations (p value < 0.05) between the April precipitation and the EA pattern. More climate indices were correlated with April as well as May and June precipitation relative to the winter months' precipitation. Compared to the winter and the spring, relatively unchanging patterns were observed for the summer months (July to September). However, it was rarely seen that the highrain stations were associated with teleconnection indices in summer. It should be noted that the ENSO/SOI patterns were negatively/positively correlated, albeit moderately, with August precipitation, which is exceptional. Precipitation occurrence in Iran has coincided to some extent with the positive/negative phases of ENSO/SOI patterns (Fig. 2). For example, autumn precipitation (October to December) was positively/negatively correlated with the ENSO/SOI patterns, some of which are statistically significant. The November precipitations have occurred mainly in the positive phases of ten indices (AMO, DMI, EA, EA/ WR, NAO, NP, and ENSO patterns) and in the negative phases of four indices (AMO, EP/NP, PNA, and SOI). The December precipitations were significantly associated with the SOI negative and the EA/WR positive phases. In line with Roghani et al. (2016), the autumn precipitation was negatively associated with the SOI. Overall, it was concluded that the autumn precipitation was most strongly affected by teleconnections. However, all high rain months can be related to positive or negative phases of large-scale patterns that help researchers to model precipitation.
The spatial distribution of the influences of teleconnection indices on monthly precipitation was also examined. Maps in Fig. 3 provide these results for precipitations of December associated with two strong signals, i.e., EA/WR and SOI. These maps were obtained by interpolating correlations. As mentioned above, precipitations of December have occurred at the positive and the negative phases of EA/ WR and SOI, respectively. Correlations are statistically significant in more than 50% of stations. However, the spatial distribution of significant correlations is discernibly different. Spatially, EA/WR was found to be the dominating pattern over the west of Iran, while SOI showed dominance in parts of the south, the east, and rarely the northwest. In other words, the EA/WR pattern was the most influential index for variability of precipitation over the more humid regions, while the SOI pattern had impacts mostly over the more dry regions. It revealed that each index influencing precipitation is responsible for variability in specific regions. Similar to these findings were found for other months that are consistent with Sabziparvar et al. (2011) that reported arid and humid regions to respond to teleconnections differently.
Impacts of some indices were very site-specific controlling precipitations of several adjacent stations. For example, though precipitations of December in stations located in the southeast of the country (such as Chabahar, Zabol, Zahedan) were not significantly correlated with two above indices, they were strongly associated with DMI index with correlations ranging from 0.408 to 0.585. Furthermore, Nazemosadat and Mousavi (2003) and Hosseinzadeh Talaee et al. (2012) reported that the linkage between precipitations in Iran and NAO pattern is generally weaker than corresponding associations with SOI. In the current study, these results were also found in most of the months, but in some months such as January, NAO pattern was more strongly associated with precipitation than SOI.

Time-lagged cross-correlations
An analysis was also conducted on time-lagged relationships to investigate the possibility of precipitation predicting by predictors obtained from the previous months. Overall, by evaluating all timelagged correlations of 1 to 6 months it was found that linkages were weakened when increasing time lags. For instance, Fig. 4 provides box and whisker plots of correlations between precipitations of December and teleconnection indices of November (time lag of 1-month), October (time lag of 2-month), September (time lag of 3-month), and August (time lag of 4-month). As shown, two strong signals that influence simultaneously precipitation of December were considerably attenuated. However, some signals have no the same behavior. For example, AMO index at all-time lags was negatively associated with precipitation of December similar to simultaneous influences. Rarely, the influences of some signals on precipitations were intensified. For example, simultaneous correlations between EP/NP index and precipitations of December were found to be weak, but these correlations were amplified at time lags of 2 and 4 months. Furthermore, there  are changes of tendencies in some cases whereby correlation signs changes from + to − and vice versa. According to Fig. 2, precipitation of December was associated with the positive phase of PNA in the same month while with the negative phase of PNA at time lags of 2 and 4 months. Ghasemi and Khalili (2006) reported that there is a lag between the effect of the Atlantic system and Iranian climate. Findings of the current study confirm this report for AMO but not for NAO pattern, as the Atlantic oscillations. Overall, results of both relationship analyses, i.e., simultaneous and time-lagged, suggest that a single predictor for each of the stations is not adequate for constructing predicting models, because one of the teleconnection indices only cannot describe the entire climate system perfectly (Kim et al. 2018). Table 2 reports summary statistics of the canonical variates extracted from applying the CCA on a high-dimensional matrix including the standardized monthly precipitations of 100 stations (matrix v) and the standardized monthly data of 17 teleconnection indices (matrix w). Wilks' lambda test (Rao 1951) was served to investigate whether canonical correlations between the canonical variates (V and W) are significant or not. In other words, Wilks' lambda test allows testing whether the canonical variates are significantly linked to the input variables or not. It can be seen from Table 2 that the canonical variates F1 to F6 are strongly linked to the original variables. While F7 is somehow linked although not significantly, F8 to F17 poorly related to the input variables. The first three factors are statistically significant at the 99% confidence level, while the second three factors are statistically significant at the 95% confidence level.

Canonical analysis
The Eigen-values in Table 2 show the percentage of variability explained by each canonical variate. For instance, F1 alone explains more than 11% of the variability. As seen, more than 50% of the variability is explained by the first six factors F1-F6. The last column indicates the correlation between the two sets of canonical variates; for example, the first canonical predictor and predictand are closely correlated with approximately 0.86.
The correlations between original variables and canonical variates called canonical factor loadings, allow understanding of how the canonical variates are related to the original variables (Marzban et al. 2014;Rana et al. 2018). In the following, these loadings are interpreted to recognize the physical mechanisms behind maximally correlated the canonical variate pairs. Figure 5 provides the loadings for the first six modes for which the canonical correlations were statistically significant. Each mode explains precipitation variabilities relevant to portion of the country. The maps show the spatial distributions of the predictand loadings demonstrating correlations between the stations' precipitations (v) and their corresponding canonical variates (V). The red and blue circles (Fig. 5) indicate the negative and positive correlations, respectively. The bigger the circles indicate the higher the loadings in Fig. 5. In addition, bar graphs of the predictor loadings, demonstrating correlations between teleconnection indices (w) and their corresponding canonical variates (W), are provided at the bottom of the maps. Significant positive/ negative loadings are also distinguished by the blue/red bars.
Overall, the CCA successfully isolated patterns of spatial linear variability between the two canonical variates. Mode 1 highlights the relative importance of the three teleconnection indices, i.e., AMO, DMI, and SUNSPOT, for explaining precipitations variabilities over Iran. Also, mode 1 exhibits negative loadings in the overwhelming majority of the stations. This mode indicates that precipitations at some stations located in the northwest and the south of Iran significantly occurred during the negative phases of AMO and DMI and during the positive phase of SUNSPOT. Though this mode did not produce a coherent spatial pattern, it explains more than 11% of the variability in stations' precipitation (Table 2). Mode 2 indicates highly positive loadings in the eastern half of the country, accompanied by marginally negative loadings along the southern coast of the Caspian Sea. This mode also features that precipitations at all stations located in the eastern half of Iran occurred substantially in the negative phases of AMO and SOI and in the positive phase of NINO12. Mode 3 reveals strongly positive loadings in most of the stations located in the northwest of the country, suggesting that the positive (negative) phases of PDO (SUNSPOT) influence precipitation. Only a limited number of stations, which are sporadically scattered in the eastern half of the country, show a significant negative correlation. Unlike modes 2 and 3 which produce coherent spatial patterns, mode 4 features a spatially scattered pattern with two distinct areas of significant loadings located on the coast of the Caspian Sea and the southwest regions. Additionally, a limited number of stations sporadically distributed in the eastern half of the country show negative correlations but weaker than two latter regions. This mode highlights the influence of ENSO, AMO, and EA/WR patterns on precipitation over the country. Mode 5 exhibits strong positive loadings in most of the stations located in the west of the country. This mode features that precipitations occurred meaningfully in the positive phases of ENSO patterns and in the negative phase of the SOI. Finally, mode 6 is particularly associated with stations located on the southwest coast of the Caspian Sea, indicating that three indices AMO (negative phase), EA/WR (positive phase), and PDO (negative phase), influence precipitation in this region, unlike Molavi-Arabshahi et al. (2016), which found that NAO and ENSO had a major impact on precipitation in this region during 1956-2010; this study found that these signals had negligible effects on precipitation in the period 1988-2017. As seen, there were at least four relatively coherent patterns governing the eastern half, the northwest, and the western half of the country and the southwest coast of the Caspian Sea. It is revealed that the strong loadings in the first six modes appear to mostly cover most of the rainy stations as influenced by the large-scale oscillations. However, some teleconnection indices, however, were not significantly correlated with the modes discussed, but with the other modes. For instance, the AO index was significantly correlated with mode 12 (F12) as high as − 0.75, and this mode was also correlated with precipitation in a few stations in the northeast of the country. Similarly, the NP index was significantly correlated with mode 17 (F17) as high as − 0.9, which was also associated with precipitation in many stations in the southeast of the country. To account for such strong signals affecting the precipitation of the stations, some of which were ignored in the first six modes; all seventeen canonical predictors were potentially used to construct the models.

Models' performance: cross-validated coefficients of determination
In the previous subsections, the teleconnective predictability of monthly precipitations over Iran was discussed. Capability of the canonical variables for improving teleconnective predictions of precipitation is compared to the original variables using the two groups of models MLCCAPr and MLOrigPr, respectively, in this subsection. Figure 6 shows the overall performance of the both mentioned models in various months for the lead times of 1 to 6 months. The panels of this figure indicate the adjusted CD (ACD) averaged over all three calibration periods and all the stations of interest. The dash lines in this figure indicate the 5% significance level of ACD. As expected, performance drops with increasing lead time for both the models tested and for all months. The model's performance is higher for high-rain than low-rain months. For short-term lead times, especially for 1 month, the MLCCAPr models have considerably higher skills than the MLOrigPr models in all months of the year. The average ACD ranges from 0.4 to more than 0.7 for MLCCAPr models, while the ACD for MLOrigPr models does not exceed 0.45. This improvement is also seen for the 2-and 3-month lead times, but it is less than 1 month in terms of the ACD value. This suggests that variability of precipitation could be related to the canonical rather than the original predictors. In longer lead times from 4 to 6 months, the ACD skills for the canonical-based models are close to those for the original-based models; therefore, the MLCCAPr models could not add any further skill to what obtained by the MLOrigPr models. The relative advantages of the MLCCAPr models are better observed in Fig. 7, which shows box and whisker plots of the adjusted CD as averaged over different stations for the  two models, for three runs, and for all months. In general, it is evident from this figure that the adjusted CD associated with the MLCCAPr models for the winter, autumn, and spring were satisfactorily greater than those for the MLO-rigPr models. In the case of the summer, canonical predictors could contribute a bit to the skill of the MLR models. The explanation for this drawback is attributable to the fact that the summer experienced low precipitation over Iran and most of the stations do not receive nearly any precipitation amount. However, considering that much of the precipitations over Iran occurred mostly in the other seasons, poor skills for the summer is not a major concern. The proposed model seems to have the greatest prediction skill in rainy months.

Models' performance: validity of regressions
The rationale for adding canonical rather than original predictors to MLR models was also assessed by the criterion of NIR (number of stations with invalid regression) for both models. The panels of Fig. 8 indicate the NIR averaged over all stations and calibration periods. As shown in the figure, the average of NIR increases for all models tested and for all months as the lead time increases. Again, the MLCCAPr models have a better skill compared to the MLOrigPr ones. The largest improvement is seen for the 1-month lead time. This implies that the MLCCAPr models could yield more accurate regressions than the MLOrigPr models. The MLC-CAPr and MLOrigPr models, for example, were able to predict December precipitations of at 98 and 64 stations, respectively, at the 1-month lead time. In fact, the CCA helped to improve the performance of MLR models over various stations. Monthly mean NIR (black dash-line) were obtained 13,18,23,31,37,and 43 stations for MLCCAPr models,and 42,47,53,60,68, and 78 stations for MLOrigPr models at lead times of 1 to 6 months, respectively. It appears that the highest monthly mean of NIR for MLCCAPr models (at lead time of 6 months) is nearly as high as the lowest for MLOrigPr models (obtained at lead time of 1-month). Nevertheless, at longer lead times, the skill of the proposed model is also reduced, enabling the MLCCAPr models to be adopted as short-term predicting models. Similarly, Landman and Mason (1999) and Lima and AghaKouchak (2017) found that with increasing lead-time the inherent predictability of precipitations over South Africa was decreased.

Models' performance: root mean square error
The capability of the proposed model to predict monthly precipitation is also shown in Fig. 9 for a lead time of 1 month. The spatial distributions of the cross-validated root mean square error (RMSE) values are seen in the bubble maps of this figure for two relatively high-(February) and low-(May) rainy months, for both models and for three runs. The red circles indicate the RMSE values, and again, the larger the circles, the higher the RMSE values. In each bubble map, the stations where the MLRs were invalid are shown with blue + . From these maps, it can be inferred that (I) in the context of RMSE, the MLCCAPr models also have higher skills compared to the MLOrigPr models; (II) for both models, the highest NIR is observed in relatively low-rainy months (here May); and (III) in accordance with Fig. 8, the CCA made the MLR models more spatially universal than the MLOrigPr, as the MLCCAPr models were able to predict monthly precipitations in most of the stations. As shown in the figure, in May, especially for RUN2 (midcalibration), the original predictors have lost the skill of the MLR models in many stations.

Reduced multicollinearity
Tables 3 and 4 provide the VIF values obtained from the multicollinearity test among the original and CCA predictors, respectively. This test was done for each month separately and is independent of predictand variables. White cells in these tables have values for which multicollinearity symptoms have not been observed. Gray cells with bolded values and red cells with italic values represent a severe (10 < VIF < 20) and very severe (VIF > 20) multicollinearity, respectively, meaning that models containing those will have poor generalization ability and overfit the data (Chan et al. 2022). Though the stepwise method was used for the selection of teleconnection variables affecting rainfall prediction, according to Table 3, at least 30% of predictors suffer from this event. Clearly, predictors with VIF higher than 20 are often related to the Nino phenomena. CCA reduces these symptoms to 18%, and what is most important, very severe multicollinearity was seen for three cells only. These findings confirmed that the CCA application improved multicollinearity symptoms and made predictions more robust.

Responsible teleconnections
To better understand the physical mechanisms responsible for improving the predictability, the contributions of the canonical and original predictors in prediction of precipitation were investigated. Figure 10 reveals teleconnections through which precipitation is predictable, for example, in March. This figure displays contributions for all stations and for all lead times in four geographical divisions, i.e., northwest (NW), southwest (SW), northeast (NE), and southeast (SE). Black points and red circles are related to teleconnections involved in development of the MLCCAPr and MLOrigPr models, respectively. As seen in the figure, in the vast majority of the stations, far more teleconnection indices were effective in predicting precipitation by MLC-CAPr than MLOrigPr models. As a clear result, the roles of ENSO patterns have been emphasized. Ahmadi et al. (2019) stated that the average annual rainfall over the country has no significant difference between the negative and positive phases of some indices like AO. Here, this pattern was also found to have a neutral effect on monthly precipitation, with the exception of June and November.

Cross-validated time series for various climates
Many applications, such as water resources management, require prediction of precipitation over monthly or seasonal time scales. Furthermore, the relatively suitable prediction of extremes is particularly imperative as it indicates that the model is somehow preserving the predictor-predictand relationships (Asong et al. 2016). This feature will be an advantage, especially when models face unrepeated predictors. Figure 11 shows the time series of cross-validated seasonal precipitation values predicted by the proposed approach in this study (MLC-CAPr) and the corresponding observed values. These findings are related to the lead time of 1 month which provides more accurate results than other lead times. To consider uncertainty, the calibrated models are assessed by stochastically generating 100 realizations of monthly precipitation for three validation periods, i.e., 1988-1997, 1998-2007, and 2008-2017, which span the entire study period. The 95% confidence ranges are then derived as prediction intervals, and their means were considered as finally predicted values. These findings were shown for six stations that were randomly chosen among climatologically homogeneous stations based on the De Martonne classification (see Table 5 in Appendix). These stations are Sabzevar (arid), Tabriz (semiarid), Ilam (Mediterranean), Gharakhil

Conclusion
This study attempted to improve the ability of the MLR models to predict monthly precipitations over Iran at 100 synoptic stations. The main objective of this study was to investigate, along with correlation analyses performed, whether or not the use of canonical predictors in an MLR model predicting monthly precipitations contributes to an increase in its skill. From the findings presented, the following conclusions can be drawn: 1. This study concludes that there is a spatiotemporal interannual variability in large-scale patterns that affects the

Ilam -Mediterranean climate
precipitation of Iran. Such that impacts of some indices like AMO, DMI, and EA on some months of the year were found to be stronger than typical indices (SOI and ENSO) used in previous studies. While Araghi et al. (2017) and Alizadeh-Choobari et al. (2018) have respectively introduced SOI and ENSO patterns as the strongest signals, this represents the fact that a single signal is not sufficient to achieve a robust estimation of the associations. 2. The CCA presented at least four spatially coherent and relatively un-overlapped precipitation patterns associated with teleconnections patterns covering most regions of the country.
3. The influence of some indices like the SUNSPOT on regional precipitations was highlighted by the CCA, which was also not noticed in previous studies. 4. The incorporation of canonical predictors in an MLR model contributes to an improvement in its skill, as approximately 61.4% and 26.3% of the observed values fall within the cross-validated prediction intervals produced by the MLCCAPr and MLOrigPr models, respectively.
For future studies, the MLR models improved in this study can be given out to predict monthly/seasonal precipitations while also considering the impact of climate change.   Data availability Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

Conflict of interest
The author(s) declare that they have no competing interests.