Segmental Variability of Precipitation In The Mahanadi River Basin During 1901-2017

Eigen-based sequential spatial pattern analysis, an application of principal component analysis (PCA), is presented here to examine the spatial distribution of precipitation over the Mahanadi River Basin. The Spatial (S)-mode of sequential spatial pattern analysis is combined with the maximum loading value of the rotated retained principal components (also referred to as the maximum loading value approach). It is applied to gridded monthly rainfall data at a resolution of 0.25° x 0.25° with a record length of 117 years (1901-2017). The meteorological records form a sequential spatial field in both the Spatial and Temporal modes, which can be used to identify areas of distinct precipitation variability and precipitation regime. The identified patterns of the different timeslot segments were then analyzed for the dispersion of annual precipitation observed at different station points, using within-cluster similarities and between-cluster dissimilarities. The regionalized patterns were validated for distinctness by a pairwise comparison of CDFs using the Kolmogorov-Smirnov 'D' statistic test.
Author contributions: Methodology, R.T.S. and M.K.V.; Writing-original draft, R.T.S.; Writing-editing, R.T.S. and M.K.V.


Introduction
The climate is undisputedly warming, and this warming acts on precipitation variability over time; the IPCC Fifth Assessment Report reports a 0.85°C rise in global mean surface temperature during 1880-2012 [IPCC, 2013]. To ascertain the resulting precipitation variability in the Mahanadi River Basin, a segmental approach was employed. The monthly precipitation record was segmented into three layers of 39 years each: the 1st layer (1901-1939), the 2nd layer (1940-1978), and the 3rd layer (1979-2017). In the general hydrological cycle, water evaporates and is stored in the atmosphere as vapour. The amount of water vapour increases as temperature increases and, ultimately, the spatial and temporal form of the precipitation pattern changes. "The first law of geography states that closer spatial entities are more strongly related to each other than distant ones" [24].
Spatial events therefore need to be clustered spatially, for which spatial analysis is an important technique [23]. Such analysis divides data points or objects into groups (clusters) based on their similarity in both attributes and location. Regional and global precipitation patterns are likely to change in spatial and temporal terms once the atmospheric water-holding capacity is exceeded. The warming climate has intensified precipitation characteristics such as precipitation intensity and the number of wet days. Changes in the intensity, quantity, and pattern of precipitation can result in more frequent extreme events, such as droughts and floods. The stakeholders who are ultimately affected are the people for whom these extreme events are unforeseen. Thus, analysis of historical data is very important for extracting information about changing precipitation patterns.
The study is motivated by various changes in precipitation behaviour (e.g., sudden, uncertain, increasing, or decreasing). The uncertainty and inconsistency associated with precipitation adversely affect stakeholders in different sectors, including dependent farmers and health and safety (not limited to living creatures but also the environment). The study is of considerable practical importance for water resource management planning and for flood and drought risk assessment. Precipitation variability therefore needs to be studied, although understanding its impact on water resources is a challenging and profound task. Consequently, the objective of this article is to investigate the spatio-temporal change of precipitation and to carry out a regional analysis using gridded monthly precipitation data for the Mahanadi River Basin.
Aher et al. [1] studied precipitation variability and its spatio-temporal pattern in the upper Godavari River basin using daily precipitation data (2000-2014), which were correlated with satellite-derived TRMM data; they found a positive correlation between TRMM and rainfall data over three half-decades, with micro-level precipitation variability. Huang et al. [5] studied the upper portion of the Hongshui River basin (UHRB) to determine the spatial and temporal variability of precipitation using three precipitation indices (PCI, PCD, and PCP) derived from daily precipitation data, and found high PCI in the northeast and low PCI in the southwest of the UHRB; PCP revealed earlier rainfall in the eastern UHRB, and PCD indicated more dispersed rainfall in the western UHRB. Yin et al. [25] studied the Huai River basin for frequency analysis of extreme 5-day and 10-day precipitation and its spatio-temporal changes, using EOF and hydrological frequency analysis of daily precipitation data (1960-2014); they found a weak upward trend in both extremes, which are closely associated with drought and flooding. Yuan et al. [26] investigated the spatial and temporal variation of seasonal and annual precipitation in the Poyang Lake basin using statistical techniques and the continuous wavelet transform applied to monthly precipitation data. Tayeb Raziei [15] regionalized precipitation time variability and precipitation regimes over Iran using data from 155 synoptic stations during 1990-2014, applying S-mode PCA and hierarchical cluster analysis preceded by T-mode PCA.
Many researchers have studied the Mahanadi River Basin by applying different regression and classification methods to various observational or gridded data records. Singh et al. [22] investigated the middle Mahanadi river basin to better understand changes in precipitation characteristics and patterns at the scale of regional agro-climatic zones, and found significant spatio-temporal variation, including seasonal shifts, increasing dry spells, and extreme weather events. Sahu et al. [19] studied the Mahanadi river basin to explore the combined and relative effects of "Indian monsoon and Indo-Pacific climatic modes" on flood frequency and high streamflow events, and found that ENSO and ENSO Modoki were significant for river extremes during JJA and SON. Sahu et al. [16] studied the Mahanadi river basin to explore the effectiveness of DBSCAN for precipitation regionalization and found good agreement among the clusters, but an underestimation of homogeneity characteristics when the findings were assessed against the popular unsupervised hierarchical cluster analysis.
In this study, the data record is segmented and then subjected to a PCA-based sequential spatial pattern analysis, also referred to as the Eigen-based technique. A dataset of 230 grid station points with a record length of 117 years (1901-2017) is segmented into three layers, each spanning 39 years: the first segment layer spans 1901-1939, the second 1940-1978, and the third 1979-2017. Patterns are delineated using the Spatial (S)-mode of principal component analysis to extract information about different patterns of time variability. In the Spatial (S)-mode, the monthly precipitation magnitudes are treated as observations and the station points as the spatial field variables. A strong spatial pattern indicates high component loadings among interconnected stations, whereas loadings close to zero are associated with dispersed stations. The Mahanadi River Basin has highly diverse climatological characteristics, so the segmental S-mode approach is expected to portray different precipitation patterns and to unveil different aspects of the Mahanadi's precipitation regimes.
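The segmentation and the S-mode decomposition described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `segment_years` and `s_mode_pca` are hypothetical, the input is assumed to be arranged with time steps as rows and stations as columns (i.e., the transpose of the M x N matrix described in the text), and the PCA is assumed to be performed on the station-by-station correlation matrix.

```python
import numpy as np

def segment_years(start=1901, end=2017, n_segments=3):
    """Split an inclusive year range into equal consecutive segments (hypothetical helper)."""
    years = np.arange(start, end + 1)
    return [(int(seg[0]), int(seg[-1])) for seg in np.array_split(years, n_segments)]

def s_mode_pca(X):
    """S-mode PCA sketch: rows of X are time steps (observations), columns are stations (variables).

    Returns eigenvalues (descending), loadings (stations x components) and the
    percentage of total variance explained by each component.
    """
    R = np.corrcoef(X, rowvar=False)              # station-by-station correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings = eigenvectors scaled by sqrt(eigenvalue); clip tiny negative round-off.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, loadings, explained

# Usage with synthetic data shaped like one segment (468 months x 230 stations).
segments = segment_years()                        # [(1901, 1939), (1940, 1978), (1979, 2017)]
rng = np.random.default_rng(0)
X = np.cbrt(rng.gamma(2.0, 50.0, size=(468, 230)))
eigvals, loadings, explained = s_mode_pca(X)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of stations, so the percentage of variance explained by a component is simply its eigenvalue divided by 230.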

Study Area And Data
The Mahanadi is a major east-flowing river in east-central India (see Figure-1), located between 19°20'N and 23°35'N latitude and 80°30'E and 86°50'E longitude. The thalweg is 858 km long from its origin to False Point, with a catchment area of 141,600 km². The river was known as Kanaknandani in the ancient era and as Mahanadi during the Mahabharat era. It originates from Pharsiya village, Sihawa town, in the Dhamtari district of Chhattisgarh, and merges into the delta region of the Bay of Bengal at Jagatsinghpur, Odisha.
High-resolution (0.25° x 0.25°) gridded rainfall data were obtained from the India Meteorological Department (IMD), Pune. The data were prepared from the records of 2140 stations selected from 6329 available stations in India using interpolation schemes [13,14,21]. Pre-analysis checks for stationarity, independence, and spatial independence were performed to assess data quality and accuracy, because blindly assuming these properties may mislead the analysis and its conclusions. Stationarity was checked using the non-parametric Mann-Kendall trend test [7,9], independence using lag-1 to lag-5 autocorrelation coefficients, and spatial independence using Moran's I test [10]. The MK test results indicate that 200 of the 230 station points are statistically insignificant and only 13% (30 station points) are significant at the 5% level. Since most units of observation have no significant trend, it is reasonable to infer that there is no significant trend at the regional scale and that the data are stationary (readers can also refer to Sahu et al. [20] for wavelet-synopsis-based trend detection). The independence test (autocorrelation coefficients of lag-1 to lag-5) shows only 5 station points (GP_9, 10, 16, 69, and 183) above the threshold limits (horizontal dashed lines) in the correlogram plot. Thus, the data may be considered time-independent. The spatial independence test (Moran's I) gives a statistically significant p-value (2.2e-16) with a positive z-score (23.609) and a positive Moran's Index (0.867). These statistics suggest that the dataset tends to cluster spatially, in simple words, 'high values cluster near other high values' and vice versa, so the null hypothesis of spatial independence is rejected.
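The three pre-analysis checks can be illustrated with a small NumPy sketch. It is an assumption-laden illustration rather than the authors' workflow: the Mann-Kendall variant omits the tie correction, the correlogram threshold is assumed to be the usual ±1.96/√n band drawn by common ACF plots, and the Moran's I function takes a user-supplied spatial weights matrix (the weighting scheme used in the study is not stated). All function names are hypothetical, and the Moran's I z-score is not computed here.

```python
import numpy as np
from math import erfc, sqrt

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction; returns (S, Z, two-sided p-value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # sign(x_j - x_i) for all pairs i < j (upper triangle of the pairwise matrix)
    s = np.triu(np.sign(x[np.newaxis, :] - x[:, np.newaxis]), k=1).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = erfc(abs(z) / sqrt(2))                    # two-sided normal p-value
    return s, z, p

def lag_autocorrelation(x, max_lag=5):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d * d)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])

def acf_threshold(n, z=1.96):
    """Approximate 95% significance band (+/- z/sqrt(n)) of a correlogram."""
    return z / np.sqrt(n)

def morans_i(values, weights):
    """Global Moran's I for n locations and an n x n weights matrix with zero diagonal."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    W = np.asarray(weights, dtype=float)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

# Usage on toy inputs
s, z, p = mann_kendall(np.arange(20))                 # strictly increasing series: S = 190
r = lag_autocorrelation(np.array([1.0, -1.0] * 10))   # lag-1 = -0.95, lag-2 = 0.9
W = np.eye(5, k=1) + np.eye(5, k=-1)                  # 1-D chain: neighbours share an edge
I = morans_i([1, 2, 3, 4, 5], W)                      # smooth gradient: I = 0.5
```

A positive Moran's I, as in the smooth-gradient toy case, corresponds to the "high values near high values" clustering reported for the basin; an alternating pattern on the same chain gives a negative value.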

Methodology
Sequential spatial pattern analysis (SSPA) based on PCA, also referred to as the Eigen-based technique [12], is used in this study. The Spatial (S)-mode and Temporal (T)-mode were used to analyze the patterns of precipitation time variability and the patterns of precipitation regimes, respectively. The study is carried out in two parts: (1) segmental analysis, which is based entirely on the eigenvalue decomposition of the time series data for different timeslots; and (2) pattern delineation, which is based on the maximum loading value approach. The identified patterns of the different timeslot segments were then analyzed for the dispersion of annual precipitation observed at different station points, using within-cluster similarities and between-cluster dissimilarities. The regionalized patterns were validated for distinctness by a pairwise comparison of CDFs using the Kolmogorov-Smirnov 'D' statistic test [3].
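The Kolmogorov-Smirnov 'D' statistic used for the pairwise comparison is the largest vertical gap between two empirical CDFs. A minimal NumPy sketch follows; the function names are hypothetical, and the p-value uses the large-sample Kolmogorov approximation, which is an assumption (libraries such as SciPy's `scipy.stats.ks_2samp` can also compute exact small-sample p-values).

```python
import numpy as np

def ks_d_statistic(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled sample."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)   # empirical CDF of a
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)   # empirical CDF of b
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Large-sample p-value from the Kolmogorov distribution, lambda = sqrt(n*m/(n+m)) * D."""
    lam = np.sqrt(n * m / (n + m)) * d
    if lam < 0.1:                                  # series converges slowly; p is ~1 here
        return 1.0
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(min(max(p, 0.0), 1.0))

# Usage on toy samples standing in for one cluster's annual totals in two segments
seg_a = [1, 2, 3]
seg_b = [4, 5, 6]
d = ks_d_statistic(seg_a, seg_b)                   # non-overlapping samples: D = 1.0
```

With 39 annual values per segment, a large p-value (no evidence that the two CDFs differ) would indicate homogeneity of a regionalized pattern across segments, while a small one would indicate distinctness.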

Segmental Analysis: patterns of precipitation time variability and regionalization
The data are prepared in two steps before applying sequential spatial pattern analysis (SSPA), a prerequisite for principal component analysis. (1) Data segmentation: the first task is to segment the record so that changes can be viewed across the three timeslot layers. This perspective on precipitation can help to assess the impact of a warming climate on precipitation patterns. (2) Data normalization: the second task aims to normalize the dataset and stabilize the tail of the distribution (bringing it close to a normal distribution) by transforming it to have low skewness. The log, square root, and cubic root transformations were tried, and the cubic root transformation produced the lowest skewness. To obtain the patterns of precipitation time variability across the basin, the cubic-root-transformed monthly precipitation time series, a matrix of (M x N) where M = 230 station points and N = 468 (39 years x 12 months) observations, were subjected to the Spatial (S)-mode of PCA. The Kaiser-Meyer-Olkin (KMO) [6] measure of sampling adequacy of the cubic-root-transformed data is 0.99 for all three time-series slots, which is rated 'marvellous' for principal component analysis. In PCA, an eigenvector is assigned to every component, and the eigendecomposition yields as many components as there are variables in the dataset. Supplementary tools are therefore needed to decide on the optimum number of components to retain for further assessment, such as the scree plot [2] in conjunction with parallel analysis [4], with the estimated eigenvalues checked against North's rule of thumb [11].
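The normalization choice and the KMO check can be sketched as follows. This is a hedged illustration with synthetic gamma-distributed data standing in for monthly rainfall: `log1p` is assumed for the log transform so that zero-rainfall months stay defined, the skewness is the plain moment estimator, and the KMO function uses the standard definition based on squared correlations and squared partial correlations (obtained from the inverse correlation matrix). Function names are hypothetical.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def least_skewed_transform(x):
    """Return the name of the transform with the smallest |skewness|, plus all scores."""
    transforms = {"log": np.log1p, "sqrt": np.sqrt, "cbrt": np.cbrt}
    scores = {name: abs(skewness(f(x))) for name, f in transforms.items()}
    return min(scores, key=scores.get), scores

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure; columns of X are variables (stations)."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(scale, scale)     # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)         # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

# Usage with synthetic data
rng = np.random.default_rng(1)
rain = rng.gamma(2.0, 50.0, size=5000)            # right-skewed stand-in for rainfall
best, scores = least_skewed_transform(rain)
common = rng.standard_normal((500, 1))            # one shared signal across 10 "stations"
X = common + 0.3 * rng.standard_normal((500, 10))
adequacy = kmo(X)
```

KMO approaches 1 when the variables share strong common variance (large correlations, small partial correlations), which is why strongly inter-related gridded stations can reach values near 0.99.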
In PCA, a scree plot shows each eigenvalue against its component number, and the number of components to retain is read at its elbow. The scree plot's main disadvantage is that its interpretation is intuitive and arbitrary, and it tends to suggest only a few components (say two or three). A single criterion for deciding the optimum number of components is not adequate because eigenvalues are subject to bias and error, and a blind assumption may mislead the test results. Parallel analysis supplements the scree plot for better interpretation, accounting for the bias and error in the eigenvalues. The rationale of parallel analysis is to retain only those components that account for more variance than would be expected by chance. North's rule of thumb states that "if the sampling error of an eigenvalue has a magnitude comparable to or larger than the linear spacing to its closest eigenvalue, then these two eigenvectors are mixed"; in other words, it is a function of the degree of separation between the eigenvalues.
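Horn's parallel analysis, as described above, compares the observed eigenvalues with eigenvalues obtained from random data of the same size. The sketch below is a hedged illustration, not the study's implementation: it assumes standard-normal random data, a 95th-percentile cut-off, and retention of leading components until the first one fails the comparison; the function name and its defaults are hypothetical.

```python
import numpy as np

def parallel_analysis(X, n_iter=100, percentile=95, seed=0):
    """Return (number retained, observed eigenvalues, random-data thresholds)."""
    n, p = X.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.standard_normal((n, p))           # uncorrelated data of the same shape
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    retained = 0
    for obs, thr in zip(observed, threshold):     # keep leading components above chance
        if obs <= thr:
            break
        retained += 1
    return retained, observed, threshold

# Usage: two independent signals, each shared by five variables
rng = np.random.default_rng(2)
f = rng.standard_normal((300, 2))
X = np.hstack([f[:, [0]] + 0.5 * rng.standard_normal((300, 5)),
               f[:, [1]] + 0.5 * rng.standard_normal((300, 5))])
k, observed, threshold = parallel_analysis(X)
```

In this toy case only the two built-in signals rise above the random-data thresholds, so two components are retained; the remaining eigenvalues are no larger than chance would produce.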
Lawley's [8] formula for estimating the standard error of an eigenvalue is:
$$\Delta\lambda_i \approx \lambda_i \sqrt{\frac{2}{n^{*}}} \qquad (1)$$
where n* = effective sample size and λ_i = eigenvalue of the i-th order.
Equation (1) shows that the error associated with each eigenvalue depends on the eigenvalue itself and on the effective sample size: the error is directly proportional to the eigenvalue and inversely proportional to the square root of the effective sample size. The error limits at the 95% confidence level associated with each eigenvalue can then be defined as:
$$\lambda_i \pm 1.96\,\Delta\lambda_i$$
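A worked check of North's rule of thumb can be coded directly, assuming Lawley's standard error takes its usual form Δλ_i ≈ λ_i √(2/n*) and reading "well separated" as non-overlapping 95% error bars λ_i ± 1.96Δλ_i for adjacent eigenvalues. The effective-sample-size helper uses the common lag-1 (red-noise) approximation n* ≈ n(1 - r₁)/(1 + r₁); that choice, like the function names, is an assumption for illustration, since the text does not state how n* was estimated.

```python
import numpy as np

def effective_sample_size(n, r1):
    """Lag-1 (red-noise) approximation of the effective sample size (assumed here)."""
    return n * (1.0 - r1) / (1.0 + r1)

def north_separation(eigvals, n_eff, z=1.96):
    """For each adjacent pair (i, i+1), True if their +/- z * delta-lambda bars do not overlap."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    delta = lam * np.sqrt(2.0 / n_eff)            # standard error of each eigenvalue
    lower, upper = lam - z * delta, lam + z * delta
    return [bool(lower[i] > upper[i + 1]) for i in range(len(lam) - 1)]

# Usage with illustrative eigenvalues and n* = 400
flags = north_separation([10.0, 5.0, 4.8, 1.0], n_eff=400)
# -> [True, False, True]: the 2nd and 3rd eigenvectors would be considered mixed
```

Here the error bar of λ₂ = 5.0 spans roughly 4.31 to 5.69 and that of λ₃ = 4.8 spans roughly 4.13 to 5.47; they overlap, so the corresponding patterns should not be interpreted separately.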

Pattern delineation (maximum loading value approach)
The retained components, obtained by applying sequential spatial pattern analysis to the cubic-root-transformed dataset, are orthogonally rotated using the varimax criterion. The objective is to extract more localized information that is beneficial for pattern interpretation. To analyze the spatio-temporal variability and the effect of a warming climate, the components accounting for the maximum variance in each timeslot segment are considered. The variability or shifting of patterns can be visualized by plotting the ... Readers can also refer to Sahu et al. [16] for DBSCAN-based regionalization and other statistics on spatial and temporal variations in the Mahanadi basin. The identified sub-regions can also be studied with regional frequency analysis, which is supplementary and excluded from this study; refer to Sahu et al. [17] for L-moment-based regional frequency analysis.
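The rotation and the maximum loading value assignment can be sketched with NumPy. This is a minimal illustration under assumptions: it uses the widely published SVD-based varimax iteration without Kaiser normalization (R's `stats::varimax`, for example, normalizes by default), and it assigns each station to the rotated component with the largest absolute loading, so that cores of strong negative loading also define a region. The function names are hypothetical.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (raw, no Kaiser normalization) via SVD iterations."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        target = Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt                                 # nearest orthogonal rotation
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:     # criterion stopped improving
            break
    return L @ R

def max_loading_regions(rotated):
    """Assign each station (row) to the component with the largest |loading|."""
    return np.argmax(np.abs(rotated), axis=1)

# Usage: a two-region simple structure hidden by a 30-degree rotation
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.85]])
t = np.deg2rad(30)
mixed = simple @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
rotated = varimax(mixed)
regions = max_loading_regions(rotated)
```

Because the rotation is orthogonal, each station's communality (sum of squared loadings) is unchanged; only the distribution of loadings across components becomes more localized, which is what makes the maximum loading assignment interpretable.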
The Temporal (T)-mode of sequential spatial pattern analysis is used to identify patterns with different precipitation regimes. The objective of this approach is to identify station points whose annual precipitation cycles are similar, regardless of the precipitation magnitude at each station point. To this end, the monthly precipitation data were converted to relative form, i.e., monthly relative precipitation. The Temporal (T)-mode of SSPA was then applied to the (N x M) matrix of the relative monthly dataset, where M = 230 is the number of station points and N = 12 is the number of months of relative precipitation. This conversion is only a form of normalization that avoids over-weighting station points with high precipitation values. Varimax rotation of the principal component scores was used only to extract localized information for interpreting the spatial patterns of monthly relative precipitation.
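The conversion to monthly relative precipitation can be sketched as below. The exact definition is not spelled out in the text, so this assumes one common reading: the long-term mean total of each calendar month divided by the long-term mean annual total, giving 12 fractions per station that sum to 1. The function name and the input layout (consecutive years of January to December rows, one column per station) are assumptions.

```python
import numpy as np

def monthly_relative_precipitation(monthly, n_years):
    """Return a (12 x stations) matrix of each month's share of the mean annual total.

    monthly: array of shape (n_years * 12, n_stations), rows ordered Jan..Dec for each year.
    """
    P = np.asarray(monthly, dtype=float).reshape(n_years, 12, -1)  # (years, months, stations)
    climatology = P.mean(axis=0)                                   # mean total per calendar month
    return climatology / climatology.sum(axis=0, keepdims=True)    # columns sum to 1

# Usage: 3 identical years at 2 stations with different magnitudes but the same regime
one_year = np.column_stack([np.arange(1, 13), 10 * np.arange(1, 13)])
monthly = np.tile(one_year, (3, 1))               # shape (36, 2)
rel = monthly_relative_precipitation(monthly, n_years=3)
```

Both toy stations end up with identical relative profiles even though one is ten times wetter, which illustrates how the normalization prevents high-precipitation stations from dominating the regime analysis.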

Statistics of Spatial (S)-mode PCA implementation
Applying the Spatial (S)-mode of principal component analysis to the cubic-root-transformed data gives the first five components with eigenvalues > 1, shown in Table-1. The summary statistics suggest that the first un-rotated component accounts for 90.20% of the variance in segment-1, 89.0% in segment-2, and 87.80% in segment-3 (see Figure-3). The scree plot, in association with parallel analysis, was used to assess the sampling error of the resulting eigenvalues and to suggest the optimal number of components to retain for further analysis. The varimax-rotated loadings (see Table-2) give 42.49% for the first rotated loading pattern of segment-1 (1901-1939), explaining the southeast and some parts of the northwest Mahanadi, and 18.16% for the second pattern, explaining the southeast Mahanadi with high positive loadings.
The third pattern accounts for 34.24%, featuring the northeast Mahanadi and explaining most of the negative loadings. For Segment-2 (1940-1978), the first pattern accounts for 39.07%, comparable to the first pattern of Segment-1 but with completely different loading characteristics, i.e., negative loadings explaining the northeast and northwest Mahanadi, whereas the second and third patterns account for 20.75% and 33.78%, respectively. Cuttack district in Odisha state has been identified as the core of negative loading, spatially located at 85°30'E x 20°30'N and accounting for 39.07% of the total variance during 1940-1978, whereas the Nayagarh and Khordha districts, spatially located at 85°15'E x 20°30'N, were found to be the core of positive loadings during 1940-1978, accounting for 20.75% of the total variance. Finally, the second pattern of segment-3 (1979-2017) accounts for 28.52% of the variance with an undefined (dispersed) core point (refer to Table-3).

Statistics of Temporal (T)-mode PCA implementation
To study the patterns of precipitation regime, the monthly relative precipitation was subjected to the Temporal (T)-mode of sequential spatial pattern analysis. The first three un-rotated components with eigenvalues > 1 are shown in Table-4, accounting for 87.40% of the variance in segment-1 (1901-1939), 77.57% in segment-2 (1940-1978), and 80.71% in segment-3 (1979-2017). Parallel analysis for component retention found three components for segment-2 and two components each for segment-1 and segment-3. Figure-6 shows the spatial distribution of the varimax-rotated loadings of the retained principal components. The spatial distribution of monthly relative precipitation can be explained by the first two components for segment-1 and segment-3, and by the first three components for segment-2.
The test results indicate that for segment-1, April, May, September, October, November, and December (July ... Tables 5 and 6).
Table 1. Decomposition analysis of the cubic-root-transformed data matrix (M x N) using the Spatial (S)-mode of sequential spatial pattern analysis for the first five components with eigenvalues > 1.
Table 4. Decomposition analysis of the monthly relative precipitation data matrix (N x M) using the Temporal (T)-mode of sequential spatial pattern analysis for the first five components with eigenvalues > 1.
... statistic test [3], as illustrated in Figures-8 and 10. Similarly, cluster-2 shows high homogeneity among all segments, with a p-value of 1 (100%) (see Fig. 10b), while cluster-3 shows medium-range homogeneity between segments 2 and 3, with a p-value of 0.305 (30.5%) (see Fig. 10c).
The boxplot analysis of the identified precipitation patterns for their dispersion among the clusters is shown in Figure- ... with a symmetric distribution. May, June, July, August, September, October, and November are the rainiest months in the lower, southernmost portion of the Mahanadi basin. In this precipitation regime, the summer monsoon and autumn are the two main rainy seasons, with late spring (May) also contributing noticeable precipitation to the annual total. This is followed by a dry season that starts in December and ends in April (see Figure-11c).

Conclusion
Eigen-based sequential spatial pattern analysis was used to study the spatio-temporal variation of precipitation in the Mahanadi basin, and the maximum loading value approach was used to regionalize the identified patterns. The objective of regionalizing the identified patterns is to study the aging effect, which was examined for 117 years of gridded rainfall data at 0.25° x 0.25° resolution in three segment layers. An investigation of precipitation characteristics to assess spatio-temporal variation at the regional scale, together with regionalization, is presented here.
Observations from this investigation include: ... for the different segments are locally dependent changes that emerge from the substantial effects of hydrology, climatology, and local topography. The spatial patterns of precipitation regimes were influenced by the summer monsoon and the moist currents prevailing from the Indian Ocean.
This influence is evident from the boxplot results (Figures 8, 10, and 11), which reflect the variability in precipitation magnitude at different locations. The regionalized precipitation patterns obtained using the maximum loading value approach have several applications. For water resource management in regions with known precipitation variability, future prediction of precipitation can be simplified. This can be done by defining a regional time series for each identified cluster, associating it with predictors (climatological indices), and modelling it with stochastic and statistical methods. Similarly, clusters with known precipitation regimes are useful for activities such as agricultural planning, farming calendars, planting times of different crops, and rain-fed and dry farming practices.
A limitation of this study is that precipitation regionalization results may differ substantially with the method used, the number and size of clusters, the time period, and the spatial resolution. Therefore, a sensitivity analysis of the results with different spatial resolutions and different numbers of station points over the entire basin is recommended.

Declarations
Ethical statement: The submitted manuscript with the title "Density-Based Spatial Clustering of Application with Noise approach for regionalization and its effect on Hierarchical CA" is an original work for peer review and publication. This manuscript has not been sent to any other publication or journal for any review and will not be sent unless a clear consent is received from your side.
Data and Code availability statement: We acknowledge the data/code sharing policy; the data and code that support the results and analysis presented in this paper will be made available to readers and researchers.
Consent to participate: I (Corresponding author) gave my consent to participate in a related research study.
Consent to publication: The Authors grant consent to the publisher and declare that any person named as coauthor of the contribution is aware of the fact and has agreed to be so named.
Comparison of the empirical cumulative distribution functions for each regionalized pattern across its different segments. The black dashed vertical line represents the mean annual precipitation of the Mahanadi basin (1572 mm).
Figure 10. Boxplot and KS 'D-test' (pairwise comparison) for each regionalized pattern across its different segments.