A Statistical Analysis of Chinese Stock Indices Returns From Approach of Parametric Distributions Fitting

The stock price process in China is full of uncertainty; hence, stock indices were introduced to serve as indicators of the financial market. Capturing the statistical characteristics of Chinese stock index returns by fitting parametric distributions could be useful in econometrics and risk management. In this paper, we use a wider range of parametric distributions than earlier work to model four main Chinese stock indices. We find that a generalization of the Student's t distribution provides the best fit.


Introduction
Stock market prediction has long been a classical and challenging problem. Over the past decades, investors and researchers have been interested in developing and testing models of stock index behaviour through different statistical methods or data science (including machine or deep learning) techniques [1][2][3][4][5] to find the trends of financial markets. A stock index for a country is a calculated average of representative share prices, typically including the mainstream stocks from all conceivable aspects of the economy and stock market. National stock indices can help investors to track changes in the value of a general stock market. An index of the stock market can be regarded as a combination of shares that provides a broad sample of the economic situation. The collective performance of these shares gives a good indication of trends in the overall market they represent.
Although there are many studies on the behaviour of stock market indices in the US, Japan and India from the perspective of historical data [4], financial news [1], investors' sentiment [5], etc., not many have examined the performance of returns in the Chinese stock market. Finding the exact distribution of returns of stock indices in China could provide a useful benchmark for measuring the success of investment vehicles such as mutual funds and share portfolios, since stock investment decisions always rely on assessments of the distribution of expected stock returns, which can be obtained from empirical data. In [6], the authors show that the α-stable distribution provides adequate fits to two major stock indices in China, the Shanghai Composite Index (SSEC) and the Shenzhen Component Index (SZCZ), from January 4, 1993 to December 31, 2008. The α-stable distribution [7] is specified by its characteristic function, in which 0 < α ≤ 2 is the basic stability parameter; δ ≥ 0 measures dispersion; −1 ≤ β ≤ 1 is a skewness parameter; −∞ < μ < ∞ can be thought of as a location measure; i = √−1; and sign(z) = 1 if z > 0, sign(z) = −1 if z < 0 and sign(z) = 0 if z = 0. The probability density function of an α-stable random variable is not, in general, known in closed form.
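For orientation, one commonly used parameterization of the α-stable characteristic function is sketched below. It is written with the symbols α, β, δ and μ introduced above, but treating δ as a scale parameter that enters as δ^α is an assumption; the parameterization actually used in [6] and [7] may differ, for example by using δ in place of δ^α.

```latex
E\left[e^{itX}\right] =
\begin{cases}
\exp\left\{ i\mu t - \delta^{\alpha} |t|^{\alpha}
  \left[ 1 - i\beta\,\mathrm{sign}(t)\tan\left(\dfrac{\pi\alpha}{2}\right) \right] \right\},
  & \alpha \neq 1, \\[2ex]
\exp\left\{ i\mu t - \delta |t|
  \left[ 1 + i\beta\,\dfrac{2}{\pi}\,\mathrm{sign}(t)\ln|t| \right] \right\},
  & \alpha = 1.
\end{cases}
```

Under this form, α = 2 gives exp(iμt − δ²t²), the characteristic function of a normal distribution with mean μ and variance 2δ².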
The motivation for this study was to discover whether these distributions are appropriate for describing the return distribution in the Chinese stock market, since finding the distribution of returns is vital both in risk management and in evaluating the performance of investments. This paper also aims to find a better-fitting parametric model, among existing parametric distributions, for the daily log returns of four major stock indices in China, based on a more complete empirical dataset than that of [6]. We fit eleven parametric distributions to the daily log returns of the stock indices. We find several distributions providing better fits than the α-stable distribution. The best fits are provided by the Laplace distribution and a generalization of the Student's t distribution. To our knowledge, this is the first paper investigating the statistical properties of daily log returns of four major stock indices in China through parametric distribution fitting. However, there have been studies on stock indices in China focusing on other aspects. For example, [8] analyse the evolution of both the Shanghai A share and B share markets through a Markov-switching asymmetric GARCH model in four different time frames.
The contents of this paper are organized as follows. Sect. 2 describes the data used. An account of statistical distributions fitted to the data is given in Sect. 3. The results of the fitted distributions and their discussion are given in Sect. 4. Comparisons are made with the results in [6]. Finally, some conclusions are made in Sect. 5.

Data
The data we use are the historical stock price indices for China. In this paper, we focus on the daily log returns rather than raw returns for both theoretical and algorithmic reasons. Table 1 provides summary statistics for the daily log returns. All indices are negatively skewed except for the CSI. The skewness is the largest for the SZCZ and the smallest for the SSEC. All indices are highly peaked with kurtosis greater than that of the normal distribution. The kurtosis is the largest for the SZCZ and the smallest for the CSI.
The standard deviation and variance are the largest for the SZCZ. They are the smallest for the CSI. The range is also the largest for the SZCZ and the smallest for the CSI.
All indices have highly negative coefficients of variation. The CSI has the largest coefficient of variation and the SZCZ has the smallest.
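The quantities discussed in this section can be illustrated with a minimal sketch. It assumes the usual definition of the daily log return, r_t = ln(P_t / P_{t−1}), and moment-based skewness and kurtosis; the price levels and the function names are hypothetical and are not taken from the data behind Table 1.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}) from consecutive index levels."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def summary_statistics(x):
    """Moment-based summary statistics of a sample of returns."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    # Central moments with an n denominator, used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "variance": sd ** 2,
        "range": max(x) - min(x),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for the normal distribution
        "cv": sd / mean,           # coefficient of variation
    }


# Hypothetical index levels, for illustration only.
prices = [100.0, 101.0, 99.5, 100.2, 98.7, 99.9]
returns = log_returns(prices)
stats = summary_statistics(returns)
```

Because the coefficient of variation divides by the mean, it is negative whenever the mean log return is negative, as it is for this hypothetical series.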

Distributions Fitted
Let X denote a continuous random variable representing the log returns of the stock index of interest. Let f(x) denote the probability density function of X. Let F(x) denote the cumulative distribution function of X. We suppose X follows one of eleven possible distributions, the most popular parametric distributions used in finance. They are as follows:
• the Student's t distribution [10];
• the Laplace distribution [11];
• the skewed t distribution [12], whose density is expressed in terms of the ascending factorial $(e)_k = e(e + 1) \cdots (e + k - 1)$;
• the generalized t (GT) distribution [13];
• the asymmetric exponential power (AEP) distribution [14];
• the skewed Student's t (SST) distribution [15];
• the asymmetric Student's t (AST) distribution [15];
• the normal inverse Gaussian (NIG) distribution [16], whose density involves $K_\nu(\cdot)$, the modified Bessel function of the second kind of order ν, defined by
$$K_\nu(x) = \begin{cases} \dfrac{\pi \csc(\pi\nu)}{2} \left[ I_{-\nu}(x) - I_\nu(x) \right], & \nu \notin \mathbb{Z}, \\ \lim_{\eta \to \nu} K_\eta(x), & \nu \in \mathbb{Z}, \end{cases}$$
where $I_\nu(\cdot)$ denotes the modified Bessel function of the first kind of order ν, defined by
$$I_\nu(x) = \sum_{k=0}^{\infty} \frac{1}{\Gamma(k + \nu + 1)\, k!} \left( \frac{x}{2} \right)^{2k + \nu};$$
• the hyperbolic distribution [16], whose density also involves the modified Bessel function of the second kind of order ν;
• the generalized hyperbolic (GH) distribution [16], whose density also involves the modified Bessel function of the second kind of order ν.
Several of these distributions are nested: the Student's t distribution is the particular case of the skewed t distribution for λ = 0; the Student's t distribution is the particular case of the generalized t distribution for τ = 2; the skewed exponential power distribution is the particular case of the asymmetric exponential power distribution for p₁ = p₂; the Student's t distribution is the particular case of the skewed Student's t distribution for α = 1/2; the skewed Student's t distribution is the particular case of the asymmetric Student's t distribution for ν₁ = ν₂; the normal inverse Gaussian distribution is the particular case of the generalized hyperbolic distribution for λ = −1/2; the hyperbolic distribution is the particular case of the generalized hyperbolic distribution for λ = 1; and so on. The eleven distributions include heavy tailed and light tailed distributions. The Laplace, skewed exponential power and asymmetric exponential power distributions have light tails. The Student's t, skewed t, generalized t, skewed Student's t, asymmetric Student's t, normal inverse Gaussian, hyperbolic and generalized hyperbolic distributions have heavy tails. The α-stable distribution is light tailed if α = 2, when it reduces to the normal distribution, and heavy tailed if α < 2.
The process of modelling is as follows. Each distribution was fitted by the method of maximum likelihood (MLE). That is, if $x_1, x_2, \ldots, x_n$ are observations on X, then the parameters of each distribution are the values maximizing the likelihood
$$L(\Theta) = \prod_{i=1}^{n} f(x_i; \Theta),$$
where $\Theta = (\theta_1, \theta_2, \ldots, \theta_k)'$ is a vector of parameters specifying f(·). We shall let $\hat{\Theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)'$ denote the maximum likelihood estimate of $\Theta$. The maximization was performed using the routine optim in the R software package [17]. The standard errors of $\hat{\Theta}$ were computed by approximating the covariance matrix of $\hat{\Theta}$ by the inverse of the observed information matrix, i.e.,
$$\mathrm{cov}(\hat{\Theta}) \approx \left[ -\frac{\partial^2 \ln L(\Theta)}{\partial \theta_i \, \partial \theta_j} \right]^{-1}_{\Theta = \hat{\Theta}}.$$
Many of the fitted distributions are not nested. Discrimination among them was performed using various criteria:
• the Akaike information criterion due to [18] defined by $\mathrm{AIC} = 2k - 2 \ln L(\hat{\Theta})$;
• the Bayesian information criterion due to [19] defined by $\mathrm{BIC} = k \ln n - 2 \ln L(\hat{\Theta})$;
• the consistent Akaike information criterion (CAIC) due to [20] defined by $\mathrm{CAIC} = -2 \ln L(\hat{\Theta}) + k (\ln n + 1)$;
• the corrected Akaike information criterion (AICc) [21] defined by $\mathrm{AICc} = \mathrm{AIC} + \dfrac{2k(k + 1)}{n - k - 1}$;
• the Hannan-Quinn criterion [22] defined by $\mathrm{HQC} = -2 \ln L(\hat{\Theta}) + 2k \ln \ln n$.
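As a small illustration of maximum likelihood fitting, the sketch below treats the Laplace distribution, for which the maximizer is available in closed form and no numerical optimizer is needed. It assumes the location-scale density f(x) = exp(−|x − μ| / σ) / (2σ), which may not be the exact parameterization of [11]; the sample values and function names are hypothetical, and this is not the optim-based procedure used for the other distributions.

```python
import math
import statistics


def laplace_loglik(x, mu, sigma):
    """Log-likelihood of a sample under f(x) = exp(-|x - mu| / sigma) / (2 sigma)."""
    return -len(x) * math.log(2.0 * sigma) - sum(abs(v - mu) for v in x) / sigma


def laplace_mle(x):
    """Closed-form maximum likelihood estimates for the Laplace distribution.

    The location estimate is the sample median; the scale estimate is the
    mean absolute deviation about that median.
    """
    mu_hat = statistics.median(x)
    sigma_hat = sum(abs(v - mu_hat) for v in x) / len(x)
    return mu_hat, sigma_hat


# Hypothetical daily log returns, for illustration only.
sample = [-0.021, -0.008, -0.003, 0.000, 0.002, 0.004, 0.011, 0.019, 0.035]
mu_hat, sigma_hat = laplace_mle(sample)
loglik_hat = laplace_loglik(sample, mu_hat, sigma_hat)
```

For distributions without closed-form estimates, the same log-likelihood function would instead be handed to a numerical optimizer.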
The smaller the values of these criteria the better the fit. For a more detailed discussion on these criteria, see [2][3][23][24][25].
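The five criteria can be computed from a maximized log-likelihood, the number of parameters k and the sample size n, as in the sketch below. The AICc line uses the standard small-sample correction AIC + 2k(k + 1)/(n − k − 1); the log-likelihood values, model labels and function name are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Model selection criteria from a maximized log-likelihood.

    loglik: maximized log-likelihood ln L
    k:      number of fitted parameters
    n:      number of observations
    """
    aic = 2 * k - 2 * loglik
    return {
        "AIC": aic,
        "BIC": k * math.log(n) - 2 * loglik,
        "CAIC": -2 * loglik + k * (math.log(n) + 1),
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "HQC": -2 * loglik + 2 * k * math.log(math.log(n)),
    }


# Hypothetical fits: (maximized log-likelihood, number of parameters).
candidates = {"simpler model": (1510.2, 2), "richer model": (1513.0, 4)}
n_obs = 500
table = {name: information_criteria(ll, k, n_obs) for name, (ll, k) in candidates.items()}

# Smaller is better for every criterion.
best_by_aic = min(table, key=lambda name: table[name]["AIC"])
best_by_bic = min(table, key=lambda name: table[name]["BIC"])
```

With these hypothetical numbers the richer model has the smaller AIC while the simpler model has the smaller BIC, because BIC penalizes each extra parameter by ln n rather than 2. Criteria can therefore disagree when a richer model improves the likelihood only slightly.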

Result and Discussion
The eleven distributions in Sect. 3 plus the α-stable distribution in Sect. 1 were fitted to the data described in Sect. 2. The method of maximum likelihood was used. The R package due to [26] was used for fitting the generalized hyperbolic distribution. The R package due to Kharrat and Boshnakov [27] was used for fitting the α-stable distribution; for a more detailed discussion on optimization techniques, see [28]. The log likelihood values, and the values of AIC, AICc, BIC, HQC and CAIC for the fitted distributions (for each of the four stock indices) are shown in Tables 2, 3, 4 and 5. The Laplace, SEP, AEP and hyperbolic distributions gave very unsatisfactory fits for daily log returns of the SSEC. The SEP distribution gave very unsatisfactory fits for daily log returns of the SZCZ and CSI.
For the SSEC, SZCZ and ZSE indices, the generalized t distribution gives the smallest values of AIC, AICc, BIC, HQC and CAIC. Hence, the generalized t distribution can be regarded as giving the best fit for the SSEC, SZCZ and ZSE indices. For the CSI index, the asymmetric exponential power distribution gives the smallest values for AIC and AICc while the Laplace distribution gives the smallest values for BIC, HQC and CAIC. Since the Laplace distribution is the simpler of the two, we regard it as giving the best fit for the CSI index.
Comparison with the fitted α-stable distribution shows the following: all of the other fitted distributions for the SSEC index (Student's t, skewed t, generalized t, skewed Student's t, asymmetric Student's t, normal inverse Gaussian and generalized hyperbolic distributions) provide smaller values of AIC, AICc, BIC, HQC and CAIC; the same holds for the other fitted distributions for the SZCZ index (Student's t, Laplace, …).
Hence, the α-stable distribution is not a good model for the stock indices. Besides, fitting the α-stable distribution to each stock index took more than one hour of computer time, whereas fitting the other distributions took only a few seconds. Hence, the α-stable distribution is also not a computationally efficient model.
The parameter estimates and their standard errors for the best fitting distributions for each stock index are given in Table 6. The best fitting distributions for the SSEC, SZCZ and ZSE indices (all of them a generalized t distribution) are heavy-tailed. The best fitting distribution for the CSI index is light-tailed. The generalized t distribution has two tail parameters, ν and τ. The estimate of ν is the largest for the SZCZ, the second largest for the ZSE and the smallest for the SSEC. The estimate of τ is the largest for the ZSE, the second largest for the SZCZ and the smallest for the SSEC. Hence, the decay of the tails is fastest for the SZCZ, second fastest for the ZSE and slowest for the SSEC.
The generalized t distribution is symmetric and its location parameter μ is interpreted as the mean of the distribution. The sample means for the SSEC, SZCZ and ZSE reported in Table 1 are larger than the estimates of μ. One of the reasons for the differences is that the generalized t distribution does not account for negative skewness.
The adequacy of the best fitting distributions is assessed in terms of Q-Q (quantile-quantile) plots, P-P (percent-percent) plots, the one-sample Kolmogorov-Smirnov test, the one-sample Anderson-Darling test and the one-sample Cramér-von Mises test.
The P-P plots for the best fitting distribution for each of the four stock indices are shown in Figure 1. The Q-Q plots for the best fitting distribution for each of the four stock indices are shown in Figure 2.
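The coordinates behind P-P and Q-Q plots and the Kolmogorov-Smirnov statistic can be sketched as follows for a fitted Laplace distribution. The density f(x) = exp(−|x − μ| / σ) / (2σ), the plotting positions (i − 0.5)/n, the parameter values and all function names are assumptions made for illustration; they are not the settings used to produce Figures 1 and 2.

```python
import math


def laplace_cdf(x, mu, sigma):
    """CDF of the Laplace distribution with location mu and scale sigma."""
    z = (x - mu) / sigma
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def laplace_quantile(p, mu, sigma):
    """Inverse of laplace_cdf for 0 < p < 1."""
    if p < 0.5:
        return mu + sigma * math.log(2.0 * p)
    return mu - sigma * math.log(2.0 * (1.0 - p))


def ks_statistic(x, cdf):
    """One-sample Kolmogorov-Smirnov statistic sup |F_n(x) - F(x)|."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = cdf(v)
        # The empirical CDF jumps from i/n to (i + 1)/n at the i-th order statistic.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def pp_points(x, cdf):
    """Pairs (empirical probability, fitted probability) for a P-P plot."""
    xs = sorted(x)
    n = len(xs)
    return [((i + 0.5) / n, cdf(v)) for i, v in enumerate(xs)]


def qq_points(x, quantile):
    """Pairs (fitted quantile, observed quantile) for a Q-Q plot."""
    xs = sorted(x)
    n = len(xs)
    return [(quantile((i + 0.5) / n), v) for i, v in enumerate(xs)]


# Hypothetical fitted parameters and a sample placed exactly on fitted quantiles.
mu, sigma = 0.0005, 0.012
sample = [laplace_quantile((i + 0.5) / 10, mu, sigma) for i in range(10)]

d_stat = ks_statistic(sample, lambda v: laplace_cdf(v, mu, sigma))
pp = pp_points(sample, lambda v: laplace_cdf(v, mu, sigma))
qq = qq_points(sample, lambda p: laplace_quantile(p, mu, sigma))
```

For a well-fitting distribution both sets of points lie close to the 45-degree line. When the parameters are estimated from the same data, the usual tabulated Kolmogorov-Smirnov critical values are only approximate.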

Conclusions
Motivated by [6], we have modelled log returns of the four major stock indices in China: Shanghai Composite Index (SSEC), Shenzhen Component Index (SZCZ), ZSE Composite Index and China Securities Index (CSI) 300 index. We fitted eleven popular parametric distributions as well as the α-stable distribution in [6] to the log returns. The generalized t distribution was shown to provide the best fit for the SSEC, SZCZ and ZSE indices. The Laplace distribution was shown to provide the best fit for the CSI index. The α-stable distribution did not provide an adequate fit for any of the data sets. It actually gave the worst fit among all distributions fitted. The best fit was assessed in terms of AIC values, AICc values, BIC values, HQC values, CAIC values, P-P plots, Q-Q plots, the one-sample Kolmogorov-Smirnov test, the one-sample Anderson-Darling test and the one-sample Cramer-von Mises test.
These results have implications for risk management, where one may need to compute the value at risk and expected shortfall, and for investment purposes. To our knowledge, this is the first study investigating the statistical properties of the log returns of the four major stock indices in China. There is much scope for future work and possible extensions could include: i) using GARCH type processes to model the log returns, for example, the distributions mentioned in Sect. 3 can be used for modelling the innovation processes; ii) using multivariate processes to model the joint distribution of the log returns; iii) using nonparametric or semiparametric distributions to analyze the log returns.
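As an illustration of the risk measures mentioned above, the sketch below computes the value at risk and expected shortfall of a Laplace distribution in closed form. It assumes the density f(x) = exp(−|x − μ| / σ) / (2σ), defines value at risk as the lower p-quantile of the return distribution (a negative number for small p) and expected shortfall as the mean return below that quantile; the parameter values and function names are hypothetical.

```python
import math


def laplace_var(p, mu, sigma):
    """Lower p-quantile of returns under a Laplace(mu, sigma) model, 0 < p < 0.5."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie in (0, 0.5) for the lower tail")
    return mu + sigma * math.log(2.0 * p)


def laplace_es(p, mu, sigma):
    """Expected shortfall E[X | X <= VaR_p] under a Laplace(mu, sigma) model.

    Below the location mu the Laplace tail is exponential with mean excess
    sigma, so the conditional mean sits exactly sigma below the quantile.
    """
    return laplace_var(p, mu, sigma) - sigma


# Hypothetical fitted parameters, for illustration only.
mu, sigma = 0.0004, 0.011
var_5 = laplace_var(0.05, mu, sigma)
es_5 = laplace_es(0.05, mu, sigma)
```

For heavier-tailed models such as the generalized t distribution, these quantities would typically be obtained by numerical inversion and integration instead.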
Author Contributions Yuancheng Si: data collection, results generation, methodology and analysis. Saralees Nadarajah: verified the methodology and analysis and suggested necessary modifications to the main body.
Funding This work received no funding from any source.
Data Availability The data are available from the authors on request.
Code Availability Code is not available.

Conflict of interest
The authors declare that there is no conflict of interest.
Ethical approval The authors did not copy this work from any source, and this work does not cause any harm to humans or society.