An Optimal Estimation Method on the Analysis of the Generalized Gamma Distribution Parameters Using Runge-Kutta Method

Many modifications have recently been introduced in the literature to improve the maximum likelihood estimation method; however, most of them are less efficient than the Bayesian method, especially for small samples. Therefore, in this study an improved method based on the Runge-Kutta technique is introduced for estimating the generalized gamma distribution parameters, and its estimates are compared with the Bayesian estimates based on the informative gamma and kernel priors. The estimators are compared through an extensive Monte Carlo simulation based on two criteria, namely, the absolute bias and the mean squared error. The simulation results indicate that the Runge-Kutta method is highly favorable: it provides better estimates and outperforms the Bayesian estimates under different loss functions based on the generalized progressive hybrid censoring scheme. Finally, analyses of two real datasets for the COVID-19 epidemic in Egypt are presented to illustrate the efficiency of the proposed methods.


Introduction
In applied statistics, modelling and analysing real-life data are necessary to understand the important features of random and non-random phenomena and to draw appropriate conclusions. This requires the selection of statistical models based on probability distributions. Analysis of recent data in applied sciences (medical sciences, environmental sciences, engineering, finance, etc.) has shown that the generalized gamma distribution (GGD) is an appropriate model that reveals some important details of these datasets. In this study, we introduce an improved estimation method using the Runge-Kutta technique for estimating the GGD parameters and compare it with the Bayesian method based on the informative and kernel priors.
The statistical analysis of the GGD based on complete and censored samples has been studied by several authors for analysing data in applied sciences. Stacy and Mihram (1965) derived the estimation of the GGD parameters; Harter (1966) derived asymptotic maximum likelihood estimators (MLEs) for the GGD parameters; Lawless (1980, 1982) applied conditional inference to the GGD and log-gamma parameters; DiCiccio (1987) applied approximate inference to the GGD parameters; Wingo (1987) derived the MLEs of the GGD parameters; and Wong (1993) derived the MLEs of the three GGD parameters. Maswadah (1989) derived the conditional and structural confidence intervals for the GGD parameters, and Maswadah (1991) derived the structural confidence intervals based on type-II progressively censored samples from the three-parameter GGD. Hwang et al. (2006) derived the moment estimates of the GGD parameters using its characterization. Dadpay et al. (2007) introduced some concepts of the GGD based on information theory. Gomes et al. (2008) described several difficulties in estimating the parameters and the complexity of the maximum likelihood estimation (MLE) method, and Song (2008) used this insight in developing inference procedures with the GGD, especially MLE, to justify the model in terms of a simpler alternative form. Geng and Yuhiong (2009) suggested a new re-parameterization of the GGD to maintain the numerical stability of the MLE based on type-II progressively censored samples. Mukherjee et al. (2011) presented a Bayesian study comparing the GGD with its components. Chakraborty (2015) obtained a new discrete distribution related to the generalized gamma distribution and discussed some of its properties. Shanker and Shukla (2017) introduced comparative studies on the modeling of lifetime data using the three-parameter GGD.
This distribution reduces to the two-parameter Weibull for $k = 1$, the two-parameter gamma for $\alpha = 1$, and the one-parameter exponential for $\alpha = k = 1$. Thus, the generalized gamma distribution incorporates all the important life-testing distributions, which is perhaps the reason that the model has enough scope in lifetime data analyses. At $k = 2$, we get a new distribution whose PDF is given by
$$f(x) = \alpha \beta^{2} x^{2\alpha - 1} \exp(-\beta x^{\alpha}), \quad x > 0, \ \alpha, \beta > 0, \qquad (1)$$
and whose CDF, obtained by integrating (1), is
$$F(x) = 1 - (1 + \beta x^{\alpha}) \exp(-\beta x^{\alpha}), \quad x > 0,$$
where $\alpha$ and $\beta$ are shape and scale parameters respectively.
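As a small illustration of the $k = 2$ model, the following Python sketch evaluates the PDF (1), the CDF that follows from integrating (1), namely $F(x) = 1 - (1 + \beta x^{\alpha}) e^{-\beta x^{\alpha}}$, and draws random lifetimes. The function names and the parameter values are hypothetical and chosen only for illustration. The sampler relies on the fact that $Y = \beta X^{\alpha}$ follows a gamma distribution with shape 2 and unit scale.

```python
import math
import random

def ggd2_pdf(x, alpha, beta):
    """PDF (1): alpha * beta^2 * x^(2*alpha - 1) * exp(-beta * x^alpha)."""
    return alpha * beta**2 * x ** (2 * alpha - 1) * math.exp(-beta * x**alpha)

def ggd2_cdf(x, alpha, beta):
    """CDF obtained by integrating (1): 1 - (1 + u) * exp(-u), with u = beta * x^alpha."""
    u = beta * x**alpha
    return 1.0 - (1.0 + u) * math.exp(-u)

def ggd2_sample(alpha, beta, size, rng):
    """If Y ~ Gamma(shape=2, scale=1), then X = (Y / beta)**(1 / alpha) has PDF (1)."""
    return [(rng.gammavariate(2.0, 1.0) / beta) ** (1.0 / alpha) for _ in range(size)]

# Illustrative (hypothetical) parameter values.
rng = random.Random(7)
draws = ggd2_sample(alpha=1.5, beta=2.0, size=5000, rng=rng)
empirical = sum(d <= 1.0 for d in draws) / len(draws)
print(empirical, ggd2_cdf(1.0, 1.5, 2.0))
```

The empirical proportion of draws below 1 should be close to the closed-form CDF value, which gives a quick sanity check of both the CDF and the sampler.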
The purpose of this paper is to derive point estimates of the number of deaths in the near future using a generalized gamma distribution based on Runge-Kutta and the Bayesian estimation methods when data are subjected to a generalized progressive hybrid-censoring scheme.
In reliability analysis, the progressive Type-II censoring scheme is widely applicable in life test experiments. It is useful for both industrial life test applications and clinical trials, and it allows the removal of some surviving experimental units at different stages before the test terminates. However, the trial time can be quite long when some units are highly reliable.
Thus, Kundu and Joarder (2006) proposed a censoring scheme called the Type-II progressively hybrid censoring scheme. However, the drawback of the progressive hybrid censoring scheme is that very few failures may occur before the time point $T$. In order to provide a guarantee for the number of failures observed as well as for the time to complete the test, Cho et al. (2014, 2015a) proposed the generalized progressive hybrid censoring scheme (GPHCS), which modifies the progressive hybrid censoring scheme. It allows the experiment to continue beyond time $T$ to observe at least $k$ failures, if the number of failures is less than $m$.
GPHCS can be described as follows. Consider $N$ identical items placed on a test, where $R_1, R_2, \ldots, R_m$ are the numbers of units to be removed at random, fixed at the beginning of the experiment with $m < N$ such that $\sum_{i=1}^{m} R_i + m = N$. The termination time $T$ is also fixed beforehand, and the integers $k$ and $m$ are pre-fixed such that $k < m$. In general, at the time of the $i$-th failure, $R_i$ units are removed randomly from the $N - i - \sum_{j=1}^{i-1} R_j$ remaining surviving units, where $i \in [1, m]$. Thus, we have three scenarios:
i. If the $m$-th failure occurs before the time point $T$, then the experiment stops at the time point $X_{m:m:N}$ and all the remaining surviving units, $R_m = N - m - \sum_{j=1}^{m-1} R_j$, are removed.
ii. If the $m$-th failure does not occur before the time point $T$ and the $k$-th failure occurs after the time point $T$, where $X_{k:m:N} > T$, then the experiment terminates at the time point $X_{k:m:N}$ and all the remaining surviving units are removed.
iii. If the $m$-th failure does not occur before the time point $T$ and only $J$ failures occur up to the time point $T$, where $X_{k:m:N} < T < X_{m:m:N}$, then the experiment terminates at the time point $T$, after the failure $X_{J:m:N}$, and all the remaining surviving units, $R_T^{*} = N - J - \sum_{j=1}^{J} R_j$, are removed.
Thus, given a generalized progressive hybrid censored sample, the likelihood function for the three different cases can be written in the unified form (3), where $R_T^{*}$ is the number of surviving units that are removed at the stopping time.
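The censoring scheme described above can be simulated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the removal pattern, and the parameter values are hypothetical, lifetimes are drawn from the $k = 2$ model through a gamma transformation, and the stopping rule follows the usual three-case description of the GPHCS (stop at the $m$-th failure, at the $k$-th failure, or at $T$).

```python
import random

def sample_ggd_k2(alpha, beta, rng):
    """One lifetime from PDF (1): if Y ~ Gamma(2, 1) then X = (Y / beta)**(1 / alpha)."""
    return (rng.gammavariate(2.0, 1.0) / beta) ** (1.0 / alpha)

def progressive_failures(lifetimes, removals, rng):
    """Ordered failure times X_{1:m:N}, ..., X_{m:m:N} under progressive Type-II censoring."""
    alive = list(lifetimes)
    failures = []
    for r in removals:
        alive.sort()
        failures.append(alive.pop(0))      # next failure among the surviving units
        rng.shuffle(alive)
        alive = alive[r:]                  # withdraw r surviving units at random
    return failures

def gphcs_sample(lifetimes, removals, k, T, rng):
    """Return (observed failures, stopping time, units removed at the stopping time)."""
    N, m = len(lifetimes), len(removals)
    x = progressive_failures(lifetimes, removals, rng)
    if x[m - 1] < T:                       # m-th failure before T: stop at X_{m:m:N}
        d, stop = m, x[m - 1]
        removed_earlier = sum(removals[:m - 1])
    elif x[k - 1] > T:                     # k-th failure after T: continue to X_{k:m:N}
        d, stop = k, x[k - 1]
        removed_earlier = sum(removals[:k - 1])
    else:                                  # X_{k:m:N} < T < X_{m:m:N}: stop at T with J failures
        d = sum(1 for v in x if v < T)
        stop = T
        removed_earlier = sum(removals[:d])
    return x[:d], stop, N - d - removed_earlier

# Hypothetical design: N = 20 units, m = 10, k = 5, one unit withdrawn per failure.
rng = random.Random(2024)
N, m, k, T = 20, 10, 5, 1.0
removals = [1] * m                         # satisfies sum(R) + m = N
lifetimes = [sample_ggd_k2(1.5, 2.0, rng) for _ in range(N)]
observed, stop_time, r_star = gphcs_sample(lifetimes, removals, k, T, rng)
print(len(observed), round(stop_time, 4), r_star)
```

The number of observed failures always lies between $k$ and $m$, which is the guarantee that motivates the generalized scheme.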

Runge-Kutta Method
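The MLE $\hat{\theta} = \hat{\theta}(\mathbf{x})$ of $\theta$ is the solution of the stationary equation
$$\left.\frac{\partial H(\mathbf{x}; \theta)}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0,$$
which is a function of $\mathbf{x}$ and $\hat{\theta}(\mathbf{x})$, where $H(\mathbf{x}; \theta)$ is the log-likelihood function that depends on the unknown parameter $\theta = (\alpha, \beta)$ and the data $\mathbf{x} = (x_1, x_2, \ldots, x_N)$. We apply the implicit function theorem to the stationary equation, with all partial and total derivatives assumed to be evaluated at some known value $\hat{\theta}(\mathbf{x}) = \theta_0$, say. Taking the total derivative of the stationary equation with respect to $x \in \mathbf{x}$, see Ramsay et al. (2007), we obtain
$$\frac{\partial^{2} H}{\partial \theta\, \partial x} + \frac{\partial^{2} H}{\partial \theta^{2}} \frac{d\hat{\theta}}{dx} = 0, \quad \text{so that} \quad \frac{d\hat{\theta}}{dx} = -\left[\frac{\partial^{2} H}{\partial \theta^{2}}\right]^{-1} \frac{\partial^{2} H}{\partial \theta\, \partial x}. \qquad (6)$$
The right-hand side of (6) and its partial derivatives are assumed to be defined and continuous functions at all points $(x, \theta)$, which ensures the existence of a unique solution for (6). Using any numerical technique such as the fourth-order Runge-Kutta method, we can find the approximate solution given a trial set of parameter values and initial conditions. If the initial conditions are unavailable, they must be appended to the parameter $\theta$ as quantities with respect to which the fit is optimized.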
For the lifetime model (2), the log-likelihood function of (3) and its derivatives can be derived in closed form. Thus, using (6) with the corresponding derivatives, we can find the point estimates for each of $\alpha$ and $\beta$ using the Runge-Kutta method.
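To make the idea concrete, the following Python sketch applies the classical fourth-order Runge-Kutta scheme to the differential equation that the implicit function theorem gives for an MLE, $d\hat{\theta}/dx = -[\partial^{2} H / \partial \theta^{2}]^{-1}\, \partial^{2} H / \partial \theta\, \partial x$. It is a toy illustration, not the GGD derivation of this paper: the model is assumed to be a one-parameter exponential distribution with rate $\theta$, chosen because its MLE is known in closed form, so the numerical solution can be checked. The data values and function names are hypothetical.

```python
def rk4(f, t0, y0, t1, steps):
    """Classical fourth-order Runge-Kutta integration of dy/dt = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# Toy model (an assumption, not the GGD): exponential lifetimes with rate theta,
# with log-likelihood H(x; theta) = n * log(theta) - theta * sum(x).
data = [0.8, 1.3, 0.4, 2.1, 0.9]
n = len(data)

def dtheta_dx(x1, theta):
    h_theta_theta = -n / theta**2          # second derivative of H with respect to theta
    h_theta_x = -1.0                       # mixed derivative of H with respect to theta and x_1
    return -h_theta_x / h_theta_theta      # right-hand side of the implicit-function ODE

theta0 = n / sum(data)                     # known MLE at the starting data (initial condition)
x1_new = 1.6                               # move the first observation from 0.8 to 1.6
theta_rk = rk4(dtheta_dx, data[0], theta0, x1_new, steps=20)
theta_exact = n / (sum(data) - data[0] + x1_new)
print(theta_rk, theta_exact)
```

In this toy case the Runge-Kutta solution tracks how the MLE moves as one observation changes, and it agrees with the closed-form MLE at the new data value. For the GGD, the scalar derivatives would be replaced by the corresponding Hessian and mixed-derivative matrices.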

Bayesian Estimation
In this section, the Bayes estimates are derived based on gamma and kernel prior distributions using two different loss functions. Firstly, the squared error loss function (SLF), $L(\theta, \theta^{*}) = (\theta - \theta^{*})^{2}$, is classified as a symmetric loss function that penalizes overestimation and underestimation equally on $(-\infty, \infty)$. For this loss function, the Bayes estimator that minimizes the risk function is given by $\theta^{*} = E(\theta \mid x)$.
Secondly, in real applications the underestimation of a parameter value very often has different consequences from overestimation, both in quality and quantity. Thus, the resulting losses can be described by a function with different coefficients characterizing positive and negative errors. This function, called the LINEX loss function (LLF), was introduced in Varian (1975) and Xiuchun et al. (2007); it is an asymmetric loss function of the following form:
$$L(\Delta) \propto e^{\delta \Delta} - \delta \Delta - 1, \quad \Delta = \theta^{*} - \theta, \ \delta \neq 0.$$
The sign and magnitude of the shape parameter $\delta$ represent the direction and degree of asymmetry respectively, where positive values mean that overestimation is more serious than underestimation and vice versa for negative values. The unique Bayes estimator $\theta_{L}^{*}$ of $\theta$ under the LINEX loss function, that is, the value that minimizes the risk function, is given by
$$\theta_{L}^{*} = -\frac{1}{\delta} \ln\left[E_{\theta}\left(e^{-\delta \theta} \mid x\right)\right],$$
provided the expectation $E_{\theta}(e^{-\delta \theta} \mid x)$ exists and is finite. We also define a compound LINEX loss function.
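The two Bayes estimators can be illustrated with a short Python sketch that works on draws from a posterior distribution: the posterior mean for the squared error loss and $-\frac{1}{\delta} \ln E(e^{-\delta \theta} \mid x)$ for the LINEX loss. The draws below are hypothetical numbers used only for illustration, and approximating the posterior expectations by sample averages is an assumption of this sketch, not the approximation method used in this paper.

```python
import math

def bayes_sel(draws):
    """Bayes estimate under squared error loss: the posterior mean."""
    return sum(draws) / len(draws)

def bayes_linex(draws, delta):
    """Bayes estimate under LINEX loss: -(1/delta) * log E[exp(-delta * theta) | x]."""
    mean_exp = sum(math.exp(-delta * t) for t in draws) / len(draws)
    return -math.log(mean_exp) / delta

# Hypothetical posterior draws of a parameter (illustrative numbers only).
draws = [1.8, 2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.3]
print(bayes_sel(draws), bayes_linex(draws, delta=2.0), bayes_linex(draws, delta=-2.0))
```

For a positive $\delta$ the LINEX estimate lies below the posterior mean, reflecting the heavier penalty on overestimation, and for a negative $\delta$ it lies above it.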
We consider the unknown parameters α and β have two different independent informative prior distributions as follows.
i. Gamma prior

We suggest using independent priors for each of the parameters $\alpha$ and $\beta$, such as gamma distributions. Hence, the joint prior density is given by
$$g(\alpha, \beta) \propto \alpha^{a - 1} e^{-b\alpha}\, \beta^{c - 1} e^{-d\beta}, \quad \alpha, \beta > 0,$$
where $a$, $b$, $c$ and $d$ are the hyperparameters.
ii. Kernel Prior
For deriving the kernel prior, we introduce the bivariate kernel density estimator for the unknown probability density function $g(\alpha, \beta)$ with support on $(0, \infty)$, which is defined as
$$\hat{g}(\alpha, \beta) = \frac{1}{N h_1 h_2} \sum_{i=1}^{N} K\left(\frac{\alpha - \alpha_i}{h_1}, \frac{\beta - \beta_i}{h_2}\right), \qquad (8)$$
where $h_i$, $i = 1, 2$, are called the bandwidths or smoothing parameters, which are chosen such that $h_i \to 0$ and $N h_i \to \infty$ as $N \to \infty$, where $N$ is the sample size. The influence of the smoothing parameter $h$ is critical because it determines the amount of smoothing. Too small a value of $h$ may cause the estimator to show insignificant details, while too large a value of $h$ causes oversmoothing of the information contained in the sample, which in consequence may mask some of the important characteristics. Thus, a certain compromise is needed. The optimal choice for $h_i$, which minimizes the mean squared error, is $h_i = 1.06\, \hat{\sigma}_i N^{-0.2}$, where the population standard deviation $\sigma_i$ can be estimated by the sample standard deviation $S_i$. For the kernel function $K(\cdot, \cdot)$, the standard normal distribution and the double exponential distribution can be used for the parameters $\alpha$ and $\beta$.
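The kernel estimator and the bandwidth rule $h_i = 1.06\, S_i N^{-0.2}$ can be sketched in Python as follows. This is a minimal illustration under assumptions: the bivariate kernel is taken to be a product of two standard normal kernels, and the parameter draws are hypothetical numbers, not estimates from this study.

```python
import math
import statistics

def bandwidth(values):
    """Rule-of-thumb bandwidth h = 1.06 * S * N**(-0.2), with S the sample standard deviation."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)

def normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_prior(alpha, beta, alpha_draws, beta_draws):
    """Bivariate kernel density estimate, assuming a product of standard normal kernels."""
    n = len(alpha_draws)
    h1, h2 = bandwidth(alpha_draws), bandwidth(beta_draws)
    total = sum(
        normal_kernel((alpha - a) / h1) * normal_kernel((beta - b) / h2)
        for a, b in zip(alpha_draws, beta_draws)
    )
    return total / (n * h1 * h2)

# Hypothetical parameter estimates from repeated samples (illustrative numbers only).
alpha_draws = [0.92, 1.10, 1.03, 0.97, 1.15, 0.88, 1.05, 1.01]
beta_draws = [1.85, 2.20, 2.05, 1.95, 2.30, 1.80, 2.10, 2.00]
print(kernel_prior(1.0, 2.0, alpha_draws, beta_draws))
```

The estimated density is large near the bulk of the draws and vanishes far away from them, which is the behaviour expected of a prior built from estimated parameter values.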
The basic elements associated with the kernel density function have been studied extensively in Guillamon et al. (1998, 1999). Based on the properties of the maximum likelihood estimates (MLEs) of the parameters, which converge in probability to the original parameters, the kernel prior estimate can be derived using the following algorithm:
1- Generate a random sample $X = (X_1, X_2, X_3, \ldots, X_N)$ from the parent distribution $f(x; \alpha, \beta)$ with given specified values for the unknown parameters $\alpha$ and $\beta$.
3- For each sample in step 2, using the above R-K method, we can generate random samples for the parameters $\alpha$ and $\beta$ as random variables.
The kernel prior has been used in Ahsanullah (2013) and Maswadah (2006, 2007-2010). Thus, using the joint priors (7) and (8) with the likelihood function of the GPHCS (3), the posterior density for the parameters $\alpha$ and $\beta$ can be written in a unified form in terms of a general prior distribution function, with $p_1 = p_2 = 0$ for the informative prior (8), and $p_1 = p_2 = 1$, $a = c = 1$, and $b = d = 0$ for the kernel prior (9).
Thus, the posterior density for the parameters $\alpha$ and $\beta$ can be written in the unified form (10). Based on (10), we can use the Tierney and Kadane approximation method to approximate all the Bayes estimators for the unknown parameters.
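Tierney and Kadane (1986) introduced an easily computable approximation for the posterior mean and variance of a non-negative parameter or, more generally, of a smooth function of the parameter that is non-zero on the interior of the parameter space. In detail, let $u(\alpha, \beta)$ be a smooth, positive function on the parameter space. The posterior expectation of $u(\alpha, \beta)$ can be obtained as
$$E[u(\alpha, \beta) \mid x] = \frac{\iint e^{H^{*}(\alpha, \beta)}\, d\alpha\, d\beta}{\iint e^{H(\alpha, \beta)}\, d\alpha\, d\beta}, \quad H^{*}(\alpha, \beta) = H(\alpha, \beta) + \log u(\alpha, \beta).$$
The Bayes estimator of $u(\alpha, \beta)$ using the Tierney and Kadane approximation can be obtained as
$$\hat{u}(\alpha, \beta) = \left[\frac{\det \Sigma^{*}}{\det \Sigma}\right]^{1/2} \exp\left\{H^{*}(\alpha^{*}, \beta^{*}) - H(\hat{\alpha}, \hat{\beta})\right\},$$
where $(\hat{\alpha}, \hat{\beta})$ and $(\alpha^{*}, \beta^{*})$ maximize $H(\alpha, \beta)$ and $H^{*}(\alpha, \beta)$, respectively, and $\Sigma$ and $\Sigma^{*}$ are the negatives of the inverse Hessians of $H$ and $H^{*}$ at these maximizers.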
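A one-parameter version of the Tierney and Kadane approximation can be sketched in Python to show the mechanics: maximize $H$ and $H^{*} = H + \log u$, take the curvature at each maximizer, and combine them. The sketch below is an illustration under assumptions: it uses a single parameter instead of the pair $(\alpha, \beta)$, so the determinants reduce to scalar variances, and the log-posterior is a hypothetical gamma form whose exact mean is known. The helper names are not from the paper.

```python
import math

def argmax_1d(func, lo, hi, iters=100):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if func(c) > func(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def second_derivative(func, t, eps=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(t + eps) - 2.0 * func(t) + func(t - eps)) / eps**2

def tierney_kadane(log_post, u, lo, hi):
    """One-parameter Tierney-Kadane estimate of E[u(theta) | x]."""
    def h_star(t):
        return log_post(t) + math.log(u(t))
    t_hat = argmax_1d(log_post, lo, hi)
    t_star = argmax_1d(h_star, lo, hi)
    sigma2 = -1.0 / second_derivative(log_post, t_hat)
    sigma2_star = -1.0 / second_derivative(h_star, t_star)
    return math.sqrt(sigma2_star / sigma2) * math.exp(h_star(t_star) - log_post(t_hat))

# Hypothetical unnormalized log-posterior: Gamma(shape=5, rate=2), whose exact mean is 5/2.
def log_post(t):
    return 4.0 * math.log(t) - 2.0 * t

estimate = tierney_kadane(log_post, lambda t: t, 1e-3, 20.0)
print(estimate)
```

The normalizing constant of the posterior cancels in the ratio, which is why an unnormalized log-posterior is sufficient here.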
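Using (10), we can define $H = H(\alpha, \beta) = \log f(\alpha, \beta \mid X)$.
To study the performance of the R-K and Bayes methods with different loss functions, we use two criteria, the average bias (AVB) and the mean squared error (MSE), given by
$$\text{AVB} = \frac{1}{L} \sum_{i=1}^{L} \left|\hat{\theta}_i - \theta\right|, \quad \text{MSE} = \frac{1}{L} \sum_{i=1}^{L} \left(\hat{\theta}_i - \theta\right)^{2},$$
where $\hat{\theta}_i$ is the estimate of $\theta$ in the $i$-th replication and $L$ is the number of replications.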
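The two criteria can be computed with a few lines of Python. This sketch assumes that the AVB is the absolute deviation from the true value averaged over the replications, in line with the absolute bias named in the abstract, and that the MSE is the averaged squared deviation. The estimates below are hypothetical numbers.

```python
def avb(estimates, true_value):
    """Average absolute bias over the replications (assumed definition)."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Mean squared error over the replications."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

# Hypothetical estimates of a parameter whose true value is 2.0.
estimates = [1.9, 2.2, 2.05, 1.8, 2.1]
print(avb(estimates, 2.0), mse(estimates, 2.0))
```

In a simulation study these functions would be applied to the estimates collected over all replications for each method and each parameter.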
In our simulation study, we choose the distribution hyperparameters of $\alpha$ and $\beta$ as $a = c = 5$ and $b = d = 3$, with the parameter values $\alpha = (1, 2)$ and $\beta = (2, 3)$. Using these parameter values, different samples are generated from the GGD with sizes $N = 20$, 40, and 60 to represent small, moderate and large sample sizes. To assess the performance of the estimates, the AVB and MSEs of each one were calculated using 1000 replications.
From the simulation results in Tables 4, 5, 6 and 7, the following main points can be summarized:
1. In general, the point estimates for the parameters $\alpha$ and $\beta$ based on the R-K method have the smallest estimated AVBs and MSEs as compared with the estimates based on the Bayesian method with the informative and kernel priors under different loss functions.
2. The estimated MSEs based on the kernel prior are smaller than those based on the informative gamma prior.
3. The estimated MSEs increase as the value of $\alpha$ increases and decrease as the value of $\beta$ increases.
4. The estimated MSEs decrease as the termination time of the experiment $T$ and the sample size increase, as expected.
In conclusion, the point estimates based on the R-K method are competitive with and outperform those of the Bayesian method based on the informative and kernel priors.

Real Data Analysis
In this section, we study two real datasets to show the performance of the proposed methods on the GGD model, which is one of the most widely used lifetime distributions. This distribution has been used in many applications in different fields and in new areas such as biomedical sciences and survival analysis to describe age-specific mortality and failure rates. We have fitted these datasets and assessed the fit using goodness-of-fit tests, namely the Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D) and Chi-Square (CH2) tests, at the 0.05 significance level.

COVID-19 Data Application in Egypt
Here, we propose a concrete application with actual datasets to assess the usefulness of the generalized gamma model. The considered datasets are the deaths from COVID-19 in Egypt, which are due to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The virus spread rapidly at the beginning of the year 2020, claiming thousands of victims and forcing governments to take extraordinary measures to protect their people. The overall comprehension of COVID-19 is a challenge for all scientists, but it is essential for future generations. In this section, we make a modest contribution to the topic by applying the generalized gamma distribution to analyse the dataset of daily confirmed deaths of the COVID-19 epidemic in Egypt, in order to provide accurate estimates of some important measures (such as the average number of cases, the standard deviation of cases, and the probability of having a certain number of cases in the near future) that can support efforts to confront such epidemics. This dataset was obtained from the following Email address: ( )
For the COVID-19 deaths data, the GG distribution provides a good fit, as shown in Figure (1-a) and Figure (2-a). Among the information in Table 2, the MLEs of the model parameters are derived and used to obtain the corresponding estimates of the PDFs.
Hence, $\hat{f}(x)$ is the estimate of the unobservable underlying PDF for the number of COVID-19 deaths. With this function, one can estimate some important measures. Denoting by $X$ the random variable that represents the daily deaths of COVID-19 in Egypt during the epidemic, the probability that $X$ belongs to a chosen interval $[a, b]$ can be estimated by
$$\hat{P}(a \le X \le b) = \int_{a}^{b} \hat{f}(x)\, dx.$$
In general, the average of a given function of $X$, say $T(X)$, can be estimated as
$$\hat{\mu} = E(T(X)) = \int_{0}^{\infty} T(x)\, \hat{f}(x)\, dx.$$
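These quantities can be evaluated numerically once parameter estimates are available. The Python sketch below integrates the $k = 2$ PDF (1) with a composite Simpson rule to obtain an interval probability, a mean and a standard deviation. The parameter values are hypothetical and are not the estimates reported in Table 2; the finite upper integration limit is also an assumption that must be large enough for the tail to be negligible.

```python
import math

def pdf(x, alpha, beta):
    """PDF (1) of the generalized gamma distribution with k = 2."""
    return alpha * beta**2 * x ** (2 * alpha - 1) * math.exp(-beta * x**alpha)

def integrate(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

alpha_hat, beta_hat = 1.5, 0.5            # hypothetical estimates, not values from Table 2

def f_hat(x):
    return pdf(x, alpha_hat, beta_hat)

prob = integrate(f_hat, 1.0, 3.0)                          # P(1 <= X <= 3)
mean = integrate(lambda x: x * f_hat(x), 0.0, 60.0)        # E[X], upper limit truncated
var = integrate(lambda x: (x - mean) ** 2 * f_hat(x), 0.0, 60.0)
print(prob, mean, math.sqrt(var))
```

For this model the mean also has the closed form $\beta^{-1/\alpha}\, \Gamma(2 + 1/\alpha)$, which can be used to check the numerical integration.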
For example, the average number of COVID-19 deaths in Egypt can be approximated by taking $T(x) = x$. Thus, based on these datasets, the average number of COVID-19 deaths based on the data set I is 52.13 with a standard deviation of 5.28 deaths, and for the data set II it is 51.76 with a standard deviation of 9.69 deaths. From Table 2, the R-K and Bayes estimates based on the data set II are almost zero, ensuring that the standard deviation is small and that the distribution is bell-shaped with a light tail, as shown in Figure (2-b). This indicates that the COVID-19 deaths will rapidly decrease with increasing time. These results indicate that the GGD is more effective for modelling the COVID-19 dataset I than the dataset II. The results in Table 3 indicate that the future observations will rapidly decrease with increasing time. Thus, the results of these datasets confirm the simulation results.
parameter β using the R-K and Bayes methods with m = (n/2 and 3n/4) and k = (m/2 and 3m/4) at T = 2 and δ = 2 for LINEX loss.

Conclusion
We have applied the proposed methods to analyze real data applications including the COVID-19 pandemic. It was found that the R-K method is more efficient than the Bayesian method using the informative and kernel priors, based on the generalized progressive hybrid censored data. However, the estimates based on the kernel prior are more efficient than those based on the informative prior and are relatively close to the R-K estimates. Thus, the R-K method is a viable estimation method for lifetime models and is reliable and easy to apply, especially for medical, biological, and engineering researchers. Moreover, based on this study, we can conclude that the number of COVID-19 deaths in Egypt is expected to decline in the near future. Thus, the proposed model provides a better understanding of the COVID-19 epidemic and may provide researchers and potential users with models that can be applied to practical life situations.

Declarations
Declaration of conflicting interests
The author declares no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References
Figures

Supplementary Files
This is a list of supplementary files associated with this preprint. AppendixA.docx
j, where $i \in [1, m]$. Thus, we have three scenarios: i. If the $m$-th failure occurs before the time point $T$, then the experiment stops at the time point $X_{m:m:N}$ and all the remaining surviving units, $R_m = N - m - \sum_{j=1}^{m-1} R_j$, are removed.
$\theta = 0$, which is a function of $x$ and $\hat{\theta}(x)$, where $H(x; \theta)$ is the log-likelihood function that depends on the unknown parameter $\theta = (\alpha, \beta)$ and the data $x = (x_1, x_2, \ldots, x_N)$. The implicit function theorem is applied to the stationary equation, with all partial and total derivatives assumed to be evaluated at some known value $\hat{\theta}(x) = \theta_0$, say. Taking the total derivative of the stationary equation with respect to $x$, see Ramsay et al. (2007), we obtain.

$a = c = 5$ and $b = d = 3$, with the values of the parameters $\alpha = (1, 2)$ and $\beta = (2, 3)$. Using the above parameter values, different samples were generated from the GGD with sizes $N = 20$, 40, and 60 to represent small, moderate and large sample sizes. To assess the performance of these estimates, the AVB and MSEs for each one were calculated using 1000 replications.

Figure 1 a
Figure 1

Table 1
The critical and calculated values for the K-S, A-D and CH2 tests and their powers (p-values) for the GGD. The MLEs of the parameters for these datasets have been calculated.

Table 3
The estimates and the MSEs (in parentheses) for the future unobservable number of COVID-19 deaths in Egypt based on the R-K and Bayes methods at the hyperparameters (A = 5, B = 3, C = 5, D = 3), for m = n/2, k = m/2.
Finally, the results in Table 1 indicate that the GGD is a better fit for the sample data I than for the sample data II, where the power of the tests is greater than the significance level of the tests, as shown in Figures (1-a) and (2-a). The results in Table 2 for these datasets indicate that the estimated RMSE values based on the R-K method are smaller than those based on the Bayes method for large values of T, considering the MLEs as the true values of the parameters.

Table 4
The average bias (AVB) and the mean squared errors (MSEs) in parentheses for the GGD parameter α using the R-K and Bayes methods with m = (n/2 and 3n/4) and k = (m/2 and 3m/4) at T = 0.75 and δ = 2 for LINEX loss.

Table 5
The average bias (AVB) and the mean squared errors (MSEs) in parentheses for the GGD

Table 7
The average bias (AVB) and root mean squared errors (RMSEs) in parentheses for the GGD