Analysis and Prediction of COVID-19 using SIR, SEIR, and Machine Learning Models: Australia, Italy, and UK Cases

The novel coronavirus disease, known as COVID-19, is an outbreak that started in Wuhan, a city in central China. This report presents a short analysis focusing on Australia, Italy, and the United Kingdom. The analysis covers confirmed cases, recovered cases, and deaths. It also compares the growth rate in Australia with that of Italy and the United Kingdom, and examines the outbreak in different Australian states. Mathematical approaches based on the susceptible, infected, and recovered (SIR) model and the susceptible, exposed, infected, and recovered (SEIR) model were proposed to predict the epidemiology in these countries. The performance of the classic forms of SIR and SEIR depends on parameter settings. For this reason, several optimization algorithms are proposed to optimize the parameters of the SIR and SEIR models and improve their predictive capabilities: Broyden–Fletcher–Goldfarb–Shanno (BFGS), conjugate gradients (CG), L-BFGS-B, and Nelder-Mead. The results of the optimized SIR and SEIR models are compared with the Prophet algorithm and the logistic function, two well-known ML algorithms. The results show that different algorithms behave differently in different countries. However, the improved versions of the SIR and SEIR models perform better than the other algorithms described in this study. Moreover, the Prophet algorithm works better for the Italian and UK cases than for the Australian cases, and the logistic function performs better than the Prophet algorithm in these cases. The Prophet algorithm appears to be suitable for data with an increasing trend in pandemic situations. Optimizing the parameters of the SIR and SEIR models has significantly improved their prediction accuracy. Although several algorithms exist for predicting this pandemic, no single algorithm is the best for all cases.

parameters (Susceptible, Infected, Recovered) in the SIR model. The results indicate that the suggested method is sufficiently precise, with low error compared with analytical methods. Mbuvha and Marwala (2020) calibrated the SIR model to South African data. They considered different scenarios for R0 (the basic reproduction number) to project infections and to estimate healthcare resource requirements for the following days. Qi, Xiao et al. (2020) reported that both daily temperature and relative humidity influenced the occurrence of COVID-19 in Hubei province and in several other provinces. Salgotra, Gandomi et al. (2020) developed two COVID-19 prediction models based on genetic programming and applied them to India. Their findings show that genetic evolutionary programming models are highly reliable for predicting COVID-19 cases in India.
In January 2020, the first case of COVID-19 was reported in Australia. This report presents a short analysis focusing on Australia, followed by simulations forecasting the next few days.
The manuscript is organized as follows. Section I presents the research methodology. Sections II and III introduce the SIR and SEIR models. Section IV describes the prediction algorithms (the logistic function and the Prophet algorithm). Section V presents the results. The conclusion and discussion are provided in the last section.

I. Research methodology
The study was carried out in several phases. First, data were collected from the World Health Organization (WHO) and Johns Hopkins University, since these sources aggregate data from different organizations. The data were then analyzed and preprocessed to eliminate duplicated and missing values. Numerical tests were performed in Python and R on a computer with an Intel® Core i7-4510U 2.0 GHz processor and 8 GB of DDR3 memory (Supplementary file). The flowchart of the research methodology is provided in Figure 1.

II. The SIR model
In Eq. (2):
• S is the number of susceptible individuals at time t.
• I is the number of infected individuals at time t.
• R is the number of recovered individuals at time t.
• β and γ are the transmission rate and rate of recovery (removal), respectively.
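The SIR equations themselves did not survive in this text. The sketch below assumes the standard Kermack-McKendrick form dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI, which may differ in normalization from the authors' Eq. (2). Under that assumption, the model can be integrated numerically with SciPy's `odeint`. The function names and example inputs are hypothetical.

```python
import numpy as np
from scipy.integrate import odeint


def sir_derivatives(y, t, beta, gamma, n):
    """Right-hand side of the classic SIR equations (assumed form).

    beta  : transmission rate
    gamma : recovery (removal) rate
    n     : total population, so S + I + R stays equal to n
    """
    s, i, r = y
    new_infections = beta * s * i / n
    return [-new_infections, new_infections - gamma * i, gamma * i]


def simulate_sir(beta, gamma, n, i0, days=200):
    """Integrate the SIR model day by day, starting with i0 infected and no recovered."""
    t = np.arange(days, dtype=float)
    y0 = [n - i0, i0, 0.0]
    s, i, r = odeint(sir_derivatives, y0, t, args=(beta, gamma, n)).T
    return t, s, i, r
```

For example, `simulate_sir(0.378, 0.14, 25e6, 100.0)` combines the β = 0.378 and γ = 0.14 values used in this study with an Australian-sized population. The starting value of 100 infected individuals is an arbitrary illustrative choice. Because S + I + R is conserved, the total population stays at N throughout the run.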

III. The SEIR model
The SEIR model is an extended version of the SIR model (Peng, Yang et al. 2020). It models how people move between different states: susceptible (S), exposed (E), infective (I), and recovered (R). S, I, and R have the same meaning as in the SIR model, and E represents individuals who have been infected but do not yet show any symptoms. The SEIR model diagram is shown in Fig. 3.
Figure 3 The SEIR diagram (Peng, Yang et al. 2020)
The equations of the SEIR model are defined in Eqs. (4)-(10): … dependent mortality rate (Peng, Yang et al. 2020).
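Equations (4)-(10) are not reproduced in this text, and the cited generalized model of Peng, Yang et al. (2020) has more compartments than S, E, I, and R. The sketch below therefore assumes the simpler classic four-compartment SEIR form. It introduces a hypothetical parameter `sigma` for the inverse of the mean latent period. The example shows how the exposed class delays the infection peak.

```python
import numpy as np
from scipy.integrate import odeint


def seir_derivatives(y, t, beta, sigma, gamma, n):
    """Classic SEIR right-hand side (assumed form, not the generalized Peng et al. model)."""
    s, e, i, r = y
    new_exposures = beta * s * i / n   # S -> E
    new_infectious = sigma * e         # E -> I after a mean latent time of 1/sigma
    new_recoveries = gamma * i         # I -> R after a mean infectious time of 1/gamma
    return [-new_exposures,
            new_exposures - new_infectious,
            new_infectious - new_recoveries,
            new_recoveries]


def simulate_seir(beta, sigma, gamma, n, e0, i0, days=300):
    """Integrate the SEIR model day by day from e0 exposed and i0 infectious individuals."""
    t = np.arange(days, dtype=float)
    y0 = [n - e0 - i0, e0, i0, 0.0]
    s, e, i, r = odeint(seir_derivatives, y0, t, args=(beta, sigma, gamma, n)).T
    return t, s, e, i, r
```

Setting `sigma` very large makes the exposed stage almost instantaneous, so the output approaches SIR-like behaviour. Comparing the two cases is a quick sanity check on the sketch.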

IV. Prediction
In the present section, several machine learning techniques are used to predict COVID-19 cases in Australia, Italy, and the United Kingdom. Machine learning is a branch of computer science in which algorithms learn from data. Learning can take supervised, unsupervised, and/or semi-supervised forms (Mitchell 1997, Arkes 2001, Armstrong 2001, Nikolopoulos, Litsa et al. 2015, Maleki, Mahmoudi et al. 2020). In this section, some approaches used to predict cases (confirmed and deaths) of

a. Analysis
i. New cases
In this sub-section, the daily growth rates of confirmed cases in Australia, Italy, and the United Kingdom from 2020-04-24 to 2020-05-23 were calculated. Figure 4 depicts the growth rate of confirmed cases in these countries. As Figure 4 shows, the growth rate for Australia was always below 0.5 during the outbreak and just above 0.0 at the end of May, while the rates for Italy and the United Kingdom were generally higher. The growth rate for the United Kingdom was almost always above 2.0 in April and then declined dramatically in May. The rate for Italy fluctuated between 0.5 and 1.5 in April and May.
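The paper does not spell out the growth-rate formula. A common choice, assumed here, is the percentage day-over-day change of a cumulative series, 100 × (C_t − C_{t−1}) / C_{t−1}. The sketch below computes it with pandas. The function name and toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def daily_growth_rate(cumulative: pd.Series) -> pd.Series:
    """Percentage day-over-day growth of a cumulative case series (assumed definition).

    Days whose previous value is zero are returned as NaN instead of infinity.
    """
    previous = cumulative.shift(1).replace(0, np.nan)
    return 100.0 * cumulative.diff() / previous
```

Applied to a toy cumulative series [100, 110, 121, 121], the function returns NaN, 10.0, 10.0, and 0.0. The same function works for confirmed and death series alike.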
Figure 5 presents the daily growth rate of death cases for the same countries from 2020-04-24 to 2020-05-23. The growth rate of death cases in Australia fluctuated between 0 and 7 in April and May and reached 7 at the end of April, which was higher than in Italy and the United Kingdom at the same time. Over the same period, the rate for Italy was almost always below 2.0. For the United Kingdom, the rate was just below 4.0 at the end of April and just above 0.0 at the end of May.

ii. Overall growth rate
This section shows the numbers of active cases in the three countries. The active cases were calculated using the following equation:

Active_cases = confirmed_cases − deaths_cases − recovered_cases (13)

From Equation (13), the overall growth rate can be calculated according to Equation (14), in which the index i denotes the day. Figure 6 illustrates the overall growth rate for confirmed cases in these countries. Negative values indicate that people are recovering faster than others are becoming infected, which is good news. Over this period, the rate for Australia was almost always below zero, changing from −15 at the end of April to just below −5 at the end of May. The rate for Italy fluctuated between just above −7.5 and just above 0.0, while the rate for the United Kingdom was almost always positive (between 0.0 and 3.0). Figure 7 compares the number of death cases in Australia with the other two countries and shows that the number in Australia is significantly lower.

With the aim of forecasting, the logistic function defined in Equation (11) was applied to the collected data (time horizon: from the start of the outbreak in each country). The results are illustrated in Figures 9-14. … since these parameters could be estimated. Before the start of the outbreak, it is essential to check whether the number of susceptible individuals equals the population of each country. This holds because no antibodies exist and no vaccine for the disease has been developed.

At first, R0 = 2.7 (the median value reported by the Australian Government Department of Health) was fixed, with β = 0.378 and γ = 0.14, consistent with R0 = β/γ = 0.378/0.14 = 2.7. Figure 19 (a-c) presents the confirmed cases predicted by the optimized SEIR model with these settings in the three countries (see Figure 18). Real data were used to estimate the values of β and γ, and an optimizer was used to find their best estimates. The optimization algorithms were the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Fletcher 1987), L-BFGS-B (Byrd, Lu et al. 1995), conjugate gradients (CG) (Fletcher and Reeves 1964), and Nelder-Mead (Nelder and Mead 1965). The parameter settings are provided in Table 3. The flowchart of the improved SIR and SEIR versions is given in Figure 18, and the parameter settings for the above-mentioned algorithms are given in Table 4. Table 5 shows the optimized values obtained by the different algorithms (SIR model). The best parameter values were found using the Nelder-Mead algorithm for the SIR model and the L-BFGS-B algorithm for the SEIR model. This method is illustrated in Figure 18. As mentioned earlier, the number of susceptible individuals before the start of the outbreak was taken to be equal to the population of each country, because no antibodies exist and no vaccine for the disease is available. According to Wikipedia, the populations of Australia, Italy, and the United Kingdom are approximately 25 × 10^6, 60 × 10^6, and 67 × 10^6, respectively. Table 6 lists the RMSE values obtained by the algorithms for the SIR and SEIR models, showing that optimization significantly reduces the errors.
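The parameter-estimation step can be sketched with `scipy.optimize.minimize`, which accepts the four methods named above through its `method` argument ("Nelder-Mead", "BFGS", "CG", "L-BFGS-B"). This is a minimal sketch under stated assumptions, not the authors' code. It fits only the SIR β and γ to a series of active (currently infected) counts and minimizes RMSE. Parameters are optimized in log space so that both rates stay positive, and the initial guess and bounds are hypothetical. R0 is then estimated as β/γ. For example, β = 0.378 and γ = 0.14 give R0 = 2.7.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize


def sir_rhs(y, t, beta, gamma):
    """Classic SIR equations written for population fractions (s + i + r = 1)."""
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]


def simulate_infected(beta, gamma, i0_frac, t):
    """Fraction of the population currently infected at each time in t."""
    y0 = [1.0 - i0_frac, i0_frac, 0.0]
    # Tight tolerances keep the finite-difference gradients used by BFGS/CG/L-BFGS-B stable.
    sol = odeint(sir_rhs, y0, t, args=(beta, gamma), rtol=1e-10, atol=1e-12)
    return sol[:, 1]


def fit_sir(observed_infected, population, i0, method="Nelder-Mead", guess=(0.3, 0.1)):
    """Estimate (beta, gamma, R0, RMSE) by fitting the SIR model to observed infected counts.

    Optimizing log(beta) and log(gamma) keeps both rates positive for every method;
    this reparameterization is an assumption of the sketch.
    """
    t = np.arange(len(observed_infected), dtype=float)
    target = np.asarray(observed_infected, dtype=float) / population
    i0_frac = i0 / population

    def loss(log_params):
        beta, gamma = np.exp(log_params)
        residual = simulate_infected(beta, gamma, i0_frac, t) - target
        return np.sqrt(np.mean(residual ** 2))

    # Hypothetical bounds, applied only where the method supports them.
    bounds = [(np.log(1e-3), np.log(5.0))] * 2 if method == "L-BFGS-B" else None
    result = minimize(loss, np.log(guess), method=method, bounds=bounds)
    beta, gamma = np.exp(result.x)
    rmse_people = result.fun * population  # RMSE expressed as a number of people
    return beta, gamma, beta / gamma, rmse_people
```

Looping over the four method names with the same data gives a comparison similar in spirit to Tables 5 and 6. The gradient-based methods rely on SciPy's finite-difference gradients, which is why the ODE tolerances are tightened.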
Figure 18 steps:
Step 1
• Defining initial values for the parameters and variables.
• Solving the SIR and SEIR models numerically.
Step 2
• Using an optimization algorithm to estimate the best values for the parameters.
Step 3
• Estimating R0.

Tables 7-9 present the predicted cumulative confirmed cases obtained using the Prophet algorithm in the three countries. In these tables, y represents the true values of confirmed cases, ds is the time, ŷ is the forecasted value,

VI. Conclusion and discussion
COVID-19, the disease caused by the coronavirus SARS-CoV-2, has affected the lives of billions of people worldwide.
The first phase of the paper was a short analysis of COVID-19 focusing on Australia, Italy, and the United Kingdom. The analysis presents confirmed and death growth rates in Australia, a comparison between Australia, Italy, and the United Kingdom, and a short analysis of the different states of Australia. The analysis shows that, in general, Australia is in a good position compared with the other two countries. However, the situation varies considerably across Australian states. For example, New South Wales has the most confirmed and death cases, while the Northern Territory has the fewest (it is worth noting that New South Wales also has a much larger population).
Mathematical approaches based on SIR and SEIR were proposed to predict the epidemiology in Australia, Italy, and the United Kingdom. The classic forms of SIR and SEIR are deterministic, and their accuracy depends on the chosen parameter values. An improved version based on parameter optimization was therefore suggested to improve the prediction.
The results are compared with the logistic function and the Prophet algorithm and summarized as follows:
• Comparison between the classic form of the SIR model and real data showed a significant gap. However, setting the parameters of the SIR model appropriately significantly improved the prediction.
• The classic form of the SIR model worked better for the United Kingdom, while it was not suitable for the Australian case (in terms of RMSE values).
• The logistic function was a good model for the United Kingdom, with an R2 score of 0.97, compared with 0.67 for Australia and 0.95 for Italy.
• The best RMSE value belonged to the Australian cases (confirmed and deaths).
• Optimization of parameters of the SIR and SEIR models significantly improved the prediction accuracy of the models.
• The improved version of SEIR performs better than the SIR model (in terms of RMSE values and figures).
• The optimized SEIR model gives better predictions for the UK and Italy than for Australia.
• The best values for the parameters were found using the Nelder-Mead algorithm for the SIR model and the L-BFGS-B algorithm for the SEIR model.
• The Prophet algorithm worked better for Italy and the United Kingdom cases than for Australian cases.
• The logistic function performed better than the Prophet algorithm in these cases.
• The improved versions of the SIR and SEIR models performed better than the logistic function, the Prophet algorithm, and the classic form of the SIR model.
In this paper, all forecasts were made without considering social distancing and quarantine scenarios, which makes these scenarios a valuable future direction. This paper presents SIR and SEIR as epidemiological models; it would be interesting to test other epidemiological models. Moreover, it would be worthwhile to combine the mathematical models with other information such as policy interventions, human behavior, and constraints.

Compliance with Ethical Standards:
• Sources of Funding: The authors confirm that there is no source of funding for this study.
• Conflict of Interest: The authors declare that they have no conflict of interest.
• Human Participants and/or Animals: None.

References
Figure 5 Growth rate (death cases in Australia, Italy, and the United Kingdom)
Predicted cases in Italy based on the SIR model (blue: real confirmed cases, red: SIR model)
Predicted cases in the UK based on the SIR model (blue: real confirmed cases, red: SIR model)

Supplementary Files
This is a list of supplementary files associated with this preprint: Supplementarymaterials.docx

Figure 1 Flowchart of the current research process
… (10) where α represents the protection rate, β the infection rate, … the inverse of the average latent time, and δ the inverse of the average quarantine time, …

curve's maximum value, and K is the logistic growth rate of the curve.

b) Time series forecasting with the Prophet algorithm
The Prophet algorithm is an open-source tool developed by Facebook's Data Science team, and its main goal is business forecasting (Taylor and Letham 2017, Taylor and Letham 2018). The Prophet algorithm works well with time-series data that have seasonal effects and is robust to missing data (Ndiaye, Tendeng et al.). In the Prophet algorithm, the forecast can be written as shown in Equation 5
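For reference, Taylor and Letham (2018) describe the Prophet forecast as an additive decomposition. The paper's own equation is not reproduced in this text, so the standard form is shown here. In this form, g(t) is the trend, s(t) the periodic (seasonal) component, h(t) the holiday or event effects, and ε_t the error term. In its simplest form without changepoints, Prophet's saturating-growth trend is itself a logistic curve with capacity C, growth rate k, and offset m. This links it to the logistic function used in this study.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t,
\qquad
g(t) = \frac{C}{1 + \exp\left(-k\,(t - m)\right)}
```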

Figure 4 Growth rate (Confirmed cases in Australia, Italy, and the United Kingdom)

Figure 6 Overall growth rate for confirmed cases in Australia, Italy, and the United Kingdom

Figure 8 (a-h) shows confirmed versus death cases in each Australian state. By now (2020-

Figure 9 Prediction of confirmed cases by the logistic function (Australia)

Figure 15 Predicted cases in Australia using the susceptible, infected, recovered (SIR) model (blue: real confirmed cases, red: SIR model)

Figure 18 Flowchart of improved version of SIR and SEIR models

Figure 19 Prediction by the optimized SEIR model
… and upper bounds for the forecasted values, respectively. Note that the forecasts were made between the cutoff and cutoff + horizon. Tables 7-9, also called cross-validation matrices, are used to compute the errors between y and ŷ, from which the RMSE values are obtained (Figure 23 a-c). Figures 20-22 visualize the values forecasted by the Prophet algorithm. They indicate that the algorithm fits the Italian and UK cases well but shows larger errors for Australia.
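The cross-validation matrices described above can be summarized with a few lines of pandas. This sketch assumes a DataFrame with the column names produced by Prophet's `prophet.diagnostics.cross_validation` (`ds`, `y`, `yhat`, `cutoff`). It groups squared errors by forecast horizon (ds − cutoff). It is a simplified stand-in: Prophet's own `performance_metrics` aggregates errors over a rolling window of horizons, so its numbers can differ.

```python
import numpy as np
import pandas as pd


def rmse_by_horizon(cv: pd.DataFrame) -> pd.Series:
    """RMSE between observed y and forecast yhat for each horizon (ds - cutoff)."""
    horizon = cv["ds"] - cv["cutoff"]
    squared_error = (cv["y"] - cv["yhat"]) ** 2
    return np.sqrt(squared_error.groupby(horizon).mean())


def overall_rmse(cv: pd.DataFrame) -> float:
    """Single RMSE over every row of the cross-validation matrix."""
    return float(np.sqrt(((cv["y"] - cv["yhat"]) ** 2).mean()))
```

Consider two cutoffs, each forecasting one and two days ahead. The function returns one RMSE per horizon, which shows how the error grows the further a forecast reaches past its cutoff.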

Figure 20 Forecasting by Prophet for the next year (confirmed cases in Australia)

As shown in Figures 9-14, the logistic function fits the data well while the number of cases is increasing. To evaluate its performance, the R2 score was computed for confirmed and death cases, and the results are presented in Table 1. The root mean square error (RMSE) was also used as a metric, and the RMSE results are given in Table 2. The best RMSE value belongs to the Australian cases (confirmed and deaths).
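A minimal sketch of the logistic fit and the two metrics follows. It assumes the three-parameter form f(t) = L / (1 + exp(−k(t − t0))), with L the curve's maximum value, k the growth rate, and t0 the midpoint; the paper's Equation (11) may use different symbols. SciPy's `curve_fit` estimates the parameters. R2 and RMSE are written out by hand to keep the block self-contained, and the R2 formula matches the default behaviour of scikit-learn's `r2_score`. The initial guess is a hypothetical heuristic.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, L, k, t0):
    """Logistic curve: L is the maximum value, k the growth rate, t0 the midpoint."""
    return L / (1.0 + np.exp(-k * (t - t0)))


def fit_logistic(t, y):
    """Least-squares fit of the logistic curve to cumulative cases; returns (L, k, t0)."""
    guess = (float(np.max(y)), 0.1, float(np.median(t)))  # hypothetical starting point
    params, _ = curve_fit(logistic, t, y, p0=guess, maxfev=10000)
    return params


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean square error, in the same units as the case counts."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

R2 is scale-free, whereas RMSE is measured in cases. RMSE values are therefore only directly comparable between countries with similar case counts, which is worth keeping in mind when reading the comparison above.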

Table 1
R2 scores for different countries and case types

Table 2
Root mean square error (RMSE) values for different countries and different cases

Table 3
RMSE values obtained by SIR model (before optimization of parameters)

Table 5
Median values of SIR parameters determined by the departments of health in each country

Table 6
RMSE values obtained based on the improved SIR model considering a 0.99 confidence interval

Table 8
Predicted cumulative confirmed cases in the United Kingdom (cross-validation matrix)