Estimating and forecasting the spread of COVID-19 in South Korea: A Bayesian SIHR-based dynamic model with non-pharmaceutical interventions

Background: As of May 22, 2020, the total number of confirmed COVID-19 cases is over 5 million worldwide and more than 300 thousand people have lost their lives to the virus. South Korea also experienced a sharp increase in late February, but owing to non-pharmaceutical interventions, the number of confirmed cases has been decreasing since March. In this study, we aimed to investigate the transmission dynamics under these intervention effects and to forecast the spread of COVID-19 in South Korea, using a flexible statistical model. Methods: We analyzed the COVID-19 data obtained from the Korea Centers for Disease Control and Prevention from February 18 to April 30. Using a Bayesian susceptible-infectious-hospitalized-removed (SIHR) dynamic model, we estimated the dynamic transmission rate considering the non-pharmaceutical intervention effects and forecast the confirmed cases. Results: The estimated transmission rate without any control effects was 0.4605 with 95% credible interval (0.4468, 0.4745). During the days with increasing effects between February 26 and March 6, the daily transmission rate decreased to about 89.48% of that of the previous day. With consistent control effects, it remained at 0.1549 with 95% credible interval (0.1497, 0.1602). Based on the estimated transmission rate, the forecast number of COVID-19 infections in South Korea showed an overall decreasing pattern. Conclusions: We considered and estimated the dynamic effects of the non-pharmaceutical interventions on COVID-19 using a Bayesian SIHR-based model. This study shows that non-pharmaceutical interventions including active testing, quarantine and isolation, personal preventive measures, and social distancing are crucial to curb the transmission. Our findings contribute to a better understanding of non-pharmaceutical interventions in COVID-19.

protective equipment (PPE), social distancing, the closure of schools, and national lockdowns. Thus, it is important to investigate the transmission dynamics considering the non-pharmaceutical interventions and forecast the number of COVID-19 infections. Such a study can provide real-time information to healthcare workers and help policymakers create more evidence-based policies.
South Korea has also been affected by the virus since the index case occurred on January 20. Until February 17, 17 out of 30 cases in the country were imported from abroad, accounting for more than 50%. However, the country's 31st patient, a follower of Shincheonji, a minor South Korean Christian sect, was confirmed in Daegu on February 18. The patient was found to have attended a few services in Daegu, where a mass infection in the religious group occurred. As a result, the number of confirmed cases increased dramatically to about 400 by February 22, within 4 days. The Korean government therefore raised the coronavirus alert to the highest level on February 23 [2], and strongly encouraged citizens to practice social distancing, wear face masks, and wash hands frequently to control the spread of COVID-19. In addition, COVID-19 diagnostic testing capabilities were dramatically expanded, and large-scale, rapid virus testing was conducted to identify infected patients at an early stage. Eventually, the daily average number of new confirmed cases in mid-April was less than 10, and the government shifted to mitigation. Thus, it is important to estimate the dynamic transmission rate considering the effects of the non-pharmaceutical interventions in South Korea. Forecasting the spread of COVID-19 is also important because the pandemic is ongoing. However, some studies on COVID-19 in South Korea assumed a constant transmission rate when using mathematical models to forecast the number of COVID-19 infections [3][4][5]. Recently, a deterministic SIR-based econometric model was used to estimate the effects of non-pharmaceutical intervention policies in South Korea, but not to forecast the number of coronavirus infections [6].
In this work, we estimated the impact of the non-pharmaceutical interventions on COVID-19 in South Korea from February 18 to March 31 using a Bayesian susceptible-infectious-hospitalized-removed (SIHR) statistical model.
In addition, we forecast the number of new COVID-19 infections in South Korea using the proposed model. To the best of our knowledge, no previous studies have both estimated intervention effects and forecast the spread of COVID-19.
The KCDC website reports daily new confirmed cases through press releases. The GIDCC additionally provides information on the symptom onset date of the confirmed patients in Gyeonggi.
Since we mainly wanted to investigate the transmission dynamics within the local community, we did not use the cases imported from abroad. As the number of cases increased from February 18 and most of the earlier cases were travel-related, we used the number of confirmed cases from February 18 to April 30. There were 10,125 confirmed cases during this period. The proposed SIHR model requires the symptom onset date rather than the confirmation date. However, symptom onset dates were available only for Gyeonggi, which accounts for 5.2% of the total data. Based on the Gyeonggi dataset, we assumed a Gamma distribution with mean 4.35 and variance 11.76 for the period from symptom onset to confirmation.
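The mean and variance quoted above determine the Gamma distribution by moment matching. The sketch below, in Python with NumPy, converts them to shape and scale parameters and then, as one possible use of the delay distribution, subtracts a random delay from each confirmation day to impute an onset day. The function names, the toy confirmation days, and the imputation step itself are illustrative assumptions; the text does not show how the delay distribution was applied.

```python
import numpy as np


def gamma_shape_scale(mean, variance):
    """Moment matching for a Gamma distribution.

    For shape k and scale theta: mean = k * theta, variance = k * theta**2.
    """
    scale = variance / mean
    shape = mean / scale
    return shape, scale


def impute_onset_days(confirmation_days, shape, scale, rng):
    """Hypothetical imputation: onset day = confirmation day - Gamma delay."""
    confirmation_days = np.asarray(confirmation_days, dtype=float)
    delays = rng.gamma(shape, scale, size=confirmation_days.shape)
    return confirmation_days - delays


# Mean and variance of the onset-to-confirmation delay, as stated in the text.
shape, scale = gamma_shape_scale(4.35, 11.76)

# Toy confirmation days (days since February 18); not real data.
rng = np.random.default_rng(0)
confirmation_days = [10, 10, 12, 15]
onset_days = impute_onset_days(confirmation_days, shape, scale, rng)
```

With mean 4.35 and variance 11.76, this gives a shape of about 1.61 and a scale of about 2.70 days.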

Statistical model
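Many researchers have conducted mathematical model-based studies to investigate the transmission dynamics and forecast the number of COVID-19 infections. Mathematical susceptible-infectious-recovered (SIR) and susceptible-exposed-infectious-recovered (SEIR) models were considered to forecast the number of coronavirus infections in China [7,8]. A flowchart of the SIHR model is presented in Figure 3. The classical mathematical SIHR model has four compartments: susceptible ($S$), infectious ($I$), hospitalized ($H$), and removed ($R$). The basic model framework in the mathematical SIHR model can be described as follows:
$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \quad \frac{dI}{dt} = \frac{\beta S I}{N} - \alpha I, \quad \frac{dH}{dt} = \alpha I - \gamma H, \quad \frac{dR}{dt} = \gamma H,$$
where $N = S + I + H + R$ is the total population size and the parameters $\alpha$, $\beta$, $\gamma$ are isolation, transmission, and removal rates, respectively.
The inverse of the isolation rate, $1/\alpha$, indicates the average interval from symptom onset to hospitalization, that is, the infection transmission period. The inverse of the removal rate, $1/\gamma$, indicates the average interval from hospitalization to removal.
We assume that the number of new infectious individuals with symptoms at time $t$, $y_t$, that is, the number of individuals moving from $S$ to $I$ at time $t$, follows a Poisson distribution with mean $\lambda_t$,
$$y_t \sim \text{Poisson}(\lambda_t), \quad \lambda_t = \frac{\beta_t S_{t-1} I_{t-1}}{N}.$$
Based on the mathematical model described above, the time-varying transmission rate is considered as follows:
$$\beta_t = \begin{cases} \beta, & t < t_1, \\ \delta \beta_{t-1}, & t_1 \le t \le t_2, \\ \beta_{t_2}, & t > t_2, \end{cases}$$
where $\beta$ is the transmission rate without any control effects, $\delta$ is the ratio of the transmission rate to that of the previous day while the intervention effects increase, and $t_1$ and $t_2$ are the first and last days of that period.
For the transmission rate parameters, we used a Bayesian framework with prior distributions. For the parameters $\beta$ and $\delta$, we used a Gamma distribution with mean 1 and variance 10 and a Beta distribution with mean 1/2 and variance 1/12, respectively.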
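To make the mechanics concrete, the following Python sketch simulates a discrete-time SIHR process in which the transmission rate is constant at first, is multiplied by a fixed factor each day during an intervention window, and stays flat afterwards, with daily new infections drawn from a Poisson distribution. It is a minimal illustration and not the authors' WinBUGS model. The baseline rate 0.4605 and daily factor 0.8948 are the point estimates quoted in the abstract; the isolation rate, removal rate, population size, initial number of infectious individuals, the day indexing (February 18 as day 0), and all function names are assumptions made for the example.

```python
import numpy as np


def transmission_rates(n_days, beta0, delta, t1, t2):
    """Baseline beta0 before day t1, multiplied by delta on each day
    from t1 through t2 (inclusive), and held constant after t2."""
    steps = np.clip(np.arange(n_days) - t1 + 1, 0, t2 - t1 + 1)
    return beta0 * delta ** steps


def simulate_sihr(beta, alpha, gamma, n_pop, i0, rng):
    """Discrete-time SIHR: Poisson new infections, deterministic I->H and H->R flows."""
    s, i, h, r = float(n_pop - i0), float(i0), 0.0, 0.0
    new_cases = np.zeros(len(beta), dtype=int)
    for t, b in enumerate(beta):
        # New infections y_t ~ Poisson(beta_t * S * I / N), capped at S.
        y = min(rng.poisson(b * s * i / n_pop), int(s))
        to_h = alpha * i  # expected number isolated or hospitalized today
        to_r = gamma * h  # expected number removed today
        s, i, h, r = s - y, i + y - to_h, h + to_h - to_r, r + to_r
        new_cases[t] = y
    return new_cases, (s, i, h, r)


# Day 0 = February 18, so February 26 is day 8 and March 6 is day 17 (2020 is a leap year).
beta = transmission_rates(n_days=30, beta0=0.4605, delta=0.8948, t1=8, t2=17)

rng = np.random.default_rng(1)
n_pop = 51_800_000  # assumed approximate population of South Korea
cases, state = simulate_sihr(beta, alpha=1 / 4.35, gamma=1 / 14,
                             n_pop=n_pop, i0=30, rng=rng)
```

Under these assumptions the rate after the window is 0.4605 × 0.8948^10, roughly 0.15, which is close to the 0.1549 reported for the period with consistent control effects. As a side note on the priors, a Gamma distribution with mean 1 and variance 10 has shape 0.1 and rate 0.1, and a Beta distribution with mean 1/2 and variance 1/12 is Beta(1, 1), that is, uniform on (0, 1).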
To compare the model performance of the SIHR model with the non-pharmaceutical intervention effects, we additionally considered an SIHR model without the effects as a competing model, as follows:
$$\beta_t = \beta \quad \text{for all } t,$$
where the transmission rate $\beta$ remains constant during the study period.
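A comparison of this kind is typically scored with a prediction-error metric such as the mean squared prediction error (MSPE), one of the two criteria used in the next paragraph. The Python sketch below shows the usual definition, the average squared difference between observed and predicted counts. The function name and the toy numbers are assumptions for illustration; they are not the paper's data, and the paper's exact MSPE definition (for example, which days are held out) is not given here.

```python
import numpy as np


def mspe(observed, predicted):
    """Mean squared prediction error between observed and predicted counts."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    return float(np.mean((observed - predicted) ** 2))


# Toy daily case counts (hypothetical, for illustration only).
observed = [10, 20, 30, 20]
pred_with_effects = [12, 18, 29, 21]   # model with intervention effects
pred_constant_rate = [10, 25, 45, 60]  # model with a constant transmission rate

mspe_with_effects = mspe(observed, pred_with_effects)
mspe_constant_rate = mspe(observed, pred_constant_rate)
```

A lower MSPE indicates better predictive performance; in this toy example the model with intervention effects would be preferred.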
We determined the time points $t_1$ and $t_2$ based on the model performance using the mean squared prediction error (MSPE) and the deviance information criterion (DIC) [10]. As shown in Table 1, setting $t_1$ and $t_2$ to February 26 and March 6 gave the best model performance, and this is our final model. In addition, we compared the MSPE and DIC values of the models with and without intervention effects. The model with non-pharmaceutical intervention effects showed a dramatic improvement in both predictive power and model fit.
R2WinBUGS [11]. We used two parallel chains with different initial values. To check sample convergence, we utilized the Gelman-Rubin statistic, trace plots, and autocorrelation. For each chain, after a burn-in of 10,000 samples, 10,000 samples were used for parameter estimation with a thinning interval of 10.
Despite the strengths of our model, there are some limitations. First, we were able to obtain only limited information on the symptom onset dates. Because of this lack of data, the symptom onset dates were estimated from the confirmation dates, and the estimates were used for modeling. We would expect more accurate results with complete symptom onset data. Second, for computational efficiency, we manually determined the period with intervention effects, $t_1$ and $t_2$, although the period itself could be estimated through the statistical model.

Results
Lastly, imported cases from foreign countries were not considered in the model since we focused on local transmission within the community. However, except for the patients who were screened and quarantined immediately at the airport, imported cases who entered the local community can be a major source of transmission, so the model could also include a compartment for these individuals.
For future research, we are planning to conduct spatial and spatio-temporal studies on the COVID-19 infections in South Korea. Since understanding and estimating spatial and spatio-temporal dynamics is essential in the prevention of the transmission of infectious diseases, the relevant studies would provide valuable information to control COVID-19.

Conclusions
To the best of our knowledge, this study is the first to model the COVID-19 infection trend and forecast future trends using a Bayesian statistical SIHR model that includes the effects of preventive measures. Under the assumption that the effects increase daily during a certain period, we found that the transmission rate dropped to nearly one third of the initial transmission rate. Our findings show that the number of infections in South Korea has been decreasing and that non-pharmaceutical interventions are very important to curb transmission.