Worldwide Trend Analysis and Potential Spread Prediction of SARS-CoV-2



COVID-19, declared a Public Health Emergency of International Concern by the World Health Organization on January 30, 2020 and characterized as a pandemic on March 11, 2020, is a major disaster that has shaken almost every economy around the globe. The outbreak began in December 2019 in Wuhan, a city in China, and has been linked to a seafood market, with bats suspected as the wild reservoir of the virus. The symptoms and the routes of transmission are quite similar to those of influenza. Areas with higher levels of particulate matter pollution have reported more deaths than comparatively less polluted areas. Meteorological factors have also played a major role in the extent of the spread of the disease and in the ability of people living in an area to fight it. Social distancing and nationwide lockdowns have proved to be major weapons against the further spread of the disease. The lockdowns have, on the other hand, brought relief from high pollution levels, resulting in good air quality and clean rivers. This study examines the spread of the novel coronavirus around the world, following the trend in the number of confirmed cases, deaths and recoveries from the onset, and highlights the situation in five countries. The study further predicts the number of cases for ten days, correlates the predicted values with the actual number of cases reported, and throws light on the future aspects of the infection.


The outbreak of the novel coronavirus (also known as SARS-CoV-2) began in December 2019 in the city of Wuhan, China, following the exposure of people to a seafood market (Gautret et al., 2019). SARS-CoV-2 infection, commonly called COVID-19 (coronavirus disease 2019), is a respiratory disorder ranging in severity from a mild upper respiratory illness to severe interstitial pneumonia and acute respiratory distress syndrome (ARDS) (Chen et al., 2020, Wang et al., 2020, Liu et al., 2020, Petrosillo et al., 2020). According to genome sequence analysis, the virus belongs to the betacoronavirus genus, which includes Bat SARS-like coronavirus, SARS-CoV and MERS-CoV, and is associated with milder infections. One major difference is that SARS (Severe Acute Respiratory Syndrome) and MERS (Middle East Respiratory Syndrome) are mainly associated with nosocomial spread, whereas COVID-19 is widely transmitted at the community level. SARS-CoV-2 is phylogenetically related (96%) to the Bat SARS-like coronaviruses that were isolated in China from horseshoe bats between 2015 and 2018, which indicates a completely different evolution with bats as a wild reservoir of the virus. Notably, coronaviruses have been isolated from pangolins, and the isolated pangolin CoV genomes were found to have ~85.5–92.4% similarity to SARS-CoV-2, indicating that the pangolin may be a potential intermediate host for SARS-CoV-2 (Kang et al., 2020).

The World Health Organization (WHO) estimated the basic reproductive number R0 of the novel coronavirus infection to be between 2 and 2.5, which is higher than that of SARS (1.7–1.9) and MERS (<1), suggesting that SARS-CoV-2 has a higher epidemic potential. However, some studies have estimated R0 values as high as 4. A recent study calculated the average reproductive number of SARS-CoV-2 to be 3.28, with a median of 2.79, both of which exceed the WHO values (Petrosillo et al., 2020, Li et al., 2020, Chen et al., 2020, Wu et al., 2020, Liu et al., 2020). The mean incubation period of the novel coronavirus is estimated to be 3–7 days (range, 2–14 days) (Backer et al., 2020, Lauer et al., 2020), indicating a long transmission period. The estimated latency of SARS-CoV-2 is consistent with that of other known human coronaviruses, including non-SARS human coronaviruses (mean 3 days, range 2–5 days) (Lessler et al., 2009), SARS-CoV (mean 5 days, range 2–14 days) (Varia et al., 2003) and MERS-CoV (mean 5.7 days, range 2–14 days) (Assiri et al., 2013). Moreover, it has been reported that asymptomatic COVID-19 patients can effectively transmit SARS-CoV-2 during their incubation periods (Rothe et al., 2020, Quilty et al., 2020). This differs from SARS-CoV, because most SARS-CoV cases were infected by ‘super spreaders’ and SARS-CoV cases cannot infect susceptible persons during the incubation period (Lipsitch et al., 2003).
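To make the practical meaning of these R0 values concrete, the following minimal sketch compares how quickly case numbers could grow per generation of infection. It assumes an idealised outbreak in which every case infects exactly R0 others on average, with no immunity, interventions or depletion of susceptible people; the function name and the choice of five generations are illustrative assumptions, not part of the study.

```python
def expected_cases_by_generation(r0, generations, seed_cases=1):
    """Expected new cases in each generation of an idealised outbreak.

    Assumes every case infects r0 others on average and that nothing
    (immunity, lockdown, depletion of susceptibles) slows the spread.
    """
    return [seed_cases * r0 ** g for g in range(generations + 1)]


# R0 values quoted in the text: 2.5 is the upper WHO estimate for
# SARS-CoV-2 and 1.9 the upper estimate for SARS. Five generations is
# an arbitrary choice for illustration.
for label, r0 in [("SARS-CoV-2", 2.5), ("SARS", 1.9)]:
    cases = expected_cases_by_generation(r0, generations=5)
    print(label, [round(c, 1) for c in cases])
```

Even a modest difference in R0 compounds from one generation to the next, which is why the higher estimates for SARS-CoV-2 imply a markedly higher epidemic potential.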

The betacoronavirus first denoted as the 2019 novel coronavirus (2019-nCoV) was officially renamed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by the International Committee on Taxonomy of Viruses, and the disease it causes was named coronavirus disease 2019 (COVID-19). It has spread like wildfire and is now a global concern. The spread of COVID-19 to the rest of the world was due to the trading of seafood from Wuhan and the continuous movement of people from the source of infection to different parts of the world. There have been significant outbreaks in many regions of China as well as around the globe, with major concerns in Asia, Europe, North America, South America, Africa and Oceania. The disease is potentially zoonotic, with an estimated mortality rate of 2–5% (Kang et al., 2020), which is lower than that of SARS (9.5%) and MERS (34.4%) (Munster et al., 2020, Chen et al., 2020). The transmission routes of the virus are similar to those of pneumonia, i.e., respiratory routes, physical contact, and oral and faecal routes (Tellier 2006, Brankston et al., 2007). Since SARS-CoV-2 has a high transmission rate at the community level, the chain of viral spread goes uncontrolled. Over the last two decades, the repeated emergence of coronavirus infections and the recurring global public health emergencies warn us that coronaviruses are a major threat to human health and should not be ignored.

Studies have shown that people living in areas with higher concentrations of PM2.5 have a higher chance of death than those in areas with lower levels. The major reason is that COVID-19 is a respiratory malady and therefore affects the lungs. Long-term exposure to high air pollution levels increases vulnerability to the most critical cases of COVID-19 infection (N Roy, A. et al., 2020). A study has claimed that an increase of only 1 µg/m³ in PM2.5 is associated with a 15% increase in the COVID-19 death rate. The Harvard analysis is the first study to draw attention to this statistical link, revealing a “large overlap” between COVID-19 deaths and other diseases linked with long-term exposure to high air pollution levels. The present study targets the spread of SARS-CoV-2 around the globe and the number of reported cases, recoveries and deaths in five countries in order to give a prediction about the future aspects of the pandemic.


The methodology consists of 3 steps. The first step is data collection; we collected the dataset from Kaggle. The second step is to compare the number of COVID-19 cases globally from 22nd January 2020 to 30th April 2020, and the third step is to predict the number of cases up to 10th May 2020. Our study period was 3 months, and we analyzed and visualized the spread of the virus country-wise as well as globally during the study period in terms of confirmed cases, recovered cases and deaths. We compared the data from the 5 countries most affected by COVID-19. The total number of confirmed cases, the total number of deaths reported and the total number of recoveries were considered for the study. Finally, we predicted the global expansion of the virus with the help of the Plotly and Prophet Python libraries. Prediction is a typical data science exercise that helps administrations with planning, objective setting, and anomaly detection.
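The comparison step described above can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' pipeline: the column names (`ObservationDate`, `Country`, `Confirmed`, `Deaths`, `Recovered`), the inline sample figures and both function names are hypothetical, and the sketch assumes one cumulative row per country per day.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample shaped like a per-country daily export; the column
# names and the figures are illustrative, not the real dataset.
SAMPLE_CSV = """ObservationDate,Country,Confirmed,Deaths,Recovered
01/22/2020,CountryA,10,0,0
01/22/2020,CountryB,2,0,0
01/23/2020,CountryA,15,1,2
01/23/2020,CountryB,4,0,1
"""


def global_daily_totals(csv_text, date_format="%m/%d/%Y"):
    """Sum the cumulative counts of all countries for each date."""
    totals = defaultdict(lambda: {"Confirmed": 0, "Deaths": 0, "Recovered": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["ObservationDate"], date_format).date()
        for key in ("Confirmed", "Deaths", "Recovered"):
            totals[day][key] += int(float(row[key]))
    return dict(sorted(totals.items()))


def top_countries_by_confirmed(csv_text, n=5, date_format="%m/%d/%Y"):
    """Rank countries by their most recent cumulative confirmed count."""
    latest = {}  # country -> (date, cumulative confirmed)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["ObservationDate"], date_format).date()
        country = row["Country"]
        if country not in latest or day >= latest[country][0]:
            latest[country] = (day, int(float(row["Confirmed"])))
    ranked = sorted(latest.items(), key=lambda item: item[1][1], reverse=True)
    return [(country, confirmed) for country, (_, confirmed) in ranked[:n]]


for day, counts in global_daily_totals(SAMPLE_CSV).items():
    print(day, counts)
print(top_countries_by_confirmed(SAMPLE_CSV, n=5))
```

A dataset that reports several provinces per country would first need its rows summed per country and date before the ranking step applies.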

Prophet Prediction

Prophet is an open-source library developed by Facebook that is based on decomposable (trend + seasonality + holidays) models. It provides the capacity to make time-series forecasts with good precision using simple, intuitive parameters. The trend parameters are growth, changepoints, n_changepoints and changepoint_prior_scale. It has been used in many settings because of its two main advantages: it makes it straightforward to create a sensible and accurate prediction, and its predictions are adjustable in ways that are intuitive to non-specialists. The procedure is based on an additive regression model, which is a nonparametric regression method. The additive model was proposed by Jerome H. Friedman and Werner Stuetzle (1981) and is an essential part of the ACE algorithm. The mathematical expression of the additive regression model is shown below.
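A ten-day forecast with Prophet could look roughly like the sketch below. The helper names `to_prophet_rows` and `forecast_confirmed` are hypothetical, and the trend parameter values shown are Prophet's documented defaults rather than the settings used in this study, which the text does not state. The forecasting function needs the third-party `pandas` and `prophet` packages, so they are imported only when it is called.

```python
def to_prophet_rows(daily_confirmed):
    """Convert (ISO date, cumulative cases) pairs to Prophet's ds/y layout."""
    return [{"ds": day, "y": float(cases)} for day, cases in sorted(daily_confirmed)]


def forecast_confirmed(daily_confirmed, horizon_days=10):
    """Fit Prophet on daily cumulative cases and forecast `horizon_days` ahead.

    Requires the third-party `pandas` and `prophet` packages. The trend
    settings are Prophet's defaults, used here as an assumption.
    """
    import pandas as pd
    from prophet import Prophet

    frame = pd.DataFrame(to_prophet_rows(daily_confirmed))
    frame["ds"] = pd.to_datetime(frame["ds"])

    model = Prophet(
        growth="linear",               # trend type
        n_changepoints=25,             # number of candidate trend changepoints
        changepoint_prior_scale=0.05,  # flexibility of the trend
    )
    model.fit(frame)

    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    tail = forecast[["ds", "yhat"]].tail(horizon_days)
    return list(zip(tail["ds"].dt.date, tail["yhat"]))


# Hypothetical input: two days of cumulative confirmed cases.
rows = to_prophet_rows([("2020-01-23", 15), ("2020-01-22", 10)])
print(rows)
```

Prophet expects exactly two columns, `ds` for the date and `y` for the value being forecast, which is why the data is reshaped before fitting.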

$$E[y_i \mid x_{i1}, \ldots, x_{ip}] = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij})$$

or, equivalently,

$$Y = \beta_0 + \sum_{j=1}^{p} f_j(X_j) + \varepsilon$$

where $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ is the dataset of $n$ statistical units, $\{x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ are the predictors and $y_i$ is the output, with $E[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$ and $E[f_j(X_j)] = 0$. The $f_j(x_{ij})$ are smooth functions fit from the data. Fitting the additive model (i.e. the functions $f_j(x_{ij})$) can be done using the backfitting algorithm proposed by Andreas Buja, Trevor Hastie and Robert Tibshirani in 1989.
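The backfitting idea mentioned above can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions: it uses a simple running-mean smoother and a fixed number of sweeps, both chosen for brevity, and the function names and test data are hypothetical. It is not how Prophet itself is fitted.

```python
def running_mean_smoother(x, r, window=5):
    """Smooth r against x with a moving average over neighbours sorted by x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbours = [r[order[k]] for k in range(lo, hi)]
        fitted[i] = sum(neighbours) / len(neighbours)
    return fitted


def backfit(X, y, n_iter=20, window=5):
    """Fit y ~ beta0 + sum_j f_j(x_j) by cycling over the predictors.

    X is a list of rows, one per observation. Returns beta0 and f, where
    f[j][i] is the fitted value of f_j at observation i.
    """
    n, p = len(y), len(X[0])
    beta0 = sum(y) / n                    # intercept is the mean of y
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the intercept and all other f_k.
            partial = [
                y[i] - beta0 - sum(f[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            smooth = running_mean_smoother([row[j] for row in X], partial, window)
            mean_j = sum(smooth) / n
            f[j] = [s - mean_j for s in smooth]   # keep each f_j centred at zero
    return beta0, f


# Hypothetical data: two predictors with an additive effect on y.
X = [[i / 10, ((7 * i) % 40) / 10] for i in range(40)]
y = [1 + 2 * a + (b - 2) ** 2 for a, b in X]
beta0, f = backfit(X, y)
fitted = [beta0 + f[0][i] + f[1][i] for i in range(len(y))]
print(round(beta0, 3), [round(v, 2) for v in fitted[:3]])
```

Each sweep re-estimates one smooth function from the residual left by all the others, and centring every function keeps the intercept identifiable.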

Results And Discussion

The spread of COVID-19 to different parts of the world has caused a drastic loss of lives and economic damage. People living in different geographical locations have faced a wide variety of impacts from the ailment. The geographical location, the meteorological conditions, the immunity of people as a result of differing climatic conditions, and the different age groups are some of the major parameters that can be considered to determine the impact level of the pandemic. In this study, we examined the variation in the number of cases of coronavirus infection with parameters such as temperature, humidity, air quality, number of tests, population and age groups around the globe. The study then focuses on five major countries to compare the variations, determine the possible reasons behind the variations in the number of cases, and predict the future course of the disease.

COVID-19 has spread to 210 countries and territories of the world and 2 international conveyances. The given world map shows the number of infected cases reported, the deaths and the recovered cases around the world. As per the record, the total number of cases of SARS-CoV-2 infection in the world up to 15 April 2020 was reported to be 2,076,502, with the number of infected cases being 1,427,636 and the number of deaths reaching 138,744. The recovery rate of patients around the world until mid-April was 78.62%, with 522,122 recoveries. Different regions of the world have faced widely varying severity of the malady. The major reasons for the variable degree of impact are the differences in geographical location, meteorological parameters, the immunity of people and the different age groups. These factors are to some extent related to each other. The meteorological parameters, particularly temperature and humidity, play a major role in the transmission of the disease.

Initially, cases of infection were reported mainly from China and continued to rise until a lockdown was announced in the country, which gradually reduced further transmission of the disease by keeping people indoors. The curve of infected cases began to rise when infected people travelled from the country to different regions of the world while unaware of their infection. This mass movement of people accelerated the transmission of the disease to different parts of the world, further multiplying the number of patients. Some countries acted early and announced a lockdown at an earlier stage to slow down transmission, which flattened their curve of infected cases quite early.

Countries such as the United States, which initially took SARS-CoV-2 infection very lightly and declared a national lockdown very late after the onset of the disease, are now seeing the situation go out of hand. The delay in lockdown promoted the transmission of the disease and greatly increased the number of patients. In the early stage, the number of deaths was quite low. However, as the number of cases increased, the death curve began to rise due to the difficulty of providing proper treatment. This is a major reason for the rise in the peak in the last week of March. Age also played a major role in increasing the death toll. Older people faced difficulty in fighting the disease due to weak immunity, and their survival rate is therefore low. Places with high air pollution, particularly PM2.5, have reported a higher number of deaths because the respiratory system is already damaged and the disease, being airborne, further damages the respiratory tract and lungs (Suresh et al., 2020). Cold weather and humidity promote the transmission of the disease, as they increase the lifetime of the virus and, at low temperatures, people prefer to stay close to each other to prevent heat loss.

3.1 Trend of COVID–19 infection in the world with time

The number of cases reported was quite low initially in January, and the cases were mainly from China. The mass movement of people to and from China gradually increased the number of patients all over the world through transmission of the disease. Transmission at the community level multiplied the length of the chain of viral infection, which resulted in the number of cases increasing at a higher rate. Countries with colder climatic conditions have reported a higher number of patients than countries with warmer climatic conditions. Countries in which lockdown was announced late have reported a drastic increase in the number of cases within a few weeks.

In January, the maximum number of deaths was reported from China, as it was the epicentre of the malady. The number of deaths was very low at the earlier stage, as the number of infected cases was low and patients were able to receive proper treatment. Most deaths were reported from regions with higher air pollution levels, which already damage the human respiratory system. The sudden increase in the number of patients from April resulted in a simultaneous increase in the number of deaths due to the difficulty of providing proper treatment. Elderly people have low survival rates due to low immunity, and they therefore made up most of the reported deaths.

The earlier months of the year saw a lower recovery rate because the proper treatment for the disease was unknown. From the end of February, the recovery rate started to increase due to the discovery of an alternative treatment for the disease. Countries with warmer climatic conditions have reported a higher number of recoveries due to the strong immunity of their people. The month of April has seen the highest number of recoveries so far, due to high temperatures that have weakened the impact of the virus. Younger people have a higher chance of recovery than the elderly.

3.2 Comparison of the infected, death and recovered counts of five major countries

Lockdown and social distancing have served as major weapons in controlling the severity of the pandemic. Countries that announced lockdown and social distancing earlier were able to control the further spread of the disease (Srivastava et al., 2020). The spread of the disease in China could only be stopped by the announcement of a lockdown in January. The mass movement of people to different parts of the world led to the transmission of the ailment from China to the rest of the world. In Italy, the sudden increase in cases could be controlled only by lockdown. Colder countries have reported a higher number of cases because low temperature and humid conditions promote the spread of SARS-CoV-2. The USA, due to a very late lockdown, has reported a tremendous increase in the number of cases. In India, the announcement of a national lockdown at the very onset of the disease helped to reduce transmission to a great extent. The very early lockdown in Morocco helped the country to remain safe during the pandemic.

The announcement of lockdown helped to gradually control the increase in the number of cases, which supported the successful treatment of patients and thus reduced the number of deaths. India and China were able to reduce the number of deaths. People living in India, owing to their strong immunity as a result of warm climatic conditions, are able to fight the disease, which reduced the number of deaths. Italy and the United States of America have a high death rate because the high number of patients made proper treatment difficult. Morocco, due to its early lockdown, shows the positive sign of zero deaths.

China reported a sharp increase in the number of recovered cases in March as a result of bringing the continuously growing outbreak in the country under control. The USA reported an increase in its recovery rate after the use of the drug hydroxychloroquine. The people of India, due to their strong immunity, recovered quickly from the disease. The use of hydroxychloroquine also increased the recovery rate in Italy. Early lockdown protected Morocco from this malady.


Predicted Cases | Actual Cases

Table 1: Predicted and the Actual number of cases

Fig. 8 shows the prediction of SARS-CoV-2 infection around the globe; a machine learning approach based on an additive regression model was used to predict the spread. The study was carried out using the dataset from Kaggle, and we predicted the epidemic spread based on the previous data collected until 26th April 2020. Table 1 gives the number of predicted cases and the number of actual cases from 1st May 2020 to 10th May 2020. As per our prediction, the number of confirmed cases by the end of April is 3,283,334 and will reach 4,088,231 on 10 May 2020. On correlating the number of predicted cases with the actual number of cases, the Pearson correlation coefficient comes out to be 0.98, which indicates a strong positive correlation and suggests that the predictions track the observed trend closely.
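The Pearson correlation coefficient used above can be computed as in the following sketch. The case figures in the usage example are hypothetical, not the values from Table 1, and the mean absolute percentage error is added here only as an assumption about what a complementary accuracy measure might look like: a forecast that is consistently half the true value still has a correlation of 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_percentage_error(actual, predicted):
    """Average size of the forecast error as a percentage of the actual value."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical daily totals, for illustration only.
actual = [100, 200, 300, 400]
predicted = [0.5 * a for a in actual]   # systematically too low
print(pearson_r(predicted, actual))
print(mean_absolute_percentage_error(actual, predicted))
```

Reporting an error measure alongside the correlation shows how far the forecasts are from the observed counts, not only whether they rise and fall together.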


The COVID-19 pandemic has caused great loss of lives and economic damage all over the world. The number of confirmed cases as of 10 May 2020 is 4,337,436. Over time, the rate of increase in the number of confirmed cases has declined due to the announcement of lockdown, social distancing and other preventive measures. The recovery rate has also increased over time due to the use of the drug hydroxychloroquine. Countries with cold temperatures have reported a higher number of cases than countries with warm climatic conditions. Areas with high air pollution levels have reported relatively more deaths than locations with low air pollution levels, because novel coronavirus infection is a respiratory disease and therefore affects the lungs. Different parts of the world have suffered a wide range of impacts from the malady due to differences in their meteorological conditions, geographical conditions, the immunity of their people, etc. The prediction of the number of confirmed cases for ten days shows a highly positive correlation with the actual number of cases, with a Pearson coefficient of 0.98, indicating close agreement between the predicted and actual trends. The SARS-CoV-2 outbreak has caused devastation in major developed countries of the world. Since it is similar to SARS and MERS, prior knowledge of their treatment and of various other measures has proved helpful in fighting the pandemic.


  1. Assiri A, Al-Tawfiq JA, Al-Rabeeah AA, Al-Rabiah FA, Al-Hajjar S, Al-Barrak A, et al. Epidemiological, demographic, and clinical characteristics of 47 cases of Middle East respiratory syndrome coronavirus disease from Saudi Arabia: a descriptive study. Lancet Infect Dis 2013;13:752–61. doi:10.1016/S1473-3099(13)70204-4.
  2. Backer JA, Klinkenberg D, Wallinga J. Incubation period of 2019 novel coronavirus (2019-nCoV) infections among travellers from Wuhan, China, 20-28 January 2020. Euro Surveill 2020;25.
  3. Barreca, A.I., Shimshack, J.P., 2012. Absolute humidity, temperature, and influenza mortality: 30 years of county-level evidence from the United States. Am. J.
  4. Bauch CT, Lloyd-Smith JO, Coffee MP, Galvani AP. Dynamically Modeling SARS and Other Newly Emerging Respiratory Illnesses. Epidemiology. 2005;16(6):791-801. doi:10.1097/01.ede.0000181633.80269.4c
  5. Brankston G, Gitterman L, Hirji Z, Lemieux C, Gardam M (2007) Transmission of influenza A in human beings. Lancet Infect Dis 7(4):257–265.
  6. Chen H, Guo J, Wang C, Luo F, Yu X, Zhang W, et al. Clinical characteristics and intrauterine vertical transmission potential of COVID-19 infection in nine pregnant women: a retrospective review of medical records. Lancet 2020;395:809–15. doi:10.1016/S0140-6736(20)30360-3.
  7. Chen J. Pathogenicity and Transmissibility of 2019-nCoV-A Quick Overview and Comparison with Other Emerging Viruses. Microbes Infect. February 2020. doi:10.1016/j.micinf.2020.01.004
  8. Chen N, Zhou M, Dong X, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet (London, England). 2020;0(0). doi:10.1016/S0140-6736(20)30211-7
  9. Ciencewicki J, Jaspers I (2007) Air pollution and respiratory viral infection. Inhal Toxicol 19(14):1135–1146.
  10. Gautret P, Angelo KM, Asgeirsson H, et al. International mass gatherings and travel associated illness: A GeoSentinel cross-sectional, observational study [published online ahead of print, 2019 Nov 9]. Travel Med Infect Dis. 2019;101504. doi:10.1016/j.tmaid.2019.101504
  11. S. Kang, W. Peng and Y. Zhu et al., Recent progress in understanding 2019 novel coronavirus (SARS-CoV-2) associated with human respiratory disease: detection, mechanisms and treatment, International Journal of Antimicrobial Agents, https: //
  12. Lauer SA, Grantz KH, Bi Q, Jones FK, Zheng Q, Meredith H, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann Intern Med 2020 Mar 10 [Epub ahead of print]. doi:10.7326/m20-0504.
  13. Lessler J, Reich NG, Brookmeyer R, Perl TM, Nelson KE, Cummings DA. Incubation periods of acute respiratory viral infections: a systematic review. Lancet Infect Dis 2009;9:291-300.
  14. Li Q, Guan X, Wu P, et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia. N Engl J Med. January 2020:NEJMoa2001316.
  15. Lipsitch M, Cohen T, Cooper B, Robins JM, Ma S, James L, et al. Transmission dynamics and control of severe acute respiratory syndrome. Science 2003;300:1966–70. doi:10.1126/science.1086616.
  16. Liu K, Fang Y-Y, Deng Y, et al. Clinical characteristics of novel coronavirus cases in tertiary hospitals in Hubei Province. Chin Med J (Engl). 2020. doi:10.1097/CM9.0000000000000744
  17. Liu T, Hu J, Kang M, et al. Transmission dynamics of 2019 novel coronavirus (2019-nCoV). bioRxiv. January 2020:2020.01.25.919787. doi:10.1101/2020.01.25.919787
  18. Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. J Travel Med. February 2020. doi:10.1093/jtm/taaa021
  19. Lowen, A.C., Steel, J., 2014. Roles of humidity and temperature in shaping influenza seasonality. J. Virol. 88, 7692–7695.
  20. Munster VJ, Koopmans M, van Doremalen N, van Riel D, de Wit E. A Novel Coronavirus Emerging in China - Key Questions for Impact Assessment. N Engl J Med. January 2020:NEJMp2000929. doi:10.1056/NEJMp2000929
  21. N Roy, A.; Jose, J.; Sunil, A.; Gautam, N.; Nathalia, D.; Suresh, A. Prediction and Spread Visualization of Covid-19 Pandemic Using Machine Learning. Preprints 2020, 2020050147 (doi: 10.20944/preprints202005.0147.v1).
  22. Petrosillo N, Viceconte G, Ergonul O, Ippolito G, Petersen E, COVID-19, SARS and MERS: are they closely related?, Clinical Microbiology and Infection,
  23. Pica, N., Chou, Y.Y., Bouvier, N.M., Palese, P., 2012b. Transmission of influenza B viruses in the guinea pig. J. Virol. 86, 4279–4287.
  24. Quilty BJ, Clifford S, Flasche S, Eggo RM; CMMID nCoV Working Group. Effectiveness of airport screening at detecting travellers infected with novel coronavirus (2019-nCoV). Euro Surveill 2020;25. doi:10.2807/1560-7917.ES.2020.25.5.2000080.
  25. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19), 16-24 February 2020. Available from:
  26. Rothe C, Schunk M, Sothmann P, Bretzel G, Froeschl G, Wallrauch C, et al. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. N Engl J Med 2020;382:970–1. doi:10.1056/NEJMc2001468.
  27. Srivastava, A., Sharma, R. K., & Suresh, A. (2020). Impact of Covid-19 on Sustainable Development Goals. International Journal of Advanced Science and Technology, 29(9 Special Issue).
  28. Suresh, A., Chauhan, D., Othmani, A., Bhadauria, N., Aswin, S., Jose, J., & Mejjad, N. (2020). Diagnostic Comparison of Changes in Air Quality over China before and during the COVID-19 Pandemic.
  29. Tellier R (2006) Review of aerosol transmission of influenza A virus. Emerg Infect Dis 12(11):1657–1662.
  30. Varia M, Wilson S, Sarwal S, McGeer A, Gournis E, Galanis E, et al. Investigation of a nosocomial outbreak of severe acute respiratory syndrome (SARS) in Toronto, Canada. CMAJ 2003;169:285–92.
  31. Wang D, Hu B, Hu C, et al. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China. JAMA. February 2020. doi:10.1001/jama.2020.1585
  32. World Health Organization. Coronavirus Disease 2019 (COVID-19) Situation Report-48, 08th March 2020. Available from:
  33. Wu JT, Leung K, Leung GM. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet (London, England). January 2020. doi:10.1016/S0140-6736(20)30260-9
  34. Zhao, N., Cao, G., Vanos, J.K., Vecellio, D.J., 2018. The effects of synoptic weather on influenza infection incidences: a retrospective study utilizing digital disease surveillance. Int. J. Biometeorol. 62, 69–84.


Competing interests: The authors declare no competing interests.