Worldwide Trend Analysis and Potential Spread Prediction of SARS-CoV-2

COVID-19, declared a Public Health Emergency of International Concern by the World Health Organization on January 30, 2020 and characterized as a pandemic on March 11, 2020, is a major disaster that has shaken almost every economy around the globe. The outbreak began in December 2019 in Wuhan, China, and has been attributed largely to the consumption of bats. The symptoms and the routes of infection are quite similar to those of influenza. Areas with higher levels of particulate matter pollution have reported more deaths than comparatively less polluted areas. Meteorological factors have also played a major role in the extent of spread of the disease and in the ability of people living in an area to fight it. Social distancing and nationwide lockdowns have proved to be major weapons against further spread of the disease. Lockdown has, in addition, given relief from high pollution levels, resulting in good air quality and clean rivers. This study examines the spread of the novel coronavirus across the world, following the trend in the number of confirmed cases, deaths and recoveries from the onset, and highlights the situation in five countries. The study further predicts the number of cases for ten days, correlates the predicted values with the actual number of cases reported, and sheds light on the future course of the infection.


Introduction
The outbreak of the novel coronavirus (also known as SARS-CoV-2) began in December 2019 in the city of Wuhan, China, following the exposure of people to a seafood market (Gautret et al., 2019). SARS-CoV-2 infection, commonly called COVID-19 (coronavirus disease), is a respiratory disorder whose severity ranges from a mild upper respiratory illness to severe interstitial pneumonia and acute respiratory distress syndrome (ARDS) (Wang et al., 2020; Petrosillo et al., 2020). According to genome sequence analysis, the virus belongs to the betacoronavirus genus, which includes Bat SARS-like coronavirus, SARS-CoV and MERS-CoV, and is associated with milder infections.
One major difference is that SARS (Severe Acute Respiratory Syndrome) and MERS (Middle East Respiratory Syndrome) are mainly associated with nosocomial spread, whereas COVID-19 is widely transmitted at the community level. SARS-CoV-2 is phylogenetically related (96%) to the Bat SARS-like coronaviruses that were isolated in China from horseshoe bats between 2015 and 2018, which indicates a completely different evolution with bats as a wild reservoir of the virus. Notably, coronaviruses have been isolated from pangolins, and the isolated pangolin CoV genomes were found to have ~85.5-92.4% similarity to SARS-CoV-2, indicating that the pangolin may be a potential intermediate host for SARS-CoV-2 (Kang et al., 2020).
The World Health Organization (WHO) estimated the basic reproductive number R0 of the novel coronavirus infection to be between 2 and 2.5, which is higher than that of SARS (1.7-1.9) and MERS (<1), suggesting that SARS-CoV-2 has a higher epidemic potential. However, according to some studies, R0 may reach a value of 4. A recent study calculated the average reproductive number of SARS-CoV-2 to be 3.28, with a median value of 2.79, which exceeds the WHO values (Petrosillo et al., 2020; Li et al., 2020). The transmission routes of the virus are similar to those of pneumonia, i.e., respiratory routes, physical contact, and oral and faecal routes (Tellier, 2006; Brankston et al., 2007). Because SARS-CoV-2 has a high transmission rate at the community level, the chain of spread of the virus goes uncontrolled. The eruption of coronavirus infections and the recurring global public health emergencies of the last two decades warn us that coronaviruses are a major threat to human health and should not be ignored.
Studies have shown that people living in areas with higher concentrations of PM2.5 have a higher chance of death than those in areas with lower levels. The major reason is that COVID-19 is a respiratory malady and therefore affects the lungs. Long-term exposure to high air pollution levels increases the vulnerability to the most critical outcomes of COVID-19 infection (N Roy, A. et al., 2020). A study has claimed that an increase of only 1 µg/m3 in PM2.5 is associated with a 15% increase in the COVID-19 death rate. The Harvard analysis is the first study to draw attention to this statistical link, revealing a "large overlap" between COVID-19 deaths and other diseases linked with long-term exposure to high air pollution levels.
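To make the reproductive numbers quoted in this section concrete, the following minimal sketch computes how many infections a single case would lead to after a few generations of transmission. It assumes that every case infects exactly R0 others and that nothing (immunity, distancing, lockdown) slows the spread. The function names, the value 0.9 chosen to stand in for "<1", and the choice of five generations are illustrative assumptions, not part of the cited studies.

```python
def cases_in_generation(r0: float, generation: int) -> float:
    """New infections in a given generation, starting from one index case.

    Simplifying assumption: every case infects exactly r0 others and the
    susceptible population never runs out.
    """
    return r0 ** generation


def cumulative_cases(r0: float, generations: int) -> float:
    """Total infections from generation 0 (the index case) up to `generations`."""
    return sum(cases_in_generation(r0, g) for g in range(generations + 1))


if __name__ == "__main__":
    # R0 values quoted in the text; 0.9 is an illustrative stand-in for "<1".
    scenarios = [
        ("MERS-like (<1)", 0.9),
        ("SARS (upper)", 1.9),
        ("SARS-CoV-2 (WHO, upper)", 2.5),
        ("SARS-CoV-2 (review mean)", 3.28),
    ]
    for label, r0 in scenarios:
        total = cumulative_cases(r0, 5)
        print(f"{label:26s} R0={r0:<5} total after 5 generations: {total:.0f}")
```

Under these simplifying assumptions, an R0 below 1 makes each generation smaller than the previous one, whereas values above 2 make the total grow rapidly within a handful of generations.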
The study targets the spread of SARS-CoV-2 around the globe and the numbers of reported cases, recoveries and deaths in five countries, in order to give a prediction about the future course of the pandemic.

Methodology
The methodology consists of three steps. The first step is data collection: we collected the dataset from Kaggle. The second step is to compare the number of COVID-19 cases globally from 22nd January 2020 to 30th April 2020. The third step is to predict the number of cases up to 10th May 2020. Our study period was about three months, and we analyzed and visualized the spread of the virus country-wise as well as globally during this period in terms of confirmed cases, recovered cases and deaths. We compared the data from five different countries that were among the most affected by COVID-19. The total number of confirmed cases, the total number of deaths reported and the total number of recoveries were considered for the study. Finally, we predicted the expansion of the virus globally with the help of the Plotly and Prophet Python libraries. Prediction is a typical data science exercise that helps administrations with capacity planning, goal setting and anomaly detection.
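The comparison step described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the Kaggle data has been reduced to rows of (date, country, cumulative confirmed cases). The row layout, the sample numbers, the country labels and the helper names are hypothetical and are not taken from the dataset used in the study.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (observation date, country, cumulative confirmed cases).
# The numbers are made up for illustration only.
ROWS = [
    (date(2020, 4, 29), "A", 100), (date(2020, 4, 30), "A", 120),
    (date(2020, 4, 29), "B", 40), (date(2020, 4, 30), "B", 70),
    (date(2020, 4, 29), "C", 10), (date(2020, 4, 30), "C", 15),
]


def global_totals(rows):
    """Sum the cumulative counts of all countries for each date."""
    totals = defaultdict(int)
    for day, _country, confirmed in rows:
        totals[day] += confirmed
    return dict(sorted(totals.items()))


def top_countries(rows, n):
    """Return the n countries with the most confirmed cases on the last date."""
    last_day = max(day for day, _country, _confirmed in rows)
    latest = {country: confirmed
              for day, country, confirmed in rows if day == last_day}
    return sorted(latest, key=latest.get, reverse=True)[:n]


if __name__ == "__main__":
    for day, total in global_totals(ROWS).items():
        print(day.isoformat(), total)
    print("most affected:", top_countries(ROWS, 2))
```

The same two aggregations, applied to deaths and recoveries as well as confirmed cases, would give the global and per-country series that the study compares.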

Prophet Prediction
Prophet is an open-source library developed by Facebook that is based on decomposable (trend + seasonality + holidays) models. It provides the capacity to make time-series forecasts with good precision using simple, intuitive parameters. The trend parameters are growth, changepoints, n_changepoints and changepoint_prior_scale. It has been used in many settings because of its two main advantages: it is straightforward to create a sensible and accurate prediction, and Prophet predictions are adjustable in ways that are intuitive to non-specialists. The procedure is based on an additive regression model, which is a nonparametric type of regression. It was recommended by Jerome H.
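A decomposable model of this kind is usually written as y(t) = g(t) + s(t) + h(t) + error, where g is the trend, s the seasonality and h the holiday effect. The sketch below illustrates only the trend term, as a continuous piecewise-linear function whose slope changes at a set of changepoints, which is the role played by the changepoints and n_changepoints parameters. It is a plain-Python illustration under these assumptions, not Prophet's implementation; the function name and the numbers in the usage note are hypothetical.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise-linear trend at time t.

    k            -- base growth rate (slope before the first changepoint)
    m            -- offset (value of the trend at t = 0)
    changepoints -- times s_j at which the slope is allowed to change
    deltas       -- slope adjustment delta_j applied from s_j onwards
    """
    slope, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            slope += delta_j
            # Shift the offset so that the two segments meet at s_j.
            offset -= s_j * delta_j
    return slope * t + offset
```

For example, with a base rate of 100 cases per day, an offset of 1000 and a single changepoint at day 10 that lowers the slope by 40, the trend passes through 1500 at day 5, 2000 at day 10 and 2600 at day 20. A slowdown of this kind is what one might expect after a lockdown takes effect.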

Results And Discussion
The spread of COVID-19 to different parts of the world has caused a drastic loss of lives and economic damage. People living in different geographical locations have faced a wide variety of impacts of the ailment. Geographical location, meteorological conditions, the immunity of people as shaped by the climatic conditions of different places, and age group are some of the major parameters that can be considered to determine the impact level of the pandemic. In this study, we have examined the variation in the number of cases of coronavirus infection around the globe with parameters such as temperature, humidity, air quality, number of tests, population and age group. The study then focuses on five major countries to compare the variations, determine the possible reasons behind the variations in the number of cases, and predict the future course of the disease.
Countries such as the United States, which initially took SARS-CoV-2 infection very lightly and declared a national lockdown late, well after the onset of the disease, are now seeing the situation go out of hand. The delay in lockdown promoted the transmission of the disease and inflated the number of patients to a large extent. In the initial stage, the number of deaths was quite low. However, as the number of cases increased, the death curve began to rise because of the difficulty of providing proper treatment to everyone. This is a major reason for the rise in the peak in the last week of March. Age also played a major role in increasing the death toll. Older people had difficulty fighting the disease because of weak immunity, and their survival rate is therefore low.
Places with high air pollution, particularly PM2.5, have reported a higher number of deaths because the respiratory system is already damaged there and COVID-19 is an airborne disease that further damages the respiratory tract and lungs (Suresh A et al., 2020). Cold weather and humidity promote the transmission of the disease, as they prolong the life of the virus, and at low temperatures people prefer to stay close to each other to prevent heat loss.

Trend of COVID-19 infection in the world with time
The number of cases reported was quite low initially in January, and the cases were mainly from China. The mass movement of people to and from China gradually increased the number of patients all over the world through transmission of the disease. Transmission at the community level lengthened the chain of viral infection, which resulted in a faster increase in the number of cases. Countries with colder climatic conditions have reported a higher number of patients than countries with warmer climatic conditions. Countries in which lockdown was announced late have reported a drastic increase in the number of cases within a few weeks.
In January, most deaths were reported from China, as it was the epicentre of the malady. The number of deaths was very low at the earlier stage because the number of infected cases was low and patients were able to receive proper treatment. The most deaths were reported from regions with higher air pollution levels, which already damage the human respiratory system. The sudden increase in the number of patients from April resulted in a simultaneous increase in the number of deaths because of the difficulty of providing proper treatment. Elderly people have low survival rates because of low immunity, and they therefore made up most of the reported deaths.
The earlier months of the year saw a lower recovery rate because the proper treatment for the disease was unknown. From the end of February, the recovery rate started to increase owing to the discovery of an alternative treatment for the disease. Countries with warmer climatic conditions have reported a higher number of recoveries, attributed to the strong immunity of their people. April has seen the highest number of recoveries so far, attributed to high temperatures that have weakened the impact of the virus. Younger people have a higher chance of recovery than elderly people.

Comparison of the infected, death and recovered counts of five major countries
Lockdown and social distancing have served as major weapons in controlling the severity of the pandemic. Countries that announced lockdown and social distancing early were able to control the further spread of the disease (Srivastava, A et al., 2020). The spread of the disease in China could only be stopped by the announcement of lockdown in January. The mass movement of people to different parts of the world led to the transmission of the ailment from China to the rest of the world. In Italy, the sudden increase in cases could be controlled only by lockdown. Colder countries have reported a higher number of cases because low temperature and humid conditions promote the spread of the SARS-CoV-2 virus. The USA, owing to a very late lockdown, has reported a tremendous increase in the number of cases. In India, the announcement of a national lockdown at the very onset of the disease helped to reduce transmission to a great extent. The earliest lockdown, in Morocco, helped the country remain safe during the pandemic.
The announcement of lockdown helped to gradually control the increase in the number of cases, which supported the successful treatment of patients and thus reduced the number of deaths. India and China were able to reduce the number of deaths. People living in India, owing to their strong immunity as a result of warm climatic conditions, are able to fight the disease, which reduced the number of deaths. Italy and the United States of America have a high death rate because the high number of patients made proper treatment difficult. Morocco, owing to its early lockdown, shows the positive sign of zero deaths.
China reported a large increase in the number of recovered cases in March, owing to control over the previously growing epidemic in the country. The USA reported an increase in its recovery rate after the use of the drug hydroxychloroquine. The people of India, owing to their strong immunity, recovered quickly from the disease. The use of hydroxychloroquine also increased the recovery rate in Italy. Early lockdown protected Morocco from this malady.

Table 1. Columns: Date, Predicted Cases, Actual Cases.
An additive regression model was used to predict the spread. The study was carried out using the dataset from Kaggle, and we predicted the epidemic spread on the basis of the data collected until 26th April 2020. Table 1 gives the number of predicted cases and the number of actual cases from 1st May 2020 to 10th May 2020. As per our prediction, the number of confirmed cases at the end of April is 3,283,334 and will reach 4,088,231 on 10 May 2020. On correlating the number of predicted cases with the actual number of cases, the Pearson correlation coefficient comes out to be 0.98, which shows a strong positive correlation and indicates that the predicted trend closely follows the reported one.
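The Pearson correlation coefficient used above can be computed as in the following sketch. The two short series are hypothetical placeholders, not the values of Table 1. Because a correlation coefficient measures how closely two series move together rather than how close their values are, the sketch also computes the mean absolute percentage error (MAPE) as a complementary check. That second metric is an addition for illustration and is not reported in the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mape(predicted, actual):
    """Mean absolute percentage error of the predictions, in percent."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical cumulative case counts (NOT the study's Table 1).
    predicted = [3_300_000, 3_380_000, 3_460_000, 3_540_000, 3_620_000]
    actual = [3_350_000, 3_440_000, 3_520_000, 3_600_000, 3_690_000]
    print(f"Pearson r = {pearson_r(predicted, actual):.3f}")
    print(f"MAPE      = {mape(predicted, actual):.2f}%")
```

Two cumulative series that both rise steadily will tend to have a coefficient close to 1 even when their levels differ, which is why an error measure on the values themselves is a useful companion.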

Conclusion
The COVID-19 pandemic has caused great loss of life and economic damage all over the world. The number of confirmed cases as of 10 May 2020 is 4,337,436. The rate of increase in confirmed cases has declined over time owing to the announcement of lockdown, social distancing and other preventive measures. The recovery rate has also increased over time with the use of the drug hydroxychloroquine. Countries with cold temperatures have reported a higher number of cases than countries with warm climatic conditions. Areas with high air pollution levels have reported relatively more deaths than locations with low air pollution levels, because novel coronavirus infection is a respiratory disease and therefore affects the lungs. Different parts of the world have suffered a wide range of impacts of the malady owing to differences in their meteorological conditions, geographical conditions, the immunity of their people, etc. The prediction of the number of confirmed cases for ten days shows a strong positive correlation with the actual number of cases, with a Pearson coefficient of 0.98, indicating that the predicted trend closely follows the reported one. The SARS-CoV-2 outbreak has caused devastation in major developed countries of the world. Since it is similar to SARS and MERS, prior knowledge of their treatment and of various other measures has proved helpful in fighting the pandemic.
Figure 1 Expansion of Covid-19. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning