Early Outbreak Analysis of COVID-19 Epidemic: China and Global Health Perspectives

Purpose: The evolving 2019-nCoV coronavirus is widely regarded as a worldwide public health threat. The emergence in China at the end of 2019 of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; previously provisionally labeled 2019 novel coronavirus or 2019-nCoV), led to a major global outbreak and is now a major community health issue. As of 8 March 2020, World Health Organization (WHO) data showed that more than 105 500 confirmed cases had been reported in over 100 countries/regions, with >75% of cases detected in China and >24% of cases detected elsewhere. Because the COVID-19 outbreak is evolving so rapidly, the available epidemiological data are essential to direct strategies for situational awareness and intervention. Methods: This article presents a visual exploratory data analysis (V-EDA) approach to collect and analyze epidemiological data on the COVID-19 outbreak. Various open data sources on the outbreak of COVID-19 provided by the World Health Organization (WHO), the Chinese Center for Disease Control and Prevention (CDC), the National Health Commission (NHC), the Johns Hopkins University Interactive Dashboard and DXY.cn were used in this research. Results: An exploratory data analysis (EDA) with visualizations was designed and developed to understand the number of cases reported (confirmed, deaths, and recovered) in different provinces of China and outside China between 22 January 2020 and 4 March 2020. Conclusion: It is extremely important to disseminate information promptly in order to understand the risks of this pandemic and begin containment activities.


Introduction
The latest outbreak of pneumonia in Wuhan, China has brought the 2019 novel coronavirus (2019-nCoV) sharply into focus. The first cases of pneumonia were identified in early December 2019 in Wuhan City, the capital of Hubei Province, China, and the causative agent was later determined by the Chinese Center for Disease Control and Prevention (CDC) to be a non-SARS novel coronavirus (Huang et al. 2020). Coronaviruses are nonsegmented positive-sense RNA viruses that belong to the Coronaviridae family and the Nidovirales order and are widely distributed in humans and other mammals (Ksiazek et al. 2009). The pathogen has been described as a novel enveloped RNA betacoronavirus (Lu et al. 2020), commonly referred to as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with a phylogenetic resemblance to SARS-CoV (Zhu et al. 2020). With mortality rates of 10% for SARS-CoV and 37% for MERS-CoV (WHO 2003; WHO 2019), the epidemics of the two betacoronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV; Kuiken et al. 2003) and Middle East respiratory syndrome coronavirus (MERS-CoV; Zaki et al. 2012; Groot et al. 2013), have caused more than ten thousand cumulative cases in the last two decades. Recently the World Health Organization (WHO) declared coronavirus disease 2019 (COVID-19) an international public health emergency. As of 8 March 2020, a total of (Confirmed global = 105586) laboratory-confirmed cases had been documented globally; among them, China had (Confirmed china = 80859) cases and (Deaths china = 3100) deaths. The world has witnessed a rapid increase in COVID-19 cases caused by SARS-CoV-2 in the last week or so. Although this article analyzes the data sources only up to 4 March 2020, COVID-19 has to date affected 104 countries and territories around the world and 1 international conveyance (the Diamond Princess cruise ship harbored in Yokohama, Japan).
By 8 March 2020, further new cases had been found globally, including 1492 newly confirmed cases in Italy. Considering COVID-19's rapid spread, we decided that an updated case review with outbreak analysis worldwide and in China may help identify the epidemiological characteristics and severity of the disease. As the outbreak of COVID-19 is expanding rapidly in China and beyond, threatening to become a global pandemic, we aim to describe the number of cases caused by SARS-CoV-2 and to visualize the epidemiological data through a visual exploratory data analysis (V-EDA) approach. We hope the results of this study can inform the global community about the COVID-19 outbreak and raise public awareness.

Materials And Methods
We used the Novel Corona Virus 2019 open dataset provided by Johns Hopkins University, whose team built an exceptional dashboard from the data on cases reported to date (Dong et al. 2020). By making the data accessible in Google Sheets format, they also give data analysts and researchers an opportunity to understand the pattern of COVID-19 cases around the globe. Day-wise information about the people affected can therefore yield useful insights when it is made available to the wider data science community. This dataset provides daily-level information on the number of confirmed cases, deaths and recoveries from 22 January 2020 to 4 March 2020. The study presented here, using the datasets provided by Johns Hopkins University, the World Health Organization (WHO), the Chinese Center for Disease Control and Prevention (CDC), the National Health Commission (NHC), and DXY.cn, focused on exploratory data analysis (EDA) and visual exploratory data analysis (V-EDA). All the datasets were pre-processed and cleaned to make them suitable for experimentation. We used the Python libraries NumPy (https://numpy.org) and pandas (https://pandas.pydata.org) to store and analyze the data. Matplotlib (https://matplotlib.org), Plotly (https://plot.ly), Seaborn (https://seaborn.pydata.org), and Folium (https://python-visualization.github.io/folium/) were used for interactive visualization of the highlighted results. All analyses were performed in Jupyter Notebook (https://jupyter.org) on a Linux-based local computer, using the Python language, in the research laboratory of Dhaka International University (DIU).
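As a minimal sketch of the kind of daily-level aggregation described above, the following snippet builds a tiny in-memory table with pandas and sums the cases per day inside and outside China. It is not the notebook used in this study: the column names (`date`, `country`, `province`, `confirmed`, `deaths`, `recovered`) and every value in `records` are illustrative assumptions, not the schema or contents of the actual datasets.

```python
import pandas as pd

# Hypothetical daily-level records: column names and values are
# illustrative assumptions, not taken from the real dataset.
records = [
    {"date": "2020-01-22", "country": "China", "province": "Hubei",
     "confirmed": 10, "deaths": 1, "recovered": 0},
    {"date": "2020-01-22", "country": "China", "province": "Guangdong",
     "confirmed": 5, "deaths": 0, "recovered": 0},
    {"date": "2020-01-22", "country": "Japan", "province": None,
     "confirmed": 2, "deaths": 0, "recovered": 0},
    {"date": "2020-01-23", "country": "China", "province": "Hubei",
     "confirmed": 20, "deaths": 2, "recovered": 1},
    {"date": "2020-01-23", "country": "China", "province": "Guangdong",
     "confirmed": 8, "deaths": 0, "recovered": 0},
    {"date": "2020-01-23", "country": "Japan", "province": None,
     "confirmed": 3, "deaths": 0, "recovered": 1},
]

df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])

# Label every row as belonging to China or to the rest of the world.
df["region"] = df["country"].eq("China").map(
    {True: "China", False: "Outside China"}
)

# Sum the three case types per day and region.
case_columns = ["confirmed", "deaths", "recovered"]
daily = df.groupby(["date", "region"])[case_columns].sum()

print(daily)
```

Grouping on a derived `region` column keeps the China versus outside-China split in one table, so the same frame can feed both a per-day plot and a cumulative summary.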

Dataset Description
We used several dataset sources for data analysis and visualization in this research. Three main data sources were used: the 2019 Coronavirus dataset (January-February 2020), which monitors the spread of 2019-nCoV; the Corona Virus Spread dataset, consisting of the numbers of confirmed cases, recorded deaths and recovered cases; and the 2019 Novel Corona Virus dataset, which holds day-level information on cases of 2019-nCoV (Dey et al. 2020).

Exploratory Data Analysis (EDA) Approach
We analyzed the datasets using various exploratory data analysis and visualization methods to provide ample evidence of the COVID-19 outbreak worldwide. Three data sources were used for the exploratory data analysis: the 2019 Coronavirus dataset (January-February 2020), the Corona Virus Spread dataset and the Novel Corona Virus 2019 dataset. All the analysis and visual exploration are based on data from 22 January 2020 to 4 March 2020. Compared with the rest of the world, a large number of cases were recorded in China (Table 1). Within China, most of the reported cases are from Hubei province, which is unsurprising because the capital of Hubei is Wuhan, where the first cases were registered (Table 2). By 4 March 2020, COVID-19 had spread to 84 countries worldwide and 31 states or provinces in China. (Deaths china = 2744). We also analyzed time-series data using visual EDA to provide a clear and comprehensive picture of the severity of the COVID-19 outbreak. Processing these data in real time is extremely useful in documenting the epidemic behavior of this serious disease. Our research team suggests that this way of analyzing data can increase situational awareness and inform strategies to combat the outbreak. All the data analysis and visualization models we developed for this article, including EDA and V-EDA, are available at this URL (http://samratdey.me/analysis_covid19.html).
A visual exploratory data analysis of the datasets up to 4 March 2020 provides significant outbreak information (C = Confirmed, D = Deaths, R = Recovered) regarding COVID-19 inside China (Figure 3). China confirmed (C = 67 332) cases caused by the SARS-CoV-2 virus between 22 January 2020 and 4 March 2020. Over the same period, (D = 2871) deaths and a growing number of recoveries (R = 38 557) were reported within China.
From the analyzed data it is evident that the mortality rate (MR = 4.26%) increased gradually in comparison with earlier cases. The recovery rate of SARS-CoV-2 patients increased dramatically, reaching (RR = 57.27%) by 4 March 2020.
With regard to the global picture, Figure 4 shows how many people worldwide have been affected by this novel coronavirus (SARS-CoV-2). Outside China there were (C = 13 995) confirmed cases by 4 March 2020, which indicates a rapid spread of the COVID-19 outbreak worldwide. There were (D = 264) deaths, a mortality rate of (MR = 1.89%), which was much lower than the mortality rate in China up to 4 March 2020. Around (R = 1190) recovered patients were reported outside China, a recovery rate of (RR = 8.50%), which indicates that recovery around the globe has so far been slow.
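The rates quoted for China and for the rest of the world can be checked with a short calculation. The sketch below assumes that the mortality rate is deaths divided by confirmed cases and the recovery rate is recovered cases divided by confirmed cases, both as percentages. The text does not state these formulas explicitly, but the reported values are consistent with them to within rounding. The helper name `rate_percent` is hypothetical.

```python
def rate_percent(part: int, confirmed: int) -> float:
    """Return part / confirmed as a percentage (assumed rate definition)."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * part / confirmed


# Cumulative counts quoted in the text for 22 January to 4 March 2020.
china = {"confirmed": 67332, "deaths": 2871, "recovered": 38557}
outside_china = {"confirmed": 13995, "deaths": 264, "recovered": 1190}

for name, counts in (("China", china), ("Outside China", outside_china)):
    mortality = rate_percent(counts["deaths"], counts["confirmed"])
    recovery = rate_percent(counts["recovered"], counts["confirmed"])
    print(f"{name}: MR = {mortality:.2f}%, RR = {recovery:.2f}%")
```

Both rates use confirmed cases as the denominator, so they are crude ratios over cumulative counts and do not account for cases whose outcome is still unknown.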
To better understand the different cases (C = Confirmed, D = Deaths, R = Recovered), we analyzed the data for each day from 22 January 2020 to 4 March 2020 within China and outside China. A significant deviation in new cases was found on 13 February 2020, when the highest number of confirmed cases was reported across the provinces of China (Figure 5). On that day China recorded its highest single-day number of confirmed COVID-19 cases to date, with (C new_case = 15 133) patients. It also recorded its highest single-day number of deaths (D new_case = 252), along with (R new_cases = 1134) recovered cases, on 13 February 2020. All the date-by-date analysis and visualization results have been made publicly available to promote awareness ( http://samratdey.me/analysis_covid19.html ).
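Daily new cases of this kind are typically derived from cumulative counts by taking day-over-day differences and then locating the peak. The following sketch illustrates that step with pandas. The cumulative values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts from the dataset.

```python
import pandas as pd

# Hypothetical cumulative confirmed counts (illustrative values only).
cumulative = pd.Series(
    [100, 150, 400, 450, 480],
    index=pd.to_datetime(
        ["2020-02-11", "2020-02-12", "2020-02-13", "2020-02-14", "2020-02-15"]
    ),
    name="confirmed",
)

# New cases per day are the day-over-day differences; the first day has
# no predecessor, so its difference is undefined and is dropped.
new_cases = cumulative.diff().dropna().astype(int)

# Date and size of the largest single-day increase.
peak_day = new_cases.idxmax()
peak_value = int(new_cases.max())

print(f"Peak of {peak_value} new cases on {peak_day.date()}")
```

The same pattern applies unchanged to the death and recovery series, which is how single-day maxima for each case type can be compared on one chart.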
In terms of new cases reported globally, on 3 March 2020 the world saw a sharp rise in confirmed cases. Almost (C new_cases_globally = 2409) new cases were reported, the highest single-day number of confirmed patients between 22 January 2020 and 4 March 2020. On the very next day, the highest single-day number of deaths worldwide was reported, with (D new_cases_globally = 58) patients. In general, the visualization model showed an increasing trend in all case types. On 4 March 2020, (R new_cases_globally = 391) patients around the world recovered from SARS-CoV-2, which was also the highest number of single-day recoveries between 22 January 2020 and 4 March 2020 (Figure 6).
South Korea (Confirmed SK = 5621), an Asian country neighboring China, was severely affected by SARS-CoV-2. Around (Affected SK = 5545) patients were affected as of 4 March 2020, the highest number of affected cases reported by any country after China. With a mortality rate of only (MR SK = 0.62%), around (Deaths SK = 35) deaths had been confirmed by 4 March 2020.
We also analyzed the data for each province of China, as illustrated in Figure 8. Unsurprisingly, Hubei has (Affected Hubei = 25 904) affected patients and the greatest number of confirmed cases (Confirmed Hubei = 65596) reported to date by any province in China. The number of deaths in Hubei was also alarming, approaching 3000: by 4 March 2020 it had reached (Death Hubei = 2871), with a mortality rate of (MR Hubei = 4.37%) and a high recovery rate of (RR Hubei = 58.77%). Figure 9 lists the comparative data (Confirmed = C, Recovered = R, and Deaths = D) for Hubei, the other provinces of China and the rest of the world up to 4 March 2020.
This representation shows that Hubei has had the largest number of infected patients (C hubei = 67332). However, Hubei has also seen a significant number of recoveries (R hubei = 38557), along with (D hubei = 2871) deaths. The remaining provinces of China confirmed (C other_china_province = 12939) patients infected with SARS-CoV-2 up to 4 March 2020. Like Hubei, the other provinces also showed a high number of recoveries (R other_china_province = 11398), along with (D other_china_province = 110) confirmed deaths. As of 4 March 2020, data from the different sources showed a total of (C rest_of_world = 14147) confirmed cases of COVID-19 in the rest of the world. Among them, (D rest_of_world = 267) deaths have been reported, along with (R rest_of_world = 1206) recovered patients. These observations show a steady rise in the daily total number of COVID-19 cases, both within and outside China, up to 4 March 2020.
Finally, we performed a ratio analysis for three quantities: the number of deaths per 100 confirmed cases, the number of recovered cases per 100 confirmed cases, and the number of recovered cases per death (Figure 10). This analysis shows that during the first few weeks of the epidemic more deaths than recoveries were reported per day. Over time, however, this trend has changed drastically.
Although the death rate has not come down significantly, the number of recovered cases has definitely increased over time. As of 4 March 2020, we observed 3.4 deaths per 100 confirmed cases, 54.2 recovered cases per 100 confirmed cases, and 15.75 recovered cases per death.
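The three ratios can be recomputed from the cumulative totals reported for Hubei, the other provinces of China and the rest of the world as of 4 March 2020. The sketch below is a plain recalculation under the assumption that the ratio analysis was applied to the sum of those three groups. The variable names are hypothetical.

```python
# Cumulative totals as of 4 March 2020, as quoted in the text (Figure 9).
regions = {
    "Hubei": {"confirmed": 67332, "deaths": 2871, "recovered": 38557},
    "Other provinces of China": {
        "confirmed": 12939, "deaths": 110, "recovered": 11398,
    },
    "Rest of the world": {"confirmed": 14147, "deaths": 267, "recovered": 1206},
}


def total(case_type: str) -> int:
    """Sum one case type over all three regions."""
    return sum(counts[case_type] for counts in regions.values())


confirmed = total("confirmed")
deaths = total("deaths")
recovered = total("recovered")

# The three ratios used in the ratio analysis.
deaths_per_100_confirmed = 100 * deaths / confirmed
recovered_per_100_confirmed = 100 * recovered / confirmed
recovered_per_death = recovered / deaths

print(f"Deaths per 100 confirmed:    {deaths_per_100_confirmed:.2f}")
print(f"Recovered per 100 confirmed: {recovered_per_100_confirmed:.2f}")
print(f"Recovered per death:         {recovered_per_death:.2f}")
```

Applying the same three expressions to each day's cumulative totals would give the time series behind a ratio plot of this kind.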

Discussion
We report here all laboratory-confirmed cases (confirmed, deaths and recovered) caused by SARS-CoV-2 worldwide and in China between 22 January 2020 and 4 March 2020.
The number of cases is rising very rapidly. As of 8 March 2020, according to the situation report Martinique, and Republic of Moldova) and overall 100 countries now have reported laboratory-confirmed cases of COVID-19 in the past 24 hours. When an outbreak like this happens, readily available data and information are essential for beginning the assessment needed to understand the risks and start outbreak containment activities. This information includes initial reports of countries' confirmed, death and recovered case ratios, how countries outside China are affected, how the Chinese provinces are struggling to deal with the COVID-19 situation and, more significantly, the ratio analysis of these real-world data, as well as information obtained from past outbreaks in different regions of the world. Knowledge and understanding of the consequences are needed to improve risk assessment as the epidemic progresses and to ensure the best care of patients. Much of this information emerges in real time, challenging our comprehension while improving our responses.
There is a clear need to consider the implications of SARS-CoV-2 not only in China but globally, so that we are prepared for the days to come. This work is a small initiative to analyze and visualize real-world time-series data so that people around the globe have a better understanding of the serious nature of the outbreak. The spread of SARS-CoV-2 is still being observed and, as of 8 March 2020, (Deaths china = 3100) deaths had been reported in China and (Deaths rest_of_world = 484) deaths outside China. That is extremely alarming not only for China but also for the rest of the world. This research focuses on COVID-19 outbreak analysis, and we have therefore listed the reported cases with their mortality and recovery rates in China, outside China and in various provinces of China. We also analyzed the number of affected countries with reported confirmed, death and recovered cases. In addition, we developed a visualization tool with Map view and Treemap view to examine the COVID-19 epidemiological outbreak in China and around the world.

Conclusion
The COVID-19 epidemic has become a clinical threat to the general population and healthcare workers around the world. However, awareness of this novel virus (SARS-CoV-2) is still minimal. The data sources used in this exploration can help provide appropriate knowledge about this emerging outbreak. This research also investigated the mortality and recovery rates for reported cases both in China and outside China. We believe the public authorities of every country should keep monitoring the situation continuously. Our research team regards this as an early data analysis and visualization of a situation that is changing rapidly across the globe. The more we learn about SARS-CoV-2 and its associated outbreaks, the better we can respond. We will continue to monitor the epidemiological data of the COVID-19 outbreak in the coming days using data from official sources.

Declarations
Funding
None

Declaration of Interests
All authors declare no competing interest.

Ethical Approval
Not required