This is the first “big data” study to explore the transmission chains and epidemiological characteristics of COVID-19 in Gansu Province. We found no difference in gender distribution between patients with COVID-19 and other patients attending hospital in the same period, which covered the estimated maximum incubation period of SARS-CoV-2 starting from 23 January 2020, when the first travel restrictions took effect. Most infections in Gansu Province were imported from Wuhan. The most common suspected mode of transmission was through family clusters, and we found no clear reports of other modes of transmission.
4.1 Gender, age and aggregation
More women than men were diagnosed with COVID-19 in Gansu Province, although women also formed the majority of all patients seeking care in hospitals within the province. This finding is inconsistent with the situation at the beginning of the outbreak, when men accounted for 56% of the first 425 patients diagnosed in Wuhan (16). A retrospective, single-center study of 99 cases in Wuhan, published on February 15, 2020, also indicated that men were at higher risk than women. The proposed reduced susceptibility of females to viral infections could be attributed to protection from the X chromosome and sex hormones, which play an important role in innate and adaptive immunity (17,18). However, there is as yet no clear evidence that women are more susceptible than men.
Although patients with confirmed COVID-19 tend to be younger than patients seeking care for other reasons, the age range of confirmed cases of COVID-19 in Gansu Province covered almost all age groups. This is in line with previous research findings. People of all ages have been shown to be susceptible to SARS-CoV-2, and are thus at risk of acquiring the infection as long as the conditions necessary for transmission are met (5,19). An analysis of 4021 confirmed cases in China also showed that people of all ages are generally susceptible: 71.5% of the patients were aged 30 to 65 years, and 0.4% were children under the age of 10 years (20). However, the risk of acquiring SARS-CoV-2 may be increased in the elderly and people with chronic underlying diseases such as asthma, diabetes and heart disease (19).
At the early stage of the epidemic, cases of COVID-19 were mainly sporadic. The proportion of clustered epidemics in various locations has continued to increase, which has also changed the development of the epidemic and the sources of exposure. The number of cases linked to clustered epidemics is estimated to account for 50% to 80% of all confirmed cases in several provinces and cities including Beijing, Shanghai, Jiangsu, and Shandong (19). The results of this study, which covered a period of 14 days (the estimated maximal incubation period), show that although no super spreaders were found, many cases were clustered in families or neighborhoods. This shows that there was ongoing human-to-human transmission in Gansu Province as well. These characteristics related to clustering and sources of infection are consistent with reports from Shanxi, Chongqing, and other provinces and cities (21,22). The incubation period of SARS-CoV-2 is generally 4 to 7 days, and the large number of suspected patients and people with asymptomatic infections has become the main source of infection (18,19). Therefore, it is important to track people with asymptomatic infections and to prevent familial and spatial aggregation of cases. The Gansu Province National Health Information Platform is linked with the entire population information database. Such data sources offer new opportunities to accurately delineate the group of closest contacts and to provide early warning to prevent family aggregation of the disease.
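As an illustration of how a linked population database could help delineate household contacts, the following minimal sketch groups registered residents by household and flags households with multiple confirmed cases. It is not the method used in this study: the function name, the `registry` mapping of person to household, and the threshold of two cases for a family cluster are all hypothetical assumptions made for the example.

```python
from collections import defaultdict


def household_contacts(registry, confirmed_ids, cluster_threshold=2):
    """Group residents by household and summarize households with cases.

    registry: dict mapping person_id -> household_id (hypothetical schema).
    confirmed_ids: iterable of person_ids with confirmed infection.
    cluster_threshold: assumed minimum number of cases in one household
        for it to be flagged as a family cluster.
    """
    confirmed = set(confirmed_ids)

    # Build household -> set of members from the registry.
    members = defaultdict(set)
    for person, household in registry.items():
        members[household].add(person)

    summary = {}
    for household, people in members.items():
        cases = people & confirmed
        if not cases:
            continue  # households without confirmed cases are skipped
        summary[household] = {
            "cases": sorted(cases),
            # Household members who are not confirmed cases are treated
            # as close contacts to be followed up.
            "contacts": sorted(people - confirmed),
            "cluster": len(cases) >= cluster_threshold,
        }
    return summary


# Toy, made-up data for illustration only.
registry = {
    "p1": "h1", "p2": "h1", "p3": "h1",
    "p4": "h2", "p5": "h2",
    "p6": "h3",
}
result = household_contacts(registry, ["p1", "p2", "p4"])
```

In this toy example, household `h1` would be flagged as a cluster with `p3` as a contact, `h2` would list `p5` as a contact without being flagged, and `h3` would not appear. A real system would also need to cover contacts outside the household, such as neighbors and coworkers.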
4.2 The potential of big data on prevention and control
Because COVID-19 was a new infectious disease, its spread was accelerated at the beginning of the epidemic by delays in diagnosis, treatment, and epidemic management due to lack of awareness. China’s traditional surveillance network mainly relies on reporting and summarizing the situation, which is, however, too slow to meet the need for a rapid response to the epidemic. Big data has huge potential to help follow, control, and respond to epidemics rapidly. The use of information technology and big data as an auxiliary method for epidemiological investigations can not only achieve early detection, early reporting, early isolation, and early treatment of cases, but also quickly map out the current status of the disease, capture patients' past medical history, and help to track the sources of infection and control the epidemic. Because the big data network allows almost real-time disease monitoring, this comprehensive and rapid surveillance method will make public health surveillance more sensitive, especially in tracing close contacts who are unaware of their exposure and in providing the control measures needed to prevent further infections (23).
Big data in the field of medicine and public health has become one of the most important medical resources, and has played an active role in the prevention and control of COVID-19 in Gansu Province. In particular, it has provided great support for the management of sources of infection and for epidemiological investigations. In the next stage of prevention and control, Gansu Province should continue to strengthen the innovative development of "big data + epidemiology" (23), prevent the recurrence of imported cases and cluster epidemics, and continuously improve the construction and promotion of big data platforms, so as to provide a theoretical basis for responding to similar emergencies and to facilitate scientific epidemic prevention and decision-making in the future.