Early Epidemiological Features and Trends of the COVID-19 Outbreak in Southeast Asia: a Population-Level Observational Study

Background: The global outbreak of coronavirus disease 2019 (COVID-19) has been ongoing in Southeast Asia since 13 January 2020. We conducted an observational study to investigate the underlying disease patterns of COVID-19 in Southeast Asia and to guide intervention strategies against the pandemic. Methods: In this population-level observational study set in Southeast Asia, we compiled a list of patients with COVID-19 (n = 925) and daily country-level case counts (covering 1,346 cases) from 13 January 2020 through 16 March 2020. All epidemiological data were extracted from the official websites of the WHO and the health authorities of each Southeast Asian country. Relevant spatiotemporal distributions, demographic characteristics, and short-term trends were assessed. Results: A total of 1,346 confirmed cases of COVID-19, with 217 (16.1%) recoveries and 18 (1.3%) deaths, were reported in Southeast Asia as of 16 March 2020. Early transmission dynamics were examined with an exponential regression model, $y = 0.30e^{0.13x}$ (p < 0.01, adjusted $R^2$ = 0.96). Using this model, we predicted that the cumulative number of reported COVID-19 cases in Southeast Asia would exceed 10,000 by early April 2020. A total of 74 cities across eight countries in Southeast Asia were affected by COVID-19. Most of the confirmed cases were located in five international metropolitan areas. Demographic analyses of the 925 confirmed cases indicated a median age of 44 years and a sex ratio (males to females) of 1.25. The median age of the local patient population was significantly higher than that of the corresponding country's general population (p < 0.01), whereas the sex ratio did not differ significantly. Conclusions: The COVID-19 situation in Southeast Asia is geographically uneven, and the short-term outlook is pessimistic. Age may play a significant role in both susceptibility to and outcome of infection. Real-time active surveillance and targeted intervention strategies are urgently needed to contain the pandemic.


Background
A previously unknown infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China, in December 2019 [1]. The disease was later officially named coronavirus disease 2019 (COVID-19). The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020 because of its rapid global spread [2].
The global outbreak of COVID-19 has been ongoing in Southeast Asia since 13 January 2020, making Southeast Asia the first affected region outside China. Southeast Asia (SE Asia) consists of 11 countries: Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar, the Philippines, Singapore, Thailand, Timor-Leste, and Vietnam. As a regional unit, it not only borders China but also lies at "the crossroads of the world" because of its important maritime trade routes. In the context of globalization, regional disease surveillance is essential because it informs the formulation of responses to emerging infectious diseases such as this one [3]. There have been many early epidemiological analyses of COVID-19 outbreaks in individual countries, but none for Southeast Asia as a whole. The purpose of this observational study was to investigate the underlying disease patterns of COVID-19 in this region and thereby to guide pandemic intervention strategies.

Study Design
In this population-level observational study, we performed a retrospective analysis of early COVID-19 epidemiological data from all 11 countries in Southeast Asia for the period from 13 January 2020 to 16 March 2020. Primary data sources were the official websites of the WHO and the public health authorities (such as the Ministries of Health or Centers for Disease Control) of the relevant countries. We included individuals with a positive polymerase chain reaction (PCR) test for SARS-CoV-2 (n = 1346). An Excel spreadsheet database was created from the compiled data and used for the analyses. First, we illustrated the temporal and spatial distributions of the COVID-19 outbreak in Southeast Asia. Then, we made short-term predictions of cumulative case counts based on the temporal distribution. Finally, we described the demographic characteristics of confirmed patients and compared them with those of the corresponding country's general population.

Data Compilation
We closely monitored updates from press releases and situation reports on COVID-19 issued by each Southeast Asian country's health authorities and the WHO between 13 January 2020 and 16 March 2020.
Using a structured information form, our multilingual team extracted epidemiological data directly and in real time, including daily case counts, outbreak maps, and basic demographic characteristics such as age, sex and nationality. Individual-level data were compiled for only 925 of the confirmed cases, because the health authorities in Malaysia and Indonesia stopped disclosing case-level details from 13 March 2020 onward; this reduced the sample size for the demographic analysis.
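The structured information form can be pictured as one record per confirmed case. The sketch below is a hypothetical Python representation (the `CaseRecord` class and `count_missing` helper are illustrative names, not part of the authors' workflow) in which any field a health authority did not publish is kept as `None`, so that missing ages or sexes can be counted before analysis.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CaseRecord:
    """One confirmed case as captured by a hypothetical extraction form."""
    country: str
    report_date: date
    age: Optional[float] = None        # years; None if not published
    sex: Optional[str] = None          # "M" or "F"; None if not published
    nationality: Optional[str] = None  # None if not published


def count_missing(records: List[CaseRecord]) -> dict:
    """Count records whose age or sex was not published."""
    return {
        "age": sum(1 for r in records if r.age is None),
        "sex": sum(1 for r in records if r.sex is None),
    }
```

Keeping unpublished fields as explicit `None` values, rather than blanks or zeros, makes any later exclusion or imputation of missing values easy to audit.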
The median ages and sex ratios of the general populations were extracted from World Population Prospects 2019 of the United Nations and World Bank Open Data. After cross-checking, all extracted data were entered into an Excel spreadsheet database for further quantitative analysis.

Statistical Analysis
Descriptive statistical methods were used to analyze the spatiotemporal and population distributions of COVID-19 in Southeast Asia. An epidemic curve and a semi-logarithmic line graph were constructed by report date. The spatial distribution of confirmed cases was illustrated with marked maps. We also assessed the age, sex, and nationality of individuals with COVID-19 and of those who died of it. Demographic data were expressed as median (interquartile range, IQR) or n (%), as appropriate. Crude recovery and fatality rates were calculated from the reported cumulative counts. Paired t-tests and Mann-Whitney U tests were used to compare median ages and sex ratios between patients and general populations, and to compare deceased and surviving cases. An exponential regression model was constructed to estimate short-term incidence trends, and its statistical significance and goodness-of-fit were subsequently tested.
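The exponential regression can be sketched as an ordinary least-squares fit of ln(y) on x, which is one common way to estimate $y = ae^{bx}$; the text does not say exactly how the SPSS model was configured, so this is an assumption. The hypothetical function below also returns the adjusted $R^2$ and the ANOVA F statistic of the log-linear fit (the quantities quoted in the Results), and it is exercised on synthetic data, not on the study's counts.

```python
import math
from typing import List, Tuple


def fit_exponential(xs: List[float], ys: List[float]) -> Tuple[float, float, float, float, float]:
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    Returns (a, b, r2, adjusted_r2, f_stat); the goodness-of-fit statistics
    refer to the linear regression of ln(y) on x with one predictor.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    ln_y = [math.log(y) for y in ys]  # every y must be positive
    x_bar = sum(xs) / n
    ly_bar = sum(ln_y) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, ln_y))
    b = sxy / sxx
    ln_a = ly_bar - b * x_bar
    ss_res = sum((ly - (ln_a + b * x)) ** 2 for x, ly in zip(xs, ln_y))
    ss_tot = sum((ly - ly_bar) ** 2 for ly in ln_y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)        # one predictor
    f_stat = (ss_tot - ss_res) / (ss_res / (n - 2))  # ANOVA F with (1, n - 2) df
    return math.exp(ln_a), b, r2, adj_r2, f_stat


# Synthetic illustration only, not the study data: days 49-64 (1-16 March if
# day 1 is 13 January) with an alternating +/-5% perturbation around 0.30*exp(0.13x).
xs = list(range(49, 65))
ys = [0.30 * math.exp(0.13 * x) * (1.05 if x % 2 == 0 else 0.95) for x in xs]
a, b, r2, adj_r2, f_stat = fit_exponential(xs, ys)
```

Because of the perturbation, the recovered coefficients are close to, but not exactly, 0.30 and 0.13, and $R^2$ is slightly below 1.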
Hypothesis testing, analyses, and model building were performed using SPSS statistical software version 26.0 (IBM Corp., Armonk, NY, USA). The spatial distribution was mapped using R software version 3.6.2 (R Foundation for Statistical Computing). A P-value < 0.05 was considered to be statistically significant.

Ethical Approval
Ethical approval and individual consent were not applicable.

Results
As of 16 March 2020, 1,346 confirmed cases of COVID-19 had been reported in Southeast Asia. Of these, 217 patients had recovered and 18 had died. The crude recovery and fatality rates were 16.1% and 1.3%, respectively.

Temporal Distribution
An epidemic curve of confirmed cases (by report date) indicated two distinct phases: (1) 13 January to 29 February 2020 (first phase) and (2) 1-16 March 2020 (second phase). The first phase was relatively mild, with only a few confirmed cases reported daily, most of them from Singapore and Thailand. In the second phase, however, the number of confirmed cases reported daily increased rapidly, especially in Malaysia. The largest single-day increase in new COVID-19 infections was recorded in Malaysia on 15 March 2020, with 190 new cases (Figure 1).
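Aggregating country-level daily counts into an epidemic curve, assigning each report date to a phase at the 1 March watershed, and locating the largest single-day increase might look like the sketch below. The input format (a dictionary keyed by country and report date) and all function names are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple

PHASE_BOUNDARY = date(2020, 3, 1)  # watershed between the first and second phases

NewCases = Dict[Tuple[str, date], int]  # (country, report date) -> new cases


def phase_of(report_date: date) -> str:
    """Assign a report date to the first or second phase."""
    return "second" if report_date >= PHASE_BOUNDARY else "first"


def epidemic_curve(new_cases: NewCases) -> List[Tuple[date, int]]:
    """Total new cases per report date across all countries, sorted by date."""
    totals: Dict[date, int] = {}
    for (_country, day), count in new_cases.items():
        totals[day] = totals.get(day, 0) + count
    return sorted(totals.items())


def largest_daily_increase(new_cases: NewCases) -> Tuple[str, date, int]:
    """Country, report date and count of the largest single-day increase."""
    (country, day), count = max(new_cases.items(), key=lambda item: item[1])
    return country, day, count
```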
A semi-logarithmic line graph of cumulative cases over time showed that the growth rate of COVID-19 cases in Southeast Asia increased markedly at the population level at the beginning of March 2020, whereas the growth rate in China remained stable (Figure 2).
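A semi-logarithmic graph exposes a change in growth because exponential growth plots as a straight line on a log scale, with a slope equal to the growth rate. The minimal sketch below (hypothetical function names) derives the cumulative series from daily counts and computes its day-over-day log growth rates, which are the local slopes of such a graph in natural-log units.

```python
import math
from itertools import accumulate
from typing import List


def cumulative(daily_new: List[int]) -> List[int]:
    """Running total of daily new cases (the series plotted on a log axis)."""
    return list(accumulate(daily_new))


def log_growth_rates(cum: List[int]) -> List[float]:
    """Day-over-day growth rates ln(C_t / C_{t-1}).

    A constant value means exponential growth; a rise means the semi-log
    curve has steepened. Steps that start from a zero count are skipped.
    """
    return [math.log(c1 / c0) for c0, c1 in zip(cum, cum[1:]) if c0 > 0]
```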
An exponential curve was fitted to the cumulative number of reported cases in the second phase, yielding the regression model $y = 0.30e^{0.13x}$, where y is the cumulative number of confirmed cases in the second phase and x is the number of days since the first reported case in Southeast Asia. Analysis of variance (ANOVA) indicated that the model was statistically significant (F = 355.48, p < 0.01), with an adjusted $R^2$ of 0.96. According to the model, the cumulative number of confirmed cases of COVID-19 in Southeast Asia was predicted to exceed 10,000 by early April 2020 (Figure 3).
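The prediction can be reproduced approximately from the published coefficients alone. The sketch assumes that x = 1 corresponds to 13 January 2020, the date of the first reported case; this is consistent with the Discussion's statement that Day 85 was 6 April 2020, but it is our inference rather than a stated convention. The function and constant names are illustrative.

```python
import math
from datetime import date, timedelta

A, B = 0.30, 0.13            # published coefficients of y = A * exp(B * x)
DAY_ONE = date(2020, 1, 13)  # assumption: x = 1 on the date of the first reported case


def predicted_cumulative(x: float) -> float:
    """Model prediction of the cumulative case count on day x."""
    return A * math.exp(B * x)


def day_to_date(x: int) -> date:
    """Calendar date of day x under the DAY_ONE convention."""
    return DAY_ONE + timedelta(days=x - 1)


def first_day_exceeding(threshold: float) -> int:
    """Smallest integer day on which the prediction exceeds threshold."""
    return math.floor(math.log(threshold / A) / B) + 1


day = first_day_exceeding(10_000)
print(day, day_to_date(day))            # 81 2020-04-02
print(round(predicted_cumulative(80)))  # 9858
```

Under this day-numbering assumption, the model first exceeds 10,000 cases on 2 April 2020 and predicts about 9,860 cases for 1 April 2020, against the 10,153 later reported for that date.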

Spatial Distribution
By 16 March 2020, eight countries in Southeast Asia (all except Laos, Myanmar and Timor-Leste) had reported confirmed cases of COVID-19. Malaysia (n = 553), Singapore (n = 243), and Thailand (n = 147) reported the highest numbers of COVID-19 infections, together accounting for 70.1% of the total cases reported in Southeast Asia. Notably, Singapore had the highest number of recovered cases (n = 109), with a crude recovery rate of 44.9%. The Philippines (n = 12) and Indonesia (n = 5) reported the most deaths, with crude fatality rates of 8.5% and 3.7%, respectively.
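The crude rates and shares reported here are simple ratios of reported counts, with no adjustment for cases whose outcome was still pending. A minimal sketch (the helper name is ours) reproduces the percentages from the counts given in the text:

```python
def crude_rate(numerator: int, denominator: int) -> float:
    """Crude rate (or share) as a percentage, rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# Counts as reported for Southeast Asia on 16 March 2020.
TOTAL_CASES = 1346
TOP_THREE = {"Malaysia": 553, "Singapore": 243, "Thailand": 147}

recovery_rate = crude_rate(217, TOTAL_CASES)                        # 16.1
fatality_rate = crude_rate(18, TOTAL_CASES)                         # 1.3
singapore_recovery = crude_rate(109, TOP_THREE["Singapore"])        # 44.9
top_three_share = crude_rate(sum(TOP_THREE.values()), TOTAL_CASES)  # 70.1
```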
In the first phase, 69.3% of the confirmed COVID-19 cases were concentrated in two major international metropolises, Singapore and Bangkok (Figure 4A). Subsequently, the foci of infection expanded to other international metropolises in the region, including Manila, Kuala Lumpur and Jakarta, and the number of affected cities rose to 74, giving the outbreak a "cancer metastasis-like" spatial distribution, especially on the Malay Peninsula (Figure 4B).

Demographic Characteristics
The sample size for the demographic analysis was 925. Of these, the age of one patient from Cambodia and the sex of one patient from Indonesia were unknown because the health authorities in Cambodia and Indonesia did not publish this information. Moreover, age was missing for 104 cases from Malaysia; these values were imputed with a stochastic simulation method based on the age distribution of confirmed cases as of 13 March 2020 published by the Ministry of Health, Malaysia [4]. Table 1 summarizes the demographic characteristics of confirmed COVID-19 cases. Demographic analysis revealed that COVID-19 patients were primarily aged 20-69 years; this age group constituted 88.9% of the total confirmed cases in Southeast Asia. The proportion of COVID-19 cases among individuals aged > 60 years was 21.9% (Figure 5).
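The text does not specify the stochastic simulation method used to impute the 104 missing Malaysian ages. One plausible version, sketched below under that caveat, draws each missing age by first sampling an age group with probability proportional to its published case count and then sampling uniformly within that group. The function name and the (lower, upper) bin format are hypothetical; in practice, the group counts would come from the Ministry of Health, Malaysia distribution as of 13 March 2020 [4].

```python
import random
from typing import Dict, List, Optional, Tuple

AgeGroup = Tuple[float, float]  # (lower, upper) age bounds in years


def impute_ages(n_missing: int,
                group_counts: Dict[AgeGroup, int],
                seed: Optional[int] = None) -> List[float]:
    """Draw n_missing ages from a grouped age distribution.

    Each draw picks an age group with probability proportional to its
    reported case count, then an age uniformly within [lower, upper].
    A fixed seed makes the imputation reproducible.
    """
    rng = random.Random(seed)
    groups = list(group_counts)
    weights = [group_counts[g] for g in groups]
    chosen = rng.choices(groups, weights=weights, k=n_missing)
    return [rng.uniform(lower, upper) for lower, upper in chosen]
```

Fixing the seed matters here: without it, every rerun would produce a slightly different median age for the imputed sample.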
The ages of individuals with COVID-19 in Southeast Asia ranged from 0.25 to 96 years, with a median age of 44 years. There were 514 males and 410 females, giving a sex ratio of 1.25. The median ages and sex ratios for populations with confirmed COVID-19 cases (PWCC, both overall and local) and the general population (GP) in each country are presented in Figure 6A and Figure 6B, respectively. The median age of PWCC (local nationals) was significantly higher than that of the corresponding GP (paired t-test; p < 0.01), whereas the sex ratio did not differ significantly between the two population groups (paired t-test; p > 0.05).
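The sex ratio and the paired comparison can be sketched as follows. Treating each country as one matched pair (patient median age versus general-population median age) is our reading of how the paired t-test was applied, not a detail the text confirms; the function names are hypothetical, and the p-value step is left to a t-distribution routine such as `scipy.stats.t.sf` (two-sided p = 2 × sf(|t|, df)).

```python
import math
import statistics
from typing import Sequence, Tuple


def sex_ratio(males: int, females: int) -> float:
    """Number of males per female."""
    return males / females


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples.

    Each pair (x[i], y[i]) could be one country's patient median age and
    its general-population median age (an assumed pairing).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


print(round(sex_ratio(514, 410), 2))  # 1.25
```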

Discussion
This study retrospectively analyzed early population-level data on the COVID-19 outbreak in Southeast Asia. Relevant spatiotemporal distributions and demographic characteristics were described for the first time. In addition, a predictive model was constructed to estimate short-term incidence trends.
The epidemic curve and the semi-logarithmic line graph consistently illustrated two distinct phases of the epidemic. The second phase began at the start of March 2020 and was characterized by a substantial increase in the number of reported cases. This sudden increase in confirmed COVID-19 cases was a consequence of mass gatherings for events such as the Sri Petaling tabligh (a Muslim religious gathering), which triggered cluster outbreaks in Malaysia [4]. This pattern indicated that COVID-19 was entering a rapid transmission phase. The WHO classified five countries in Southeast Asia (Indonesia, Malaysia, Singapore, Thailand and Vietnam) as countries with local transmission on 2 March 2020 [5], and later declared the outbreak a pandemic on 11 March 2020 [2].
Epidemics of infectious diseases with a basic reproduction number ($R_0$) greater than 1.0 typically grow exponentially in their early stages. The $R_0$ for COVID-19 was estimated to be approximately 2.2 in a previous study of early transmission dynamics in Wuhan, China [6].
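Exponential growth also implies a constant doubling time. As a worked consequence of the coefficient reported for Southeast Asia (a derived figure, not one the study itself reports), setting $y(x + T_d) = 2\,y(x)$ in $y = 0.30e^{0.13x}$ gives:

$$
0.30e^{0.13(x + T_d)} = 2 \times 0.30e^{0.13x}
\;\Rightarrow\; e^{0.13T_d} = 2
\;\Rightarrow\; T_d = \frac{\ln 2}{0.13} \approx \frac{0.693}{0.13} \approx 5.3 \text{ days}.
$$

That is, during the second phase the modeled cumulative count doubled roughly every five to six days, regardless of the intercept 0.30.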
Our study observed a similar exponential growth trend, which we used to predict the short-term incidence trend of COVID-19 within Southeast Asia. The goodness-of-fit (adjusted $R^2$) of the prediction model within the second phase of transmission was 0.96. Although exponential growth models also fitted the early epidemic patterns of COVID-19 well in other regions such as Europe and Africa, the estimated model parameters differed between regions. This variation could be related to differences in climatic conditions, genetic background and the sufficiency of health resources (especially testing capacity) [7,8]. Our model predicted that the cumulative number of COVID-19 cases in Southeast Asia would exceed 10,000 by early April 2020. In fact, the cumulative number of COVID-19 patients in Southeast Asia was 10,153 as of 1 April 2020, which supports our prediction. Despite the accuracy of this short-term forecast, we observed that the actual case counts reported from Day 85 (6 April 2020) onward gradually fell below the lower limit of the prediction. It is worth noting that Malaysia, the hardest-hit country in Southeast Asia at that time, implemented the nationwide Movement Control Order (MCO) on 18 March 2020 [9]. Thus, we speculate that the implementation of more stringent precautions may be an important reason for the slowdown in cumulative case growth.
A geo-temporal map illustrated the spatial distribution of confirmed cases, which resembled "cancer metastasis": most of the confirmed COVID-19 cases were concentrated within several international metropolitan areas before spreading to other small- and medium-sized cities. Several studies have demonstrated a link between transportation and the spread of COVID-19 infections, which may explain why metropolises, as major international transportation hubs, were more vulnerable to high concentrations of COVID-19 cases [10,11]. The smaller number of COVID-19 cases in small- and medium-sized cities may also be partly explained by the "iceberg phenomenon" of disease: infections occurring in these cities are often not diagnosed in a timely manner because of a lack of laboratory testing (especially in the early stages of an outbreak). Thus, subsequent increases in the number of confirmed cases in these areas might be partly due to improved access to testing, not merely to spread from metropolises.
We were not able to directly compare the distribution of age or sex between patient groups from different countries or regions because of considerable differences in population structure. However, we found that the median age of locally infected COVID-19 patients was significantly higher than that of their respective countries' general populations, which suggests that age may be a significant risk factor for COVID-19. This reinforces the previous finding that COVID-19 appears to be uncommon in children [12,13].
In terms of sex, the reported proportions of males among confirmed COVID-19 cases in China, Italy and South Korea were 51.4%, 59.8% and 37.7%, respectively [14][15][16]. Although more than half of the confirmed cases in Southeast Asia were male, our corresponding hypothesis test did not support a link between sex and COVID-19 susceptibility. Moreover, participation in social activities may act as an intermediary factor [12,16]. Interestingly, the age and sex composition of confirmed cases in Vietnam was distinctive, with many more young and female patients. This may be attributed to the population structure and the role of women in Vietnamese society. Nearly 20% of confirmed COVID-19 cases in Southeast Asia were foreign nationals, although this proportion varied between countries. The diversity of patients' nationalities may partly reflect the risk of imported infections. Because Southeast Asia is an active hub of the global community, frequent cross-border population movements increased human-to-human transmission within the region [17]. Consequently, the rapid spread of the epidemic outside China requires serious consideration.
The global crude fatality rate for COVID-19 was 3.9% (6,606 deaths out of 167,515 confirmed cases) as of 16 March 2020 (the end of our study period) [18]. The crude fatality rate for Southeast Asia was probably substantially underestimated in this study because of delayed diagnosis and a lack of transparency in the information released by health authorities. For example, the number of deaths from COVID-19 in Indonesia jumped from five on 16 March 2020 to 19 on 18 March 2020, raising the crude fatality rate to 8.4% [19]. Although most COVID-19 patients may exhibit mild clinical symptoms, older people and individuals with underlying medical conditions may be at increased risk of severe illness and death, and our results are consistent with this finding [14,16]. The median age of the 18 deaths included in the study was significantly higher than that of the surviving cases. Among COVID-19 patients who died, 72.2% had underlying conditions such as diabetes and chronic cardiovascular diseases. Notably, one deceased patient from Thailand also had dengue fever [20], a tropical disease that is common and active in Southeast Asia. The two viral diseases are difficult to distinguish because they share some clinical and laboratory features. Public health security in this region is facing unprecedented challenges [21].
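The age comparison between deceased and surviving cases used the Mann-Whitney U test. The dependency-free sketch below (the function name is hypothetical) computes U by pairwise comparison and a two-sided p-value from the large-sample normal approximation without tie or continuity correction; with only 18 deaths, an exact or tie-corrected implementation such as `scipy.stats.mannwhitneyu` would be preferable for real analysis.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Mann-Whitney U for sample x versus y, with a two-sided p-value.

    U counts the pairs (a, b) with a > b, scoring ties as 0.5. The p-value
    uses the normal approximation without tie or continuity correction,
    so it is only a rough guide for small samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_two_sided
```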
Despite our efforts to ensure data quality and analytical rigor, the present study has several limitations.
First, raw data were compiled from publicly available information, which was not equally available across the countries included in the study. Therefore, the data available for the overall analysis and the sample size for the demographic analysis were limited. Moreover, because of delays in the diagnosis of COVID-19 infections and a lack of transparency in the information provided, the numbers of COVID-19 cases and deaths may not have been comprehensively reported during the early phase of transmission, which may have led to an underestimation of the true severity of the outbreak in this region. Finally, evolving health policies and opportunistic factors make it difficult to predict pandemic trends; the simple, practical model we used is suited only to short-term predictions of COVID-19 incidence trends in the study region.

Conclusions
This study was the first to describe the early epidemiological features and trends of the COVID-19 outbreak in Southeast Asia from a regional perspective. Analysis of spatiotemporal distribution characteristics indicated that the region's COVID-19 situation was geographically uneven and that the short-term outlook was pessimistic. Advanced age may play a significant role in increasing susceptibility to COVID-19 infection and in leading to severe clinical outcomes. Consequently, there is an urgent need to implement real-time active surveillance and develop targeted intervention strategies to combat the pandemic.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Authors' contributions
MZ and LL had the original idea and developed the study protocol. MZ, KJ, SPC, JWHT and NL were involved in data collection and collation. MZ, JS, KJ and LL performed the analysis and drafted the initial manuscript. All authors contributed to writing the manuscript and approved its final version.

Figure 1 Epidemic curve of confirmed COVID-19 cases in Southeast Asia, by date of report and country, from 13 January to 16 March 2020. Red dashed line: watershed (1 March) between the first and second phases.

Figure 5 Age distribution of confirmed COVID-19 cases in Southeast Asia. Red dashed lines: upper and lower limits of the age group in which 88.9% of cases were concentrated.

Figure 6