Space-Time Cluster Detection and Geographically Weighted Regression Analysis of COVID-19 Mortality Impacts on Texas Counties

Abstract: As COVID-19 runs rampant in high-density housing sites, it is important to use real-time data to track the mobility of the virus. Emerging cluster detection is a precise way of blunting the spread of COVID-19 as quickly as possible and saving lives. To track the mobility of COVID-19 on a spatial-temporal scale, this research analyzes the disparities between space-time cluster detection, expectation-maximization (EM) clustering, and hierarchical clustering (HC) at the Texas county level. Based on the outcome of the clustering analysis, the sensitive counties are Cottle, Stonewall, Bexar, Tarrant, Dallas, Harris, Jim Hogg, and Real, corresponding to the Southeast Texas analysis in the GWR modeling. The sensitive period fell in the last two quarters of 2020. We explored a PostgreSQL (PostGIS) application to illustrate tracking of the COVID-19 trajectory. We captured 14 social, economic, and environmental impact indices and performed Principal Component Analysis (PCA) to reduce dimensionality and minimize multicollinearity. Using PCA, we extracted five factors related to COVID-19 mortality, involving population and hospitalization, age structure, natural supply, economic condition, air quality, and medical care. We then established a geographically weighted regression (GWR) model to identify the sensitive factors. The results show that population, hospitalization, and economic condition are the sensitive factors. These factors also triggered sharp increases in COVID-19 mortality. This research provides a geographical understanding of, and solutions for, controlling COVID-19, a reference for implementing geographically targeted ways to track virus mobility, and support for the needs of the Emergency Operations Plan (EOP).


Introduction
The coronavirus disease 2019 (COVID-19), a global disaster, brought socioeconomic development worldwide to a halt in 2020. It has threatened human life, public health, and safety, and has disrupted face-to-face communication, owing to the intangible nature and clinical severity of the infection and its fatal symptoms [1]. By March 11th, 2021, 2.62 million people around the world had lost their lives, accounting for 15% of World War One fatalities. A pervasive sense of quarantine fatigue and panic over getting infected are challenging human resilience [2][3]. COVID-19 is described as one of the extreme diseases, incurable and universally fatal, killing 25%-50% of patients [4]. In particular, the COVID-19 pandemic exposed the U.S. to mass dislocation, directly accelerating the decline and failure of public health. With around 30 million diagnosed cases and over 540,000 deaths as of mid-March 2021, the country bore a disproportionate share of the impact of COVID-19. About 40% of cases could have been averted with international cooperation on medical care [4]. In addition, age-specific mortality rates in the US remained in line with the weighted average of the G7 nations [4].

Texas is the second-largest state in the U.S. and has one-tenth of the aging population. Although Texas Executive Orders (TEO) and Public Health Disaster Declarations (PHDD) were issued continually, the Texas government kept the economy open. The first COVID-19 case in the United States was confirmed on January 19th, 2020, in Washington State [5], whereas the first Texas case was announced by the Texas Department of State Health Services on March 4th in Fort Bend County. As of 02/28/2021, Texas had surpassed 2,300,000 total COVID-19 cases and 372,086 deaths. As the US has gone through several waves of the epidemic, Texas has undergone all 5 stages of the COVID-19 risk-based guidelines.

The pandemic has exposed the vulnerability of Texas disease surveillance and response systems, which underlines the need to establish global schemes, regulation, and collaboration [6]. A silver lining is that the pandemic provides a unique and empirical opportunity to observe a large-scale and prolonged public health emergency. Accordingly, it is imperative to understand the spatial-temporal clusters of COVID-19 mortality and to explore their relationships with environmental and socioeconomic factors.

A popular statistical tool for examining that relationship is the space-time scan statistic, which is widely used to quantify cluster strength and statistical significance [7]. Epidemic surveillance and spatiotemporal trend analysis can provide unique insights that make decision makers aware of potential upticks and help them adopt proactive public health measures to mitigate risk and minimize COVID-19 infection. The detection of patterns in confirmed COVID-19 cases and mortality in the United States is well documented and has been used to formulate interventions, targeted rapid testing, and resource allocation [8][9][10]. However, the usefulness of space-time analysis depends on data quality (e.g., accuracy, spatial resolution, temporal currency, completeness), which was somewhat limited in the early stages of the pandemic. In addition, Desjardins noted that an analysis of deaths could be conducted but did not include it in the research scope. Our study fills these gaps. GIS spatial analysis represents the distribution of the COVID-19 pandemic well, together with its multidimensional social, economic, and health consequences, and accurately exposes geographical inequity and the long-term impact on global health [11][12][13]. GIS-driven spatial analysis can facilitate the combination of health data with the characteristics of spatial attributes.
Descriptive modeling research that took advantage of those strengths has thoroughly exposed the spatial-temporal associations of COVID-19 with socioeconomic and environmental characteristics [14][15]. For an empirical study, however, it is important to select variables that reveal the degree of social vulnerability [16][17][18].

Spatial-temporal analysis of COVID-19 is crucial for understanding its spread and for exploring appropriate community containment strategies, which are fundamental public health measures used to control the spread of communicable diseases, including isolation and quarantine. This paper focuses on the county level within a single state to eliminate the possibility of policy divergences between states. Existing research has analyzed county-level data with spatial statistics but has not addressed county-level disparities in temporal lag [19][20][21][22][23]. Because social vulnerability varies with population demographics such as age, gender, and race/ethnicity, some population groups are more vulnerable to the threat of COVID-19. Only a few variables have been included in previous modeling [24][25][26][27][28][29], although population mobility, age, and race were significant factors [30][31][32][33][34][35][36]. Because COVID-19 is a respiratory disease, air pollution indices such as PM2.5 and the Air Quality Index (AQI) are highly related to it. Although Qian contends that air quality interacts robustly with COVID-19 [37], AQI and PM2.5 have not been explored in previous spatial-temporal models and have only been listed among the environmental impact factors [38][39][40][41][42].

The purpose of this research is twofold: first, to identify any emerging space-time clusters of COVID-19, and second, to examine any significant factors related to mortality.
By exploring the spatiotemporal clusters based on a more comprehensive set of data over a year-long period, this research examines the correlation between the COVID-19 mortality rate and socioeconomic and environmental factors using GWR analysis. It aims to identify sensitive indicators that can assist in formulating targeted interventions suitable for vulnerable populations and in breaking the chains of transmission. Hence, this research is expected to provide references for preventing and controlling COVID-19 and related infectious diseases, evidence for disease surveillance and response systems to facilitate the appropriate uptake and reuse of geographical data, and a contribution to safeguarding Texas public health. Our long-term goal is to improve and strengthen seamless health connections and the surveillance system through a timely, dynamic monitoring mechanism.

From the perspective of the temporal study framework, the study period was classified into four boxes based on the number of fatalities (TF) per quarter. Quarterly statistical data are based on environmental and socioeconomic indices at the end of each quarter, in response to COVID-19 fatalities at that time. The temporal-study framework in

For the spatial study, we explore the inter-correlations among independent variables before building the GWR models. Since dependent variables must meet the assumption of a normal distribution, we describe their statistical characteristics and analyze their spatial autocorrelation. To minimize multicollinearity, all explanatory variables are standardized and combined into composite factors by Principal Component Analysis. After that, we fit a simple Ordinary Least Squares (OLS) model and a geographically weighted regression between the variables. Finally, through model comparisons, we focus on their differences in spatial heterogeneity and analyze how these differences arise, as shown in Fig. 2.
In Kulldorff's scan statistic method, the first step is to determine a suitable probability model for the data and then compute the likelihood ratio test statistic λ(z) for each scan window z. After that, we identify the primary cluster candidates with the maximum λ(z), and a Monte Carlo hypothesis-testing procedure assesses statistical significance and yields a p-value [43]. On the one hand, Kulldorff's method tests the null hypothesis H0 (constant probability for all areas) against the alternative hypothesis H1 (the specific area z has a larger probability than the areas outside it) using a Poisson model [7]. For a given region z, the likelihood function based on the Bernoulli model can be expressed using expression (1):
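For reference, a commonly cited form of the Bernoulli-model likelihood in Kulldorff's spatial scan statistic is given below. It is a reconstruction written with the symbols this section defines (μ(G), μ(Z), nG, nZ, p, q) and is offered as an assumption about what expression (1) contains, not as the authors' exact typesetting.

```latex
L(Z, p, q) = p^{\,n_Z}\,(1-p)^{\,\mu(Z)-n_Z}\;
             q^{\,n_G-n_Z}\,(1-q)^{\,(\mu(G)-\mu(Z))-(n_G-n_Z)}
\tag{1}
```

Under this form, the scan statistic compares the maximized likelihood with p > q against the maximized likelihood with p = q for each candidate window.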

Materials and
where μ(G) and μ(Z) are the total population of the study area and the population in region Z; nG and nZ are the total numbers of observed cases in the study area and in region Z; p is the probability that an incident falls in region Z, and q is the probability that an incident falls in the rest of the study area. The likelihood of observing n(Z) in region z is given by the function shown below:

On the other hand, Kulldorff's method tests the statistical significance of the detected clusters, using a p-value obtained from Monte Carlo simulation. Monte Carlo simulation was proposed by Dwass in 1957 [44], and Turnbull et al. applied it in their cluster detection tests [45]. In a Monte Carlo simulation, a large number of random replications are generated under a chosen distribution model, conditioned on the simulated total number of cases being the same as in the real data. In this study, the real population of each area is used in the Monte Carlo replications. The disease occurrences in each area are drawn from a non-homogeneous Poisson distribution with mean μ(z)·nG/μ(G). The likelihood ratio is calculated for both the replicated data and the real data. Each simulated dataset has a maximum likelihood ratio and a p-value. The smaller the p-value and the larger the likelihood ratio, the more likely the cluster. A limitation is that the method relies on scan windows with predefined shapes [46].

Two common clustering methods are partitioning clustering and hierarchical clustering. Given a set of unlabeled data, partitioning cluster analysis pinpoints clusters of similar instances. For example, expectation-maximization (EM) clustering performs maximum likelihood estimation for samples in a mixture model.
EM uses the probability of cluster membership rather than a distance metric, and samples are not assigned to a single cluster but partially to each distribution. It is common in cluster detection for chronic diseases, such as among diabetes patients, who tend to form groups with intersecting or irregular shapes [47]. Hierarchical clustering is a method of automatically seeking a hierarchy of clusters and is widely applied in DNA cluster detection. It includes agglomerative clustering (i.e., the bottom-up approach) and divisive clustering (i.e., the top-down approach). Both EM and hierarchical clustering belong to machine learning analysis. They do not depend on a predefined window or on arbitrary patterns to detect clusters.

2.5 Selection of explanatory variables

To reduce the dimensionality of the dataset to fewer explanatory variables, Principal Component Analysis (PCA) is one of the common techniques for minimizing multicollinearity without losing the attributes of the variables. PCA can maintain interpretability while minimizing information loss. It does so by creating new independent factors, or components, that successively maximize variance. In the PCA procedure, a set of possibly correlated variables is transformed into a set of linearly uncorrelated variables using an orthogonal transformation. The number of factors extracted from PCA is less than or equal to the number of original, possibly correlated variables [48].

Owing to the spatial dependence of COVID-19 spreading, the purpose of modeling the mortality rate (MR) is to identify the external triggers that readily took place. Statistical modeling is a good way to make predictions about the real world from sample data. For instance, Ordinary Least Squares (OLS) is a traditional method for estimating a linear regression between a dependent variable and independent variables.
OLS assumes that the disturbances have zero mean and constant variance and that there is no correlation among the explanatory variables [49]. However, multicollinearity in OLS can bias the model, inflate model performance, and undermine the reliability of the outcome. To mitigate multicollinearity, stepwise regression (SR) is one of the common approaches to consider. SR is an automatic variable selection procedure that iteratively selects the most relevant candidate(s) from a pool of explanatory variables. Forward selection begins with no variables in the model and examines each additional variable against a chosen model-fit criterion until none of the remaining variables improves the model to a statistically significant extent [50]. In this study, SR is disregarded because it yields biased R-squared values or coefficients [51]. GWR modeling was initially adopted to take into account the geographically uneven distribution of the number of deaths [52]. More importantly, in contrast to OLS models, GWR models are local linear regression models. They allow the parameter estimates for the link between independent and dependent variables to vary over space [53][54].

The GWR procedure is founded upon two conditions. First, geographical entities that are closer together are more similar, according to the first law of geography [55]. Second, explanatory variables (e.g., socioeconomic factors) are distributed unevenly across regions, owing to spatial autocorrelation and spatial heterogeneity. Based on Foster's spatially varying parameter regression, a geographically weighted regression (GWR) model is localized by weighting each observation in the dataset [54]. As Fotheringham pointed out, it uses local smoothing to address spatial heterogeneity.
To account for spatial disparity, geographic coordinates and kernel functions are used to carry out local regression estimation on the adjacent individuals of each group. The equation of the fitted GWR model is as follows [55].
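For reference, the standard GWR model form that matches the symbol definitions used in this section can be written as below. The upper summation limit m (the number of explanatory variables) and the intercept term β0 are notational assumptions introduced here for illustration.

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{m} \beta_k(u_i, v_i)\, x_{k,i} + \varepsilon_i
```

Unlike a global OLS model, every coefficient here is a function of the location (u_i, v_i), which is what lets the fitted relationship differ from county to county.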

Expectation-Maximization clustering and hierarchical clustering analysis
where $i$ denotes the individual sample; $(u_i, v_i)$ are the coordinates of sample $i$; $\beta_k(u_i, v_i)$ is the $k$th regression parameter of sample $i$; $y_i$ is the dependent variable of sample $i$; $x_{k,i}$ is the $k$th independent variable for sample $i$; and $\varepsilon_i$ is the random error term, which follows a normal distribution with constant variance. The parameter estimate for sample $i$ is thus given by:

$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i)\, X\right)^{-1} X^{T} W(u_i, v_i)\, y$$

where $W$ is the spatial weight matrix, whose selection and setting are the core issues of GWR. The calculation of the GWR coefficients consists of two major steps. The first is to select a proper kernel function to express the spatial relationship between the observed units. Four major kernel functions are used in existing research, namely fixed Gaussian, fixed bi-square, adaptive bi-square, and adaptive Gaussian. Since the choice of kernel function plays a direct and decisive role in obtaining the most accurate possible estimates of regression parameters under spatial heterogeneity, fixed Gaussian was chosen as the kernel function in this paper after careful analysis and comparison. It is expressed as

$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{\theta^{2}}\right)$$

where $w_{ij}$ represents the distance weight from sample $i$ to sample $j$; $d_{ij}$ is the Euclidean distance between sample $i$ and sample $j$; and $\theta$ is the bandwidth, which determines the speed at which the spatial weight attenuates with distance.
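The weighting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the GWR4 implementation: the kernel form exp(-(d/θ)²), the helper names `gaussian_weight` and `local_fit`, the single explanatory variable, and the toy coordinates are all assumptions made for the example.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Fixed Gaussian kernel; the exact form exp(-(d/bandwidth)^2) is assumed."""
    return math.exp(-(distance / bandwidth) ** 2)

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y = b0 + b1*x, with weights centred on `target`."""
    weights = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data: three "western" and three "eastern" locations whose
# x-y relationship has opposite signs (slope +2 in the west, -1 in the east).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0]

west = local_fit((1.0, 0.0), coords, x, y, bandwidth=2.0)
east = local_fit((11.0, 0.0), coords, x, y, bandwidth=2.0)
print("west (intercept, slope):", west)
print("east (intercept, slope):", east)
```

Because distant observations receive weights close to zero, each local fit recovers the relationship of its own neighbourhood, which a single global regression over all six points would average away. A larger bandwidth makes the two local fits converge toward the global one.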

The second step of the spatial weight matrix calculation is the selection of the optimal bandwidth, which contributes to a better fit. According to the GWR4.09 User Manual [56], bandwidth selection criteria include AIC (Akaike Information Criterion), AICc (small-sample bias-corrected AIC), BIC, and CV (cross validation).

Two space-time clusters were detected (Table 2). The larger cluster incorporates 172 counties with a population of 13,085,347 and 12,761 new cases, covering northern and western Texas. During the period 2020/11/6-2021/2/5, the observed COVID-19 cases in this cluster were 2.48 times the expected cases. The second cluster centers on East Texas and involves 27 counties with a population of 26,217,888 and 3,635 new cases during 2020/7/6-2020/9/5. This eastern cluster has an observed/expected ratio of 5.23. It should be noted, however, that this eastern cluster occurred during an earlier stage of the pandemic, when COVID-19 cases had just started spreading in Texas, and hence its expected cases were lower than those of the northern cluster. Of the 254 counties in Texas, these two clusters cover 199. The spatial extent of these clusters is too large to guide precise tracking of COVID-19 mortality.

Table 2. Cluster comparison

Focusing on the temporal trend, November 2020 was the most serious month of the three-month space-time cluster in northern and western Texas (Fig. 4). According to Fig. 4, the ratio of observed to expected cases peaked in November 2020. Hence, the cluster period is confirmed as the last two quarters of 2020 and the first quarter of 2021, and the clusters cover 199 counties, which is the key input to the following GWR analysis.

Table 3. The EM clustering and HC clustering analysis

Based on the above analysis, a normal distribution check was conducted on the two clusters in the last two quarters of 2020 and the first quarter of 2021.
The requirement of a normal distribution has two conditions. One is that the uncertain variable is symmetric about the mean; the other is that the uncertain variable is more likely to lie in the vicinity of the mean than far away from it. After a logarithmic transformation, MR meets these conditions.
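The effect of the logarithmic transformation on symmetry can be illustrated with a small sketch. The mortality-rate values below are hypothetical (not the Texas data), the `skewness` helper is introduced here for illustration, and the sketch assumes all rates are strictly positive so that the logarithm is defined.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed mortality rates (a few counties with very high values).
rates = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 6.0, 12.0]
log_rates = [math.log(r) for r in rates]

print("skewness before log transform:", round(skewness(rates), 2))
print("skewness after log transform: ", round(skewness(log_rates), 2))
```

A skewness closer to zero after the transformation indicates a distribution that is more symmetric about its mean, which is the first of the two conditions stated above. A formal normality test would still be needed to confirm the second.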

Correlation
According to Table 4, in the third quarter, MR is significantly and positively related to annual income and the population older than 80, but significantly and negatively related to temperature, precipitation, total hospital beds, population density, total population, Black population, and the age groups between 20 and 59. In the fourth quarter of 2020, MR is significantly and negatively related to temperature, precipitation, total hospital beds, population density, total population, annual income, and the population between 20 and 59, while it is significantly and positively related to the population older than 80. Interestingly, annual income was at first positively related to MR and then became negatively related to it.

Before PCA, the dataset was examined using the Kaiser-Meyer-Olkin (KMO) test and Bartlett's Test of Sphericity (BTS). The KMO test compares the correlation statistics to identify whether the variables include sufficient differences to extract unique factors. The KMO value of 0.616 for the 14 explanatory variables exceeds the threshold value of 0.5. The BTS value of 0.0 was significant (p<0.001), validating that correlation between the variables does exist in the population. Communality, a common variance ranging between 0 and 1 that uses the remaining variables as factors, was used to determine whether any variables should be excluded from the factor analysis (Table 5). A threshold of 0.7 is used to determine the significance of explanatory variables.

PCA was conducted as the factor analysis method in this paper. Using an eigenvalue threshold greater than 1.0, 5 factors are identified that explain a cumulative 70.18% of the variance within the data model (Table 6). A varimax rotation was used to assist in the interpretation of the PCA results. The rotated component matrix was examined for variables with a cutoff threshold of 0.7. Table 6 gives the direct relationship between factors and explanatory variables.
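The eigenvalue-greater-than-one rule used for factor retention can be sketched as follows. The eigenvalues below are hypothetical values chosen only so that they sum to 14 (as they would for the correlation matrix of 14 standardized variables); they are not the eigenvalues behind Table 6, and `kaiser_select` is a helper name introduced here.

```python
def kaiser_select(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue exceeds `threshold` (Kaiser criterion).

    Returns the number of retained components and the share of total
    variance they explain together.
    """
    total = sum(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > threshold]
    return len(kept), sum(kept) / total

# Hypothetical eigenvalues for 14 standardized variables (they sum to 14).
eigenvalues = [3.9, 2.3, 1.6, 1.1, 1.05, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

n_factors, explained = kaiser_select(eigenvalues)
print(f"retained factors: {n_factors}, cumulative variance: {explained:.2%}")
```

The cumulative share is simply the sum of the retained eigenvalues divided by the number of standardized variables, which is how a figure such as the 70.18% reported above would be obtained from the actual eigenvalues.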
The first factor, in all three quarters, has high loadings on the variables Care Beds, Total Population, and Population Density, indicating that the COVID-19 mortality rate is positively related to hospitalization and total population. This means that the population metric and the medical care index are the two main indicators of COVID-19. Factor 2 in the third quarter of 2020, factor 4 in the fourth quarter of 2020, and factor 4 in the first quarter of 2021 are a composite adult population index related to the population between 20 and 59 and beyond 80. Factor 3 in the two quarters of 2020 and factor 2 in the first quarter of 2021 represent a natural supply index, which is related to land area and precipitation, indicating that social distancing was helpful in mitigating MR. The economic condition indexes comprise factor 4 in the third quarter, factor 2 in the fourth quarter, and factor 5 in the first quarter of 2021, through household income and unemployment. Factor 5 in the third quarter of 2020 and factor 3 in the first quarter of 2021 are an environmental index. Meanwhile, factor 5 in the fourth quarter (i.e., beds per capita) is the medical supply index, which positively affects MR.

The OLS regression examines whether there is a linear relationship between cumulative cases and their factors, as well as between the death rate and its factors. According to the t-test and F-test, all factors are significant. By binning MR by quarter, an iterative GWR approach is used to examine how the spatial relationship between MR and its factors changes over time. Since MR is clustered, an adaptive kernel is adopted in the GWR models. The AICc method chooses the bandwidth that minimizes the AICc value, where AICc is the corrected Akaike Information Criterion (it includes a correction for small sample sizes).
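For reference, the small-sample correction mentioned above is commonly written in the general textbook form below, where n is the number of observations, k is the number of estimated parameters, and L-hat is the maximized likelihood. GWR software typically substitutes an effective number of parameters derived from the hat matrix for k; the exact expression used in the authors' software is not stated in this section, so the form shown is an assumption for illustration.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}
```

The correction term grows as k approaches n, so a very small bandwidth (which increases the effective number of local parameters) is penalized more heavily than under plain AIC.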
Comparing the results (Table 7), the AICc value decreased from 875.23 in the OLS model to 851.54 in the GWR model in the third quarter of 2020, whereas R² increased from 0.17 in the OLS model to 0.37 in the GWR models. As these two models represent a global and a local approach respectively, the number of neighbors declined from 254 in the OLS models to 128 in the GWR models. In Q4 2020, the same trend is observed: AICc decreased from 665.44 in the OLS model to 653.85 in the GWR model, and R² increased from 0.10 in the OLS model to 0.20 in the GWR model. In all three quarters, the GWR model had higher explanatory power than OLS and is hence superior. Although the GWR model remained moderately weak in modeling MR, the models are significant.

Based on existing research, quarterly COVID-19 GWR models are also implemented in the research area [54][55]. Figure 6 presents Texas spatiotemporal distribution maps of the 5 factors, covering 5 aspects, in three quarters.

In the third quarter of 2020, factor 1 has the dominant effect on MR among the 5 factors, as its maximum coefficient range is -0.15 to 0.04. Its impact is lowest in central Texas, where the coefficient range is -2.14 to -1.73, implying that hospitalization capacity had not been stressed beyond full capacity. Therefore, for Factor 1 in the third quarter, all Texas counties were in the negative range, which was good. For Factor 2, a high score reflects more

In the fourth quarter of 2020, factor 1 does not have the dominant effect on MR among the 5 factors; its maximum coefficient range is -0.43 to -0.12. This means that hospitalization capacity had not been stressed beyond full capacity. Factor 2 is an economic composite index whose coefficients run from the range -0.21 to -0.14 up to the range 0.13 to 0.17.
Central Texas became the dividing line, with a neutral relationship for this factor, while western Texas remained negative and eastern Texas became positive. Factor 3 is a natural supply index whose coefficients run from the range -0.41 to 0.31 up to the range 0.02 to 0.08. In northern Texas, land area did little to drive COVID-19 MR, but it worked in the opposite direction in South and West Texas. This indicates that spatial distancing was more feasible in South and West Texas than in northern Texas. Factor 4 is the adult population index, whose coefficients run from the range -0.31 to -0.29 up to the range -0.14 to -0.1. A negative relationship with MR indicates lower mortality in the younger population (but also higher mortality in the elderly population). This negative association was strongest in South and West Texas and weakest in northern Texas. Factor 5 is the medical supply index, whose coefficients run from the range -0.04 to -0.03 up to the range 0.21 to 0.24. Higher beds per capita (BPC) would generally be expected to go with lower MR. Nevertheless, only very few Texas counties had slightly negative coefficients, and most were positive. This indicates that by Q4, MR still went up despite higher BPC.

In the first quarter of 2021, Factor 1, whose coefficients run from the range -2.14 to -1.73 up to the range -0.15 to -0.04, is negatively related to deaths all across Texas. Factor 2 loads positively on precipitation and negatively on land area, and it is negatively related to deaths across Texas because its coefficients are negative. This means that the higher the precipitation or the smaller the land area, the fewer the deaths, which is somewhat counter-intuitive. Factor 3, whose coefficients run from the range -0.52 to -0.24 up to the range 0.3 to 0.45, is an environmental factor that loads positively on temperature and AQI. A positive relationship with deaths means that higher temperature and poorer air quality were associated with more deaths, or that colder temperature and better AQI were associated with fewer deaths. A negative relationship is the opposite. The relationship is negative from central to West Texas but positive in eastern Texas.
Factor 4 is the adult population. It is entirely negative in western Texas but positive in South Texas. Factor 5 is poor economic condition. The positive relationship indicates that poor economic condition is having an effect in the West, South-east, and the cen-

The COVID-19 virus runs rampant in high-density housing sites such as nursing homes. Emerging cluster detection is a precise way of tracking the virus. In this study, we explored three types of clustering analysis methods. The space-time cluster detection of the COVID-19 mortality rate is built on Kulldorff's scan statistic method, which is the most popular method in epidemiological applications. We first tested the null hypothesis H0 (constant probability for all areas) against the alternative hypothesis H1 (the specific area z has a larger probability than the areas outside it) using a Poisson model. We then calculated the maximum likelihood and p-value for a given region z. Two clusters were identified, with sensitive periods of July-September 2020 and November 2020-February 2021, covering 199 counties. To narrow the tracking area, we used EM and HC clustering to seek better-defined clusters. The EM algorithm helped find seven smaller clusters in the last quarter. HC clustering analysis directly pinpointed eight counties as a significant cluster. In fact, if COVID-19 case data were available at the street or neighborhood level, meaning that the address of each individual death could be captured, specific hotspots at the neighborhood or even building level could be identified via GIS. HC and EM clustering provide richer descriptions of clustering structures than traditional cluster detection. Importantly, they make it feasible to trace the trajectory of individual cases in practice. For example, suppose there is a death case at Pioneer Lodge Motel in Zion National Park in Hays County, Texas.
We use ST_Buffer in PostGIS to build a 100-meter quarantine area around the Pioneer Lodge building (Fig. 6). Next, the intersection area around Pioneer Lodge Motel in Zion National Park is selected. Finally, the ST_Area command makes it easy to find the 10 largest buildings in the intersection area. The blue squares are identified as suspected buildings with high-density connections. Because COVID-19 patient information is confidential, our research does not incorporate patient addresses. The figure below is intended to illustrate how tracking the virus based on geographical cluster detection could be implemented.

The purpose of the GWR modeling is to find factors related to COVID-19. This is not only because the source of COVID-19 is still a puzzle, but also because there may be causality hidden within the correlation. In the GWR analysis of the COVID-19 mortality rate, the research period is fixed at the last two quarters of 2020 according to the preceding clustering analysis. We examined 14 variables, including race, temperature, air quality, precipitation, hospitalization, and age structure. Furthermore, principal component analysis (PCA) integrated them into five factors related to the mortality rate, including total population and hospitalization, medical supply, age structure, air quality, and economic condition. The explanatory variables are also highly significant for the corresponding factors, as shown in Table 6. Lastly, by defining each variable's weight as its variance proportion, the GWR model discloses the sensitive factors in the spatial-temporal variability of COVID-19 mortality rates in response to socioeconomic and environmental impacts in Texas counties. The AQI, economic condition, and adult population indexes are regarded as sensitive factors.
Since the time series is too short to be analyzed on its own, space-time cluster detection, EM and HC clustering, and GWR modeling were used to examine the imbalanced distribution of COVID-19 MR and its complex relationship with risk factors [57]. The longitudinal monitoring mechanism fills a gap in the geographical analysis of COVID-19. This study has conducted spatiotemporal analyses that provide unique insights into COVID-19. It combines conventional Geographical Information Systems (GIS) with modeling and simulation skills [58].

The sensitive areas differ between the clustering analysis and the GWR modeling because of their different distributions. In the cluster analysis, the sensitive areas are the eight counties of Cottle, Stonewall, Bexar, Tarrant, Dallas, Harris, Jim Hogg, and Real, corresponding to south-east Texas. The distinction stems from different mathematical distributions. The clustering methods rest on a Poisson model, while a Gaussian distribution is assumed in the GWR modeling. In spatial epidemiology, modeling mortality as a Poisson process is more appropriate than modeling it on a linear scale, as GWR does. Specifically, Poisson regression identifies the relative risk of mortality linked with a given exposure, which can be expressed as a percentage increase in risk. Thus, clustering detection is more accurate than GWR in forecasting mortality regions [59].

Returning to the current study, its first strength is that it provides geographically targeted ways to blunt the spread of COVID-19 as quickly as possible and save lives. Through the comparison between objective clustering techniques and traditional space-time cluster detection, we achieve an improved cluster solution.
The HC clustering algorithm tracked one cluster of eight counties in the last quarter and one cluster of four counties in the third quarter, and the EM clustering analysis captured seven clusters in the last quarter and none in the third quarter, in contrast to the two large-scale clusters found by the space-time cluster methods. The second strength is the possibility of fitting the GWR model on the PCA outcomes, which improved the robustness of the findings based on the OLS results. Furthermore, the combination of clustering analysis and the PostgreSQL/PostGIS application can provide instant information that helps decision-makers and public health professionals take immediate action to inhibit current disease spread and save lives in the future. In addition, quickly pinpointing locations can cut off avenues of viral spread and save resources (time and lives).

This research focuses only on the Texas COVID-19 scenario, so its findings cannot be extrapolated to other states. We did not capture chronic disease data to support this research. Such data should be incorporated as explanatory variables in future studies, and we look forward to seeing how clinical characteristics [62] and cardiovascular conditions [63] affect COVID-19 health outcomes. Collecting multidimensional data might improve and enrich findings on the spatial variability of COVID-19. This research addressed only quarterly spatial-temporal GWR models; daily dynamic GWR models remain out of reach. GTWR or more effective spatial-temporal models should be researched further in the future. The spread of COVID-19 depends on people's mobility and social activities, which are hard to observe [64]. Because human behavior is dynamic and complex, this research captures that constantly changing mobility only in fragments and traces people's trajectories using stationary geographical locations.
Clustering analysis is not limited to geography and should be extended to other fields such as biology. For instance, multiple sequence alignment could be explored through clustering analysis rather than with ClustalW2, a multiple sequence alignment program for DNA or proteins [65].

ism. It also exposed the weakness of conservative liberalism in the U.S., which struggles to unify ideology in a social crisis and to flourish in a consistent manner [61]. This research will help narrow geographical health divides and provide transparent references for medical services. Inspired by [58,60], who applied and compared the performance of multiscale GWR models across the United States for incidence rates and death rates to account for the spatial variability of COVID-19, we compared spatial-temporal GWR models with the global OLS model to reveal how COVID-19 cumulative cases change in response to social-economic and environmental variables at the county level in Texas. To deepen the understanding of spatial-temporal variability in empirical COVID-19 analysis, the GWR modeling was built on the space-time detection of emerging clusters of COVID-19 MR. Therefore, the results of this study provide new empirical evidence to support future geographic modeling of the disease.

Space-time cluster detection, HC and EM clustering analysis, and spatial-temporal geographically weighted regression modeling of COVID-19 are crucial for improving the health surveillance system and enhancing recognition of emergency preparedness plans for local hospitals. They are beneficial for the government of Texas and CDC to make appropriate
Spatial-temporal GWR Map
Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.