Changing Clusters of Indian States with respect to number of Cases of COVID-19 using incrementalKMN Method

The incidence of the novel coronavirus (COVID-19) in India is currently rising exponentially, but with apparent spatial variation in growth rate and doubling time. We classify the states into five clusters, from low-risk to high-risk, and study how the states moved from one cluster to another between the first case on $30^{th}$ January 2020 and the end of Unlock 1 on $30^{th}$ June 2020. We have implemented a new clustering technique called incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S. (2019)).


Introduction
A severe acute respiratory disease, caused by a novel coronavirus, spread throughout China in November-December 2019 and received worldwide attention. On 30th January 2020, the World Health Organization (WHO) officially declared the novel coronavirus (COVID-19) epidemic a public health emergency of international concern. In India, the first case of novel coronavirus was detected on 30th January in the state of Kerala (Ward, A. (2020, March 24)). As the number of confirmed novel coronavirus positive cases neared 500, the Govt. of India introduced the "Janta Curfew" on 19th March 2020 and then enforced a 21-day nationwide lockdown (Phase-I) from 25th March to 14th April, during which nearly all services and factories were suspended (Singh, K. D., Goel, V., Kumar, H., Gettleman, J. (2020, March 25)). As the number of confirmed cases of COVID-19 kept increasing, on 14th April 2020 the Govt. of India extended the lockdown till 3rd May 2020, i.e. lockdown Phase-II, with certain relaxations (Bhaskar, U. (2020, April 14), Dutta, P. K. (2020, April 14)).

2 Methodology

Clustering
In order to define data clustering, let $D = \{x_1, x_2, \ldots, x_n\}$ be a data set with $n$ data elements, each characterized by $m$ features: $x_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,m}\}$. The main objective of clustering is to group these data elements into homogeneous sub-groups such that intra-cluster similarities are high while inter-cluster similarities are low. Each sub-group is called a cluster, and the union of all sub-groups is equal to the data set D. Clustering methods are classified into five categories: partition-based, density-based, hierarchical-based, grid-based and model-based (Han, Jiawei, Jian Pei, and Micheline Kamber (2011)). In the last few decades, clustering algorithms have been extensively used to solve data-mining problems. In this research, we have used the incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S. (2019)) clustering method on the novel coronavirus (COVID-19) data set of India, based on confirmed cases, to divide the states of India into the desired number k of groups. We then apply the growth rate and doubling time from equation 2.1 and equation 2.2 to the k clusters produced by the incrementalKMN method. The steps of the incrementalKMN method are given below:
Step 1: Select the value of k and the data set D.
Step 2: Set i = 1 and C = φ, where C is the (initially empty) centroid list.
Step 3: Select the first centroid c_i as the mean of the given data set D.
Step 4: Update the centroid list C = C ∪ {c_i}.
Step 5: Assign each data object to its nearest centroid.
Step 6: Compute the SSE (sum of squared errors) of each cluster.
Step 7: Increase i by one and select the i-th centroid c_i as the data object at the maximum distance from the centroid of the cluster with the maximum SSE.
Step 8: Repeat from step 4 until k centroids have been selected, which finally forms k clusters.
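The steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the reference implementation of Prasad et al. (2019): the function name `incremental_kmn` is hypothetical, distances are assumed to be Euclidean, and existing centroids are not recomputed after a new one is added, because the listed steps do not mention such an update.

```python
import numpy as np


def incremental_kmn(data, k):
    """Sketch of the listed steps; returns (centroids, labels).

    Assumptions (not confirmed by the source): Euclidean distance,
    and no recomputation of centroids once they are selected.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data.reshape(-1, 1)  # treat a flat list as one feature
    # Steps 2-4: the first centroid is the mean of the data set
    centroids = [data.mean(axis=0)]
    while True:
        centers = np.vstack(centroids)
        # Step 5: assign each data object to its nearest centroid
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        if len(centroids) == k:
            return centers, labels  # Step 8: k centroids reached
        # Step 6: SSE of each cluster
        nearest = dists[np.arange(len(data)), labels]
        sse = np.bincount(labels, weights=nearest ** 2,
                          minlength=len(centroids))
        if sse.max() == 0:
            raise ValueError("k exceeds the number of distinct data objects")
        worst = sse.argmax()
        # Step 7: farthest data object within the maximum-SSE cluster
        members = np.flatnonzero(labels == worst)
        farthest = members[nearest[members].argmax()]
        centroids.append(data[farthest])


# Made-up case counts, purely for illustration
centers, labels = incremental_kmn([1, 2, 3, 100, 101, 500], k=3)
```

In this toy input the three small values, the two middle values, and the single large value end up in separate clusters. With real data, each row would hold the confirmed-case feature(s) of one state or UT.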
2.2 Compound Growth Rate (Murphy, C. B. (2020, May 15)): In this paper, we have used the compound growth rate of confirmed cases over regular time intervals for each state, phase-wise. The growth rate of each state is computed as:
$$\text{Growth rate} = \left(\frac{\text{Present Confirmed Case}}{\text{Past Confirmed Case}}\right)^{1/n} - 1 \qquad (2.1)$$
where Present Confirmed Case is the ending value, Past Confirmed Case is the beginning value, and n is the number of periods (in days).
2.3 Doubling Time (Manias, M. (2020, January 10)): Doubling time is the time it takes for the number of confirmed cases to double. The doubling time of confirmed cases for each cluster is computed from the growth rate of equation 2.1 as:
$$\text{Doubling Time} = \frac{\ln(2)}{\ln(1 + \text{Growth rate})} \qquad (2.2)$$
The complete flowchart of the methodology adopted is given in Fig 1. The proposed method starts with the input data set D and the value of k; in this paper, we have taken k to be five. In the next step, the incrementalKMN clustering method is applied to the data set D with this value of k to produce k clusters. In the final phase of the flowchart, the growth rate and doubling time of the clusters produced by the incrementalKMN clustering method are computed.
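As a small illustration of the two quantities defined in sections 2.2 and 2.3, the following Python sketch computes the standard compound growth rate and the corresponding doubling time. The function names `growth_rate` and `doubling_time` are hypothetical, and the inputs in the example are the percentages quoted later in the text, not values recomputed from the data set.

```python
import math


def growth_rate(present, past, n):
    """Compound growth rate per period over n periods (in days)."""
    if past <= 0 or n <= 0:
        raise ValueError("past cases and number of periods must be positive")
    return (present / past) ** (1.0 / n) - 1.0


def doubling_time(rate):
    """Doubling time in days for a positive per-day growth rate."""
    if rate <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / math.log(1.0 + rate)


# Doubling times implied by daily growth rates of 29% and 15%
before_lockdown = doubling_time(0.29)
phase_one = doubling_time(0.15)
```

These two rates give doubling times of roughly 2.7 and 5.0 days, which agrees with the 3 days and 5 days reported for MH before lockdown and in lockdown phase-I.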

Data Source
The state-wise data set of daily confirmed novel coronavirus (COVID-19) cases was collected from (https://api.covid19india.org/). In this data set, the total number of confirmed cases in India from 30th January to 20th June 2020 was 4,14,970. In the novel coronavirus data set, we have considered the following states and UTs, as AN (

Result and Discussion
We have considered five different clusters (subgroups) of states namely (i) high risk, (ii) moderate-high, (iii) moderate, (iv) moderate-low, and (v) low-risk states with respect to the daily incidence of COVID-19 positive cases. As such the value of k is taken as five in

Scenario of novel coronavirus(COVID-19) in India Before Lockdown:
The first confirmed case of novel coronavirus in India was reported on 30th January 2020 in the state of Kerala (Ward, A. (2020, March 24)). The number of confirmed COVID-19 positive cases reached close to 500 on 19th March 2020, and by 24th March 2020, i.e. just before the lockdown, the number of positive cases had reached 658. Fig 2 depicts the situation prior to the start of the lockdown based on the number of confirmed cases: the states {MH and KL} were in the high-risk state, {HR and UP} were in the moderate-high risk state, the Union Territory DL and the state KA were in the moderate risk state, {GJ, LA, PB, RJ, TG} were in the moderate-low risk state, and the remaining states and UTs were in the low-risk state.
Table 1 shows the growth rate and doubling time (in days) of each cluster of states. The growth rates of clusters I, II, and III, i.e. {GJ, LA, PB, RJ, TG}, {KL, MH}, and {DL, KA}, were approximately the same, with a doubling time of approximately 3-4 days. In contrast, the growth rate of cluster IV, i.e. the states {HR and UP}, was much less than that of the top three clusters but close to that of cluster V, and its doubling time of approximately 9 days was nearly three times that of the top clusters. For the rest of the states and UTs before lockdown, the growth rate was low and the doubling time was approximately 10-11 days.

Scenario of Novel coronavirus(COVID-19) in India Lockdown Phase-I:
Table 2. Accordingly, the growth rate of the state of MH was high before the lockdown, and in lockdown phase-I it decreased considerably, from 29% to 15%. Similarly, the doubling time rose from 3 days before the lockdown to 5 days in lockdown phase-I. On the contrary, the average growth rate of cluster II, i.e. {DL, TN}, increased, and its doubling time decreased in lockdown phase-I as compared to before lockdown. The average growth rate and doubling time of clusters III and IV, i.e. states {AP, GJ, KL, MP, RJ, TG, UP} and states {HR, JK, KA, PB, WB}, improved in lockdown phase-I. But in cluster V, i.e. states and UTs {AN, AR, AS, BR, CH, CT, DN, GA, HP, JH, LA, MN, ML, MZ, NL, OR, PY, SK, TR, UT}, the average growth rate increased and the doubling time decreased in lockdown phase-I as compared to before lockdown.

Scenario of Novel coronavirus(COVID-19) in India Lockdown Phase-II:
In phase-II, the lockdown was extended nationwide up to 3rd May 2020 with certain relaxations (Bhaskar, U. (2020, April 14), Dutta, P. K. (2020, April 14)). Fig 4, based on Table 3, shows the growth rate and doubling time of each cluster of states and UTs. The growth rate of both cluster I, i.e. {MH}, and cluster II, i.e. {DL and GJ}, was reported as 8%, a substantial decrease from their 15% and 17% in lockdown phase-I.

Scenario of Novel coronavirus(COVID-19) in India Lockdown Phase-III:
The third phase of lockdown ran from 4th to 17th May 2020 with some more relaxations (Online, E. T. (2020, May 4), newsworld24. (2020, May 2)). The country was divided into 3 zones: red zones, orange zones, and green zones (Thacker, T. (2020, May 8)). In Fig 5, the state MH was again in the high-risk category, and DL, GJ, and TN formed the moderate-high risk category. The states {MP, RJ, UP} were in the moderate risk state, {AP, JK, KA, PB, TG, WB} were in the moderate-low risk state, and the rest of the states and UTs were in the low-risk category.
In phase-III, the growth rate of confirmed cases decreased and the doubling time increased for all clusters except cluster V, as shown in Table 4.

Scenario of Novel coronavirus(COVID-19) in India Lockdown Phase-IV:
In phase-IV, the lockdown was extended for another two weeks, from 17th to 31st May 2020, with some additional relaxations. Here, red zones were further divided into containment and buffer zones ( May 21)). In Fig 6, Table 5.

Scenario of Novel coronavirus(COVID-19) in India Lockdown Phase-V:
Phase V of the lockdown (Unlock-I) ran from 1st to 30th June 2020 with only limited restrictions (Sharma, N., Ghosh, D. (2020, May 30)). In our study, we first considered the novel coronavirus (COVID-19) data set till 20th June 2020. As figure 7 shows, the state MH was in the high-risk state based on the number of confirmed cases and {DL, TN} were in the moderate-high risk state. Similarly, the state GJ was in the moderate risk state, {MP, RJ, UP, WB} were in the moderate-low risk state, and the rest of the states and UTs were in the low-risk state. From Table 6, the growth rate of all clusters decreased or remained the same in lockdown phase-V as compared to previous phases. Similarly, the doubling time of all clusters increased or remained the same. The doubling time of cluster I, i.e. the state MH, is approximately 24 days. Cluster II, i.e. the states {DL, TN}, requires approximately 14-15 days to double, and cluster III, i.e. the state GJ, requires 35 days to double. Clusters IV and V have doubling times of approximately 24 days and 9 days respectively.
From these findings, we expected that some of the states/UTs in Category V would move to Category IV within a week or so of 21st June 2020. To verify this, we extended our study to cover data up to 30th June 2020, i.e. the full Unlock-I period, to see how the clusters changed in the last 10 days of this phase. The result is shown in figure 9. As expected, some of the states, {AP, AS, BR, HR, JK, KA, OR, PB, TG}, which were in Category V moved to Category IV. In fact, figure 8, where we have demarcated 6 instead of 5 clusters to reveal hidden groups within cluster V, gave a clear indication of the tendency of the above states towards the next higher risk category (see also Table 7).
ii. Lockdown seemingly had its impact on two other states, HR (Haryana) and UP (Uttar Pradesh), as these states improved to the low-risk and moderate-low risk categories.
iii. Based on that, we observed that some of the states, such as AP (Andhra Pradesh), AS (Assam), BR (Bihar), HR (Haryana), JK (Jammu and Kashmir), KA (Karnataka), OR (Odisha), PB (Punjab), and TG (Telangana), will move up the ladder towards a higher risk level in the weeks to come.
iv. It also appears that some of the states/UTs where the onset was late might show a surge in the coming days and move to a higher category of risk.
v. The current study is based on the number of confirmed cases and is subject to reporting biases, if any, in the data source. We have not considered other relevant factors, covariates, and non-pharmaceutical interventions, which might completely alter the picture favorably.
vi. The number of confirmed cases alone cannot give a true picture of the prevalence of the disease, as it is proportional to the number of tests conducted.

Conclusion
In this study, we used the incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S. (2019)) clustering method to classify the Indian states and UTs into five different stages of risk on the basis of the number of confirmed cases of novel coronavirus. We then evaluated the growth rate and doubling time of confirmed cases for each cluster. As on 30th June 2020, the state MH (Maharashtra) is in the high-risk category, with a doubling time of confirmed cases of approximately 23-24 days. The union territory of DL (Delhi) and the state TN (Tamil Nadu) are in the moderate-high risk state and are expected to join MH in the high-risk state soon, as the doubling time of this set of states is approximately 14-15 days. The state GJ (Gujarat) is in the moderate risk state; its growth rate decreased during the lockdown phases and its doubling time is approximately 35 days. Moreover, the states MP (Madhya Pradesh), RJ (Rajasthan), UP (Uttar Pradesh), and WB (West Bengal) are in the moderate-low risk state, and their growth rate and doubling time are the same as those of the state of MH. The rest of the states are in the low-risk cluster, but some of them, namely AP (Andhra Pradesh), AS (Assam), BR (Bihar), HR (Haryana), JK (Jammu and Kashmir), KA (Karnataka), OR (Odisha), PB (Punjab), and TG (Telangana), are expected to reach the next level of risk soon. Our aim in this work is not to prove or disprove anything but to present the pattern for everyone to see and to realize that the situation we are in still remains critical. There is no room to rejoice by saying that we are at low risk compared to MH. MH is ahead of us not so much in numbers as in time: we are behind it on the time scale, and it is only a matter of time before we reach and experience that stage.