A Spatial temporal classification analysis and visualization of tropical cyclone tracks in Bay of Bengal using GIS

Tropical cyclones (TCs) are among the most devastating natural hazards, and the east coast of India is particularly prone to TC landfall, which causes significant socio-economic impacts. The Bay of Bengal (BoB), which forms the eastern sub-basin of the North Indian Ocean (NIO), experiences the seasonally reversing monsoon, depressions and TCs. This study uses the TC best track dataset of the NIO basin for the period 1960-2016 from the IBTrACS archive maintained by NOAA. Firefly optimization is coupled with Fuzzy C-Means (FCM) clustering for TC track classification. The classical FCM initializes the cluster centroids randomly and therefore often gets trapped in local optima. The firefly algorithm is applied to compute the cluster centroids for FCM, thereby improving the efficiency of the FCM algorithm. The obtained classes are then projected into the visualization space. Visualizations are generated in a GIS environment to gain insight into the spatial distribution of TC tracks over the decades. This study aims to develop a comprehensive assessment of the variability in tropical cyclones with respect to ENSO modulated events, inter-decadal variability and track sinuosity. We present comparative visualizations of TC tracks over the Arabian Sea and Bay of Bengal sub-basins during strong and very strong El Niño and La Niña events. Finally, we use the Parallel Coordinate Plot (PCP), a visualization technique, to demonstrate the correlation patterns of the TC parameters.


Introduction
Tropical cyclones (TCs) are extremely hazardous weather events that cause huge damage to life and property at landfall and may bring heavy rainfall and inland flooding. TCs pose a significant threat to the Indian coast by devastating agriculture and infrastructure and destroying thousands of homes. This paper presents the classification of TCs based on track shape and a visual analysis of large-scale environmental features related to cyclone tracks, including sea surface temperature, vertical wind shear, wind speed, pressure, latitude, longitude and category, in the Bay of Bengal (BoB) region of the North Indian Ocean (NIO).
During the northeast (NE) monsoon, which coincides with the post-monsoon season (October-December), successive TCs are seen in the Indian seas. The NE monsoon can bring damaging floods but also plays a crucial role in ensuring adequate water supply for people and crops. Cyclogenesis refers to the initial development stage of a TC, with mass convergence at its center. The NIO is a breeding ground for tropical cyclones, which are predominant during the pre-monsoon (March-May) and post-monsoon (October-December) periods. Although the NIO (Bay of Bengal and Arabian Sea) accounts for only 7% of the world's cyclones (an average of 5-6 cyclones per year), their impact is very significant, especially when they make landfall on the east coast of India. Moreover, the sufficiently warm sea surface temperature (SST) of 26°C-27°C and the low vertical shear of the horizontal wind over the BoB favour tropical cyclogenesis. In this study the inter-decadal variability, ENSO modulated events and the shapes of TC tracks over the BoB sub-basin are investigated for the period 1960-2016. Table 1 shows the classification of cyclonic disturbances in the NIO as per the India Meteorological Department. This study aims to develop a comprehensive assessment of the variability in tropical cyclones with respect to ENSO modulated events, inter-decadal variability and track sinuosity.

Related Work
TCs are migratory in nature and their tracks have long been a subject of investigation. Deo and Garner (2014) reported an increase in intense TC activity over the north and south Indian Ocean in the last 15 years. They further analyzed the trends in energy metrics such as Accumulated Cyclone Energy (ACE) and the Power Dissipation Index (PDI) to find their significance in Net Tropical Cyclone (NTC) activity over the NIO. Mohapatra et al. (2011) analyzed the best track information of cyclonic disturbances (CDs) and its impact over the NIO. Bhaskar Rao et al. (2001) studied the trends and fluctuations of TCs over the NIO during the period 1877-1998. Webster et al. (1999) conducted a detailed study of the strong seasonal anomalies in sea surface temperature, precipitation and winds that occurred in the Indian Ocean region. Chu and Clark (1999) found a dramatic decadal-scale upward trend in the TC series for 1966-1997 in the Central North Pacific. Medha Khole (2005) examined the interannual and inter-seasonal variability of SST over the North Indian Ocean, comparing the SST anomalies during the standard meteorological seasons over India, i.e., winter, pre-monsoon, monsoon and post-monsoon. Barnston et al. (1997) suggested an appropriate SST indicator of the ENSO phenomenon. To identify the ENSO type, Hong Li Ren (2018) determined two new indices, the eastern Pacific ENSO (EP) index and the central Pacific ENSO (CP) index, based on the SST conditions in the Niño3 and Niño4 regions, respectively. Rahman et al. (2018) used the K-Means algorithm to cluster 592 Indian Ocean cyclone tracks into four clusters based on the standard deviational ellipse of the cyclone trajectories. Camargo et al. (2008) applied a probabilistic clustering technique based on a regression mixture model to describe the eastern North Pacific (ENP) TC tracks.
Zarnani (2013) investigated the application of Fuzzy C-means clustering, a powerful soft computing technique, to discover the uncertainty of weather patterns. Ferstl et al. (2017) introduced stacked time-cuts, which provide a compact visual summary of the evolution of clusters over time for encoding the forecast dynamics. Sanyal et al. (2010) described Noodles, a tool developed to visualize the uncertainties associated with numerical weather prediction. The tool was designed to explore the uncertainty of three important weather variables: water vapor mixing ratio, perturbation potential temperature and perturbation pressure. Liu et al. (2019) first constructed a selective sampling from a prediction ensemble of 1,000 storm tracks and then constructed a smaller, representative and spatially well-organized ensemble. Sanyal et al. (2008) developed a 3D immersive visualization that depicts the weakening of Hurricane Lili when a shaft of dry air moved into the hurricane core. Zhou et al. (2008) introduced a framework to improve the visual clustering of PCPs by reducing edge clutter. Felton et al. (2013) studied the SST anomalies during the ENSO phases in the BoB region. Meehl et al. (1987) observed the impact of SST anomalies on the Indian monsoon.

Data
The International Best Track Archive for Climate Stewardship (IBTrACS) dataset developed by NOAA (Knapp et al., 2010) has been utilized in this study for BoB TCs for the period 1960-2016. IBTrACS provides data in several specified formats. The shapefile format with line features is used for analyzing the data in a GIS environment; for this study the IBTrACS.NI.list.v04r00.lines file is used. To facilitate the data analysis, the CSV format of the tropical cyclone data for the period 1960-2016 was also obtained from the IBTrACS archive, using the file Basin.NI.ibtracs_all.v03r10.csv. The dataset comprises the TC name, time stamp, position (latitude, longitude), minimum pressure and maximum wind speed, recorded in a standard format at 6-hourly intervals in UTC.

Framework Description
In this section we present an overview of our framework, which comprises four parts. The starting point of the system is the collection of TC track data. The second step consists of TC feature extraction for the construction of feature vectors. This step eliminates unimportant data and improves the classification accuracy. The third step is the firefly optimization based FCM classifier, which classifies the feature vectors into four classes based on the shape of the track. In the fourth step the obtained classes are projected into the visualization space. Figure 1 illustrates the four consecutive steps of our framework.

Fuzzy C means Clustering (FCM) technique
The clustering method proposed by J.C. Dunn (1973) is used to group TC tracks into clusters based on the Sinuosity Index (SI). The algorithm requires the number of clusters as input, which is set to c = 4 based on the SI measure over the track types straight, quasi-straight, curving and sinuous. The main objective of FCM is to find an optimal partition of the feature space of the given cyclone track dataset. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote the feature space to be partitioned into c clusters, where n is the number of data points. FCM is an optimization algorithm that minimizes the objective function J, defined as

$$J = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, d_{ij}^{2}$$

where $d_{ij} = \lVert x_i - v_j \rVert$ is the Euclidean distance between the i-th data point and the j-th cluster center $v_j$, and $u_{ij}$ represents the membership of the i-th data point in the j-th cluster. The steps of the FCM clustering algorithm are summarized as follows.
1. Randomly initialize the c cluster centers.
2. Compute the fuzzy memberships using

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}}$$

where $d_{ij}$ represents the Euclidean distance between the i-th data point and the j-th cluster center, and m is the fuzzification parameter.
3. Calculate the fuzzy centers using

$$v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}$$

4. Repeat steps 2 and 3 until the objective function J reaches its minimum and convergence is achieved.
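The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes the feature is a single scalar per track (such as the SI), uses the standard FCM update rules, and the function name `fcm_1d`, its default parameters and the sample values are all hypothetical.

```python
import random


def fcm_1d(values, c=4, m=2.0, max_iter=100, tol=1e-6, centers=None, seed=0):
    """Fuzzy C-means on scalar features; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Step 1: random initial centers unless the caller supplies them.
    v = list(centers) if centers is not None else rng.sample(list(values), c)
    eps = 1e-12  # avoids division by zero when a point coincides with a center
    u = []
    for _ in range(max_iter):
        # Step 2: membership of every point in every cluster.
        u = []
        for x in values:
            d = [abs(x - vj) + eps for vj in v]
            u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                      for dj in d])
        # Step 3: centers as membership-weighted means.
        new_v = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            new_v.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        # Step 4: stop once the centers no longer move appreciably.
        shift = max(abs(a - b) for a, b in zip(v, new_v))
        v = new_v
        if shift < tol:
            break
    return v, u


# Hypothetical SI-like values forming four loose groups.
si_values = [1.0, 1.01, 1.02, 1.1, 1.12, 1.11, 1.5, 1.52, 1.48, 2.5, 2.6, 2.4]
centers, memberships = fcm_1d(si_values, c=4, centers=[1.0, 1.1, 1.5, 2.5])
# A hard label per track is the cluster with the largest membership.
labels = [max(range(4), key=lambda j: row[j]) for row in memberships]
```

Passing `centers` explicitly makes the run deterministic; leaving it as `None` reproduces the random initialization that makes classical FCM prone to local optima.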

Firefly Optimization Algorithm
The population-based firefly algorithm has been found to provide feasible solutions for optimization problems. Fireflies produce flashes of light through bioluminescence. The firefly algorithm is based on the idealized behavior of the flashing characteristics of fireflies. It follows three idealized rules governed by the parameters attractiveness, randomization and absorption.
1. All fireflies are unisexual, so that one firefly is attracted to a brighter firefly regardless of sex.
2. Attractiveness is proportional to brightness; thus, for any two flashing fireflies, the less bright one moves towards the brighter one, and both attractiveness and brightness decrease as the distance between them increases. If no firefly is brighter than a particular firefly, it moves randomly.
3. The brightness of a firefly is affected or determined by the landscape of the objective function.
The objective function value signifies the light intensity. Suppose there are n fireflies and $Y_i$ represents the solution for firefly i, with fitness value $f(Y_i)$. A firefly moves towards a firefly with higher brightness. The attraction factor of the firefly is represented by β, and it changes with the distance $R_{ij}$ between any two fireflies i and j at positions $Y_i$ and $Y_j$, respectively.
The attraction function β(R) of the firefly is expressed as follows.

$$\beta(R) = \beta_0 \, e^{-\gamma R^{2}}$$

where $\beta_0$ is the attraction function value at R = 0 and γ is the light absorption coefficient.
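As a small illustration of the attraction function, the sketch below evaluates β(R) and moves one firefly towards a brighter one. It assumes the commonly used exponential form β(R) = β0·exp(−γR²) and the standard position update with a random step of size α; the update rule, the function names and the default parameter values are assumptions and are not taken from the paper.

```python
import math
import random


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attraction between two fireflies separated by distance r."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(yi, yj, beta0=1.0, gamma=1.0, alpha=0.0, rng=None):
    """Return the new position of firefly i after moving towards firefly j.

    alpha scales a uniform random perturbation in [-0.5, 0.5) per dimension;
    alpha = 0 gives a purely deterministic move.
    """
    rng = rng if rng is not None else random
    r = math.dist(yi, yj)  # Euclidean distance R_ij
    beta = attractiveness(r, beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(yi, yj)]


# With gamma = 1 and unit distance the firefly covers exp(-1) of the gap.
moved = move_towards([0.0, 0.0], [1.0, 0.0], gamma=1.0, alpha=0.0)
```

A larger γ makes attraction fade faster with distance, so distant fireflies barely influence each other and the swarm tends to split into local groups.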

Firefly optimization based Fuzzy C Means (FOFCM)
The quality of the final clusters from the FCM algorithm is highly influenced by the initially chosen cluster centroids. The classical FCM is sensitive to the initialization of the cluster centroids and often gets trapped in local minima. Firefly optimization is therefore coupled with FCM to estimate the optimal cluster center values for TC track classification. The cluster center values so determined are then used to initialize the FCM algorithm. When the FCM algorithm converges, each object is assigned to the cluster in which it has the maximum degree of membership, and the cluster centers are stabilized.
The pseudo code for the FOFCM is as follows.

Generate initial population of fireflies
Initialize the algorithm parameters
Define objective function f(Y), Y = [y1, y2, ..., yd], to calculate the fitness of all fireflies
Define the light absorption coefficient (γ)
Repeat
    For i = 1 to n    // all n fireflies
        For j = 1 to n
            Estimate the light intensity I_i using objective function f(Y) for each firefly
            If (I_j > I_i)
                Move firefly i towards j using attraction function
            End if
        End for j
    End for i
    Rank the fireflies, and select the best fireflies for next population
Until t ≥ maximum number of generations
Obtain final best fireflies (better cluster centers)
Initialize the FCM centers with the final best obtained using Firefly Algorithm
Using these centers, iterate the FCM algorithm
Repeat
    Update the membership function
    Refine the cluster centers
Until convergence is met
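The pseudo code can be turned into a compact runnable sketch. The version below is an illustration under several assumptions, not the authors' implementation: each firefly encodes c scalar cluster centers, brightness is taken as the negative of the FCM objective J (so a lower J means a brighter firefly), the random step is scaled by the data range, and all function names and parameter defaults are hypothetical.

```python
import math
import random


def memberships(values, centers, m=2.0, eps=1e-12):
    """FCM membership matrix for scalar data and the given centers."""
    rows = []
    for x in values:
        d = [abs(x - v) + eps for v in centers]
        rows.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                     for dj in d])
    return rows


def objective(values, centers, m=2.0):
    """FCM objective J evaluated with the memberships implied by the centers."""
    u = memberships(values, centers, m)
    return sum(u[i][j] ** m * (x - v) ** 2
               for i, x in enumerate(values) for j, v in enumerate(centers))


def firefly_centers(values, c=4, n_fireflies=15, generations=40,
                    beta0=1.0, gamma=1.0, alpha=0.1, seed=0):
    """Search for good initial centers; each firefly is a list of c centers."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    swarm = [[rng.uniform(lo, hi) for _ in range(c)] for _ in range(n_fireflies)]
    cost = [objective(values, y) for y in swarm]
    for _ in range(generations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower J)
                    r = math.dist(swarm[i], swarm[j])
                    beta = beta0 * math.exp(-gamma * r * r)
                    step = alpha * (hi - lo)
                    # Move i towards j, add a random step, clamp to data range.
                    swarm[i] = [min(hi, max(lo, a + beta * (b - a)
                                            + step * (rng.random() - 0.5)))
                                for a, b in zip(swarm[i], swarm[j])]
                    cost[i] = objective(values, swarm[i])
    best = min(range(n_fireflies), key=lambda k: cost[k])
    return swarm[best]


def fofcm(values, c=4, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Firefly-initialized FCM; returns (sorted centers, memberships)."""
    centers = firefly_centers(values, c=c, seed=seed)
    for _ in range(max_iter):
        u = memberships(values, centers, m)
        new = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:
            break
    centers = sorted(centers)
    return centers, memberships(values, centers, m)
```

Because the brightest firefly never moves within a generation, the best objective value found by the swarm cannot get worse, and the subsequent FCM iterations can only lower J further from that starting point.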

Visualizing the Spatio Temporal Characteristics of TC Tracks
This section focuses on visualizing aspects of the TC tracks to gain insight into the data. The visualizations fall into two major categories: parameter visualization and ensemble visualization. The inter-decadal variations and ENSO events are visualized in the parameter visualization. The ensemble visualization focuses on multivariate distributions to enhance visual reasoning.

Parameter Visualization

(i) Inter-decadal variations of BoB
The extensive coastline of the Indian subcontinent is severely affected by tropical cyclones, and hence the analysis of inter-decadal variation would help in TC forecasting. The Bay of Bengal is bounded by the geographical coordinates 5° 44' 2.3" - 24° 22' 39.6" N; 78° 53' 53.6" - 95° 2' 55.8" E. It is bounded by India and Sri Lanka in the west, Bangladesh in the north, and the Myanmar Peninsula and the northern part of the Malay Peninsula in the east. The southern boundary of the BoB runs from the southern tip of Sri Lanka to the northern tip of Sumatra. Some of the parameters that control tropical cyclone formation are sea surface temperature (SST), Coriolis force, wind shear and upper-air disturbances. Among these factors, the two favourable conditions for tropical cyclone formation and intensification are warm SST (higher than 26.2°C) and low vertical wind shear. The frequency of dissipation of cyclonic depressions is significantly high over the western Arabian Sea (AS), mainly due to unfavourable conditions such as colder SST and high vertical wind shear. Unlike the AS, the BoB has a higher SST (≈ 26-27°C) due to stratification (water of different densities, such as fresh water from major river discharges and rain water, added to the sea) and less mixing of the stratified water. TCs in the BoB initially move towards the north or northwest; later some of them recurve towards the northeast during their lifetime. The average life of a TC over the BoB is 4.7 days, with a normal speed of about 12.9 km/hr. Felton et al. (2013) reported that ENSO influences the SST anomalies, which reflect increased cyclogenesis activity in the BoB. Meehl et al. (1987) observed the impact of SST anomalies on the Indian monsoon. The inter-decadal fluctuations of the BoB are primarily caused by SST anomalies, which have been increasing in recent decades. The east equatorial Indian Ocean contributes about 37% of cyclogenesis during October-November.
PyQGIS, a blend of Python and QGIS, is used to better understand the spatial patterns and relationships of TCs. The analysis reveals that cyclone track properties such as the cyclogenesis point, landfall point, track length, track duration and track sinuosity vary over time. In this study the decadal variations in the cyclogenesis point, trajectory pattern, frequency and landfall point of all categories of TCs over the BoB during the last six decades (1960-2016) are analyzed and presented. The BoB comprises three main regions, namely the South Bay, the Central Bay and the North Bay, which are further divided into the South Eastern Bay (4°-14° N, 80°-93° E), the South Western Bay (4°-14° N, 78°-88° E), the North Eastern Bay (18°-22° N, 90°-94° E), the North Western Bay (18°-22° N, 84°-90° E), the West Central Bay (14°-18° N, 86°-95° E) and the East Central Bay (14°-18° N, 86°-95° E). Figure 2 illustrates the large variability in TC genesis frequency, spatial distribution and track sinuosity. It shows that the TCs during the decade 1960-1969 mainly occurred in the Andaman Sea and the South East Bay of the BoB. The North Bay and South East Bay became the major TC breeding grounds during the decade 1970-1979. TCs during the decade 1980-1989 occurred mainly in the South West and South East Bay regions. TC genesis frequency is higher during the decades 1960-1969 and 1970-1979 than in the remaining decades. Unlike in the post-monsoon season, TC formation during the pre-monsoon season is insignificant due to the unfavourable conditions that prevail. In the BoB, most TC genesis occurred in the South East Bay, which is the active region. However, the cyclogenesis location fluctuates on the decadal timescale due to SST variability. Observations for the most recent decade (2010-2016) show a shift of the cyclogenesis location towards the eastern region of the BoB.
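The decadal comparison of genesis frequency described above amounts to binning each cyclone's genesis year into its decade. A minimal sketch, assuming a hypothetical list of genesis years (in practice one year per track, taken from the first fix of each storm in the best track data); the function name and the sample years are illustrative only.

```python
from collections import Counter


def decadal_counts(genesis_years, start=1960, end=2016):
    """Count cyclones per decade, keyed by the first year of the decade."""
    counts = Counter()
    for year in genesis_years:
        if start <= year <= end:  # ignore years outside the study period
            counts[(year // 10) * 10] += 1
    return dict(sorted(counts.items()))


# Hypothetical genesis years, not taken from the dataset.
sample_years = [1960, 1969, 1970, 2016, 2017, 1959]
per_decade = decadal_counts(sample_years)
```

Note that the final bin (2010-2016) covers only seven years, so its raw count is not directly comparable with the full decades.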

(ii) ENSO Modulated events
El Niño-Southern Oscillation (ENSO) is an irregularly periodic variation in winds and sea surface temperatures over the tropical eastern Pacific Ocean, affecting the climate of much of the tropics and subtropics. ENSO is a single climatic phenomenon with three phases: El Niño, the warming phase; La Niña, the cooling phase; and the neutral phase. ENSO modulates the environmental conditions over the BoB. The oceanic and atmospheric conditions are vital for the formation, strengthening and translation of TCs over the BoB.
In the BoB, El Niño years are generally associated with increased rainfall activity during the monsoon period, while during La Niña years rainfall activity is suppressed. The enhanced precipitation influences cyclogenesis and aids the development and strengthening of TCs. According to Felton et al. (2013), during El Niño years the SST anomalies over the BoB are positively correlated during the monsoon period, while the reverse was observed during La Niña years. As the tracks are highly variable over time, the coastal states around the BoB experience inter-annual variations in cyclone tracks. Cyclones were found to move in a more westerly direction and make landfall over the Tamil Nadu and south Andhra coasts during El Niño years. The Oceanic Niño Index (ONI), the index commonly used to monitor ENSO, is based on the three-month average of SST anomalies in the Niño 3.4 region (5°N-5°S, 120°-170°W) and is available at http://www.cpc.ncep.noaa.gov/…/an…/ensostuff/ensoyears.shtml. The comparison of the strong El Niño and La Niña events over the AS and BoB is illustrated in the figure. El Niño events commonly last 9 to 12 months; however, some prolonged events may last for years. The comparison of El Niño and La Niña events over the AS and BoB during the period 1970-2016 is included in our visualization, which clearly depicts the time-based information. Our visualization system supports two major tasks: El Niño event visualization and La Niña event visualization. The TC tracks during strong El Niño events (1972, 1987-1988, 1991-1992) and very strong El Niño events (1982-1983, 1997-1998 and 2015-2016) are drawn as red polylines. Blue polylines represent the TC tracks during the strong La Niña events (1973-1974, 1975-1976, 1998-1999, 2000, 2007-2008, 2010-2011).
Tracks of cyclones during the period 1970-2016 were examined; further analysis reveals that the frequency and longevity of TCs are higher during La Niña years and lower during El Niño years. Figure 3 shows the enhanced convection and cyclogenesis activity in the BoB as the impact of the La Niña events.

Parameter visualization using Parallel Coordinate Plot
The parallel coordinate plot (PCP), a well-known multi-dimensional data visualization technique, is used to explore many aspects of the high-dimensional parameter space in tropical cyclone track modeling. The PCP visualization is developed using bokeh, a Python package. The PCP is used to visualize TC parameters such as intensity, pressure, latitude, longitude and category. These features are mapped to the axes of the PCP to provide a single view for comparative data visualization and analysis. The axis values are normalized so that the correlation patterns of the parameters can be compared. To discover meaningful patterns and to visualize clusters, a colouring scheme is followed. We adopt blue, carmine, pink and green to represent the categories of tropical cyclones from low to high, and the line segments use these colours consistently to link with the category attribute. In Figure 4, four different clusters are apparent, displaying the geometric projection of TC tracks. The PCP view displays 5-dimensional data of 141 TC tracks over the period 2000-2016. Figure 5 displays the correlation heat map, plotted using seaborn, for easier interpretation.
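The axis normalization and the pairwise correlation behind the heat map can be illustrated without any plotting library. The sketch below assumes min-max scaling to [0, 1] per axis and the Pearson coefficient; the paper does not state which normalization or correlation measure was used, and the field names `wind_kt` and `pressure_hpa` as well as the three sample records are hypothetical.

```python
import math


def min_max_normalize(records, axes):
    """Scale each named axis to [0, 1] so all PCP axes share one range."""
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    out = []
    for r in records:
        row = dict(r)
        for a in axes:
            span = hi[a] - lo[a]
            row[a] = (r[a] - lo[a]) / span if span else 0.0
        out.append(row)
    return out


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical records: stronger wind paired with lower central pressure.
tracks = [
    {"wind_kt": 30.0, "pressure_hpa": 1000.0},
    {"wind_kt": 60.0, "pressure_hpa": 980.0},
    {"wind_kt": 90.0, "pressure_hpa": 960.0},
]
scaled = min_max_normalize(tracks, ["wind_kt", "pressure_hpa"])
r = pearson([t["wind_kt"] for t in scaled], [t["pressure_hpa"] for t in scaled])
```

In a PCP, a strong negative correlation such as this one shows up as lines crossing between the two adjacent axes, while a positive correlation gives roughly parallel lines.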

Results and Discussion
To demonstrate the accuracy of the proposed model, we carried out experiments to classify the TC tracks. The FOFCM algorithm is implemented in Python 3 using tensorflow, keras and pytorch. The shapes of tropical cyclone tracks in the BoB basin of the NIO are investigated over the period 1960-2016. Terry et al. (2011) established a new metric, the Sinuosity Index (SI), and categorized cyclone tracks into four categories: straight, quasi-straight, curving and sinuous. The SI helps in the analysis of the spatial and temporal behaviour of TC tracks, which is influenced by different parameters and convection schemes. In real-world scenarios cyclones do not follow perfectly straight paths; tracks can be straight, meandering, recurving or sinuous in shape. The SI is computed as the ratio of the total track length to the Euclidean distance between the start and end coordinates of the TC, where the total track length is the total distance travelled. The SI values are computed for each TC, and the tracks are classified into the categories straight, quasi-straight, curving and sinuous using the FOFCM algorithm. Table 2 shows the sinuosity distribution of cyclone tracks over the period 2000-2016. A 10-fold cross-validation was used to define the training set. The results for the testing set, which was not used in model building, are shown in Table 3. Figure 7 illustrates the quantification of the morphometric parameters according to the SI with the help of GIS; short-lived TCs are neglected. In Fig. 7(d) the green track shows cyclone Madi, which moved northward, weakened suddenly, made an abrupt reversal of close to 180° towards the south and made landfall near Vedaranyam in Tamil Nadu. The sinuosity index distribution is drawn in Figure 8 for 141 TC tracks during the period 2000-2016.
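The SI ratio described above can be sketched as follows. This is an illustration under stated assumptions: distances are measured along great circles with the haversine formula rather than as planar Euclidean distances, the Earth radius is taken as 6371 km, and the function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sinuosity_index(track):
    """Total track length divided by the start-to-end distance.

    track is a sequence of at least two (lat, lon) fixes in time order.
    A track that returns to its starting point has no defined ratio, so
    infinity is returned in that case.
    """
    length = sum(haversine_km(a, b) for a, b in zip(track, track[1:]))
    chord = haversine_km(track[0], track[-1])
    return length / chord if chord > 0 else math.inf


# Hypothetical tracks: one along the equator, one with a northward detour.
straight = [(0.0, 80.0), (0.0, 81.0), (0.0, 82.0)]
bent = [(0.0, 80.0), (1.0, 81.0), (0.0, 82.0)]
si_straight = sinuosity_index(straight)
si_bent = sinuosity_index(bent)
```

A value of 1 corresponds to a perfectly straight track, and the value grows as the track meanders; these per-track values are the scalar features that the clustering step would receive.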

Conclusion
This study analyzes and visualizes the tracks of TCs formed in the Bay of Bengal over the period 1960-2016 using GIS data. The visual displays of the multidimensional spatial data are produced using the matplotlib, pandas and seaborn packages of the Python ecosystem. We expect that these visualizations will help forecasters and meteorologists in skillful track prediction. In terms of inter-decadal variation, a decreasing trend in decadal frequencies is observed from the decade 1980-1989. TCs have a longer life and higher intensification during La Niña years in the BoB. Further, a significant change in TC cyclogenesis location, tracks and landfalls has been observed during the El Niño and La Niña phases. In the intuitive visualization of TC tracks, the spatial characteristics are well captured within the GIS environment. The tracks are mapped to the sinuosity categories straight, quasi-straight, curving and sinuous. The sinuosity parameters are computed using PyQGIS, and the tracks are plotted to visualize the track morphometry characteristics. Further, we applied FCM clustering to separate the TC tracks into four clusters on the basis of track shape. A higher proportion of quasi-straight and curving tracks than straight and sinuous tracks was observed. To facilitate the exploration of the multivariate data and to reveal the data trends, the parallel coordinates visualization technique is used. Overall, this study is an attempt to visually analyze the TC patterns and the environmental factors that influence their paths.

Not applicable

Author's Contribution
CRR and DHM analyzed and interpreted the cyclone track data. NV performed the analysis and visualization of cyclone tracks and was a major contributor in writing the manuscript. All authors read and approved the final manuscript.

Availability of data and material
The historical tropical cyclone data were obtained from IBTrACS, an official archiving and distribution resource for tropical cyclone best track data.

Code availability
Python libraries are used for the analysis and visualization of the cyclone data, and the code is available.

Ethics approval
Not applicable

Consent to participate
We also consent to participate in manuscript publication. We understand that our contribution will be confidential and that there will be no personal identification in the data that we agree to allow to be used in the study. We understand that there are no potential risks or burdens associated with this study.

Consent for publication
The manuscript does not contain any individual person's data in any form, so consent for publication is not applicable.