Etiological and Seasonal trend in Cardiovascular Disorder displays potential risk contributing towards the development of myocardial infarction

Seasonal variations in the human cardiovascular system are known to play a key role in the onset of many maladies. Behavioural and environmental factors act as confounding variables, and failing to address them makes it challenging to measure the true temporal impact of these diseases. In addition, numerous clinical investigations suggest that only certain groups of people are seasonally sensitive and that their maladaptation can contribute to a range of illnesses. It is therefore crucial to assess the etiological and seasonally sensitive patterns among the cardiovascular diseases (CVDs) that affect most of the human population. For this study, it was hypothesized that cardiovascular and related disorders have strong associations with seasonally sensitive physiological changes. Data mining was performed to extract the relevant information on the association between cardiovascular and related diseases. Disease ontology-based semantic similarity network (DSN) analysis was performed to narrow down the relevant CVDs. Topological analysis was then carried out to predict seven CVDs in three clusters, including myocardial infarction. These seven CVDs were assessed for their seasonal sensitivity and temporal association among themselves using Mann-Kendall and Cox-Stuart analyses. Temporal associations were verified using LOESS and TBATS, while seasonal decomposition was assessed using autocorrelation and fast Fourier transform outcomes. The study provides indirect evidence of severe seasonal cardiovascular comorbidity among three cardiovascular diseases, namely myocardial infarction, atrial fibrillation, and atherosclerosis, all of which are prevalent in the world population. Two other CVDs, hypertension and heart failure, were also identified, with minor temporal trends. Hence, all five diseases could be classified as seasonal cardiovascular comorbid diseases (SCCD). Furthermore, these diseases could be studied for potential common risk factors such as biochemical, genetic, and physiological factors.


Introduction
Myocardial infarction (MI) is the most fatal of all CVDs and is responsible for an increased rate of mortality worldwide. MI is a multifactorial disorder that shares common pathology and symptoms with other CVDs such as cardiac arrest and stroke. Other factors that contribute to CVD are mostly lifestyle related, such as smoking, drinking, and junk food consumption. Myocardial infarction can be prevented by using population-wide strategies to address a sedentary lifestyle and risk factors such as tobacco use, unhealthy diet, physical inactivity, and harmful use of alcohol. Other lifestyle-related disorders that share a large aetiological overlap are diabetes, hypertension, and obesity. Similarly, pathophysiological and environmental conditions such as hypoxia, high-altitude stress, and ischemia are also known to contribute to myocardial infarction. It would therefore be interesting to study this aetiological overlap, which would help to evaluate the comorbidities among CVDs quantitatively and to narrow down the pathologies and diseases that may contribute to myocardial infarction.
Further, identifying a common pathogenic mechanism of considerable clinical relevance could prove to be an important goal in the diagnosis of cardiovascular comorbid diseases. For example, atrial fibrillation, characterized by an abnormal heart rhythm, is associated with myocardial infarction and raises its risk up to twofold, especially in cases of rapid heartbeat (Soliman et al., 2014). Atrial fibrillation is very common after myocardial infarction in the aged population (Bhatia and Lip 2004). The prevalence of atrial fibrillation among people with heart failure is about 10 to 30%, and it is associated with lower physical activity and a poor long-term prognosis (Offutt 2004). Atrial fibrillation poses a potential risk for myocardial infarction and can be diagnosed by electrocardiogram. Similarly, after myocardial infarction, plaque grows at an accelerated rate in the inner lining of the arteries (Dutta et al., 2012), and this accumulation of plaque causes atherosclerosis. Blood monocyte volume increases during an acute myocardial infarction, and these cells aggregate in the developing cardiac lesion (Nahrendorf et al., 2007, Nahrendorf et al., 2010). As a consequence, the person is subjected to either an acute inflammatory event, e.g. myocardial infarction, or a pre-existing chronic inflammatory condition, e.g. atherosclerosis, both of which involve myeloid cells. This excessive growth increases carotid intima-media thickness, which results in abnormal heart rhythms, i.e. atrial fibrillation (Heeringa et al., 2007, Willeit and Kiechl 2014). Hypertensive heart disease encompasses a range of pathological conditions, from uncontrolled hypertension to heart failure. Clinical studies of hypertension have demonstrated that hypertension management can reduce the onset of heart failure. Lifestyle changes such as physical exercise and weight loss can also enhance diastolic function and lower the risk of heart failure (Sorrentino 2019).
Surprisingly, the development of CVDs during the pathophysiological process of SCCD turned out to be the most prevalent pathogenic process. Myocardial infarction, atrial fibrillation, atherosclerosis, hypertension, and heart failure, for example, all disrupt blood flow and morphological processes.
The combination of CVD risk factors along with comorbidity could lead to the development of an index/scale for classifying CVDs according to their severity. These classifications were further validated for seasonal sensitivity using Google Trends (GT). For this purpose, PubMed was used to collect CVD-associated symptoms and disorders. A Disease Ontology (DO) based semantic similarity matrix and network (DSN) were constructed to narrow down the benchmark diseases and disorders of CVD (Patel et al., 2018). The DSN was then subjected to topological analysis, which uncovered the most strongly associated, and therefore most promising, cardiovascular comorbid diseases. The seven predicted diseases, in two groups, were found to be "seasonal cardiovascular comorbid diseases (SCCDs)". Time series analyses of worldwide Google Trends relative search volume from Jan-2004 to Dec-2020 were conducted to investigate whether these SCCDs have seasonal patterns.

Text mining and disease curation
The keywords "CVDs" and "cardiovascular diseases" were used in combination to retrieve a list of publications from the public repository PubMed as of 03/01/2021. The search was restricted to human studies published in English. PubTator, a text-mining tool, was used to identify all disease/disorder annotations in the literature collected from PubMed (Wei et al., 2013). Anomalies in disease nomenclature were then resolved to eliminate disease synonyms, and misspelled and unidentified names were removed during screening. Finally, a comprehensive, non-redundant, and anomaly-free list of CVDs and associated diseases/pathologies was identified.

Disease semantic similarity relationship
The aetiological relationships between these CVDs and associated diseases/pathologies were quantified using disease semantic similarity. Disease Ontology IDs (DOIDs) for the CVDs were collected from the European Molecular Biology Laboratory (EMBL). These DOIDs were submitted to Disease Ontology Semantic and Enrichment (DOSE), a library for the R statistical environment, to quantify association scores among the diseases using semantic similarity (Yu et al., 2015). The outcome was a disease ontology semantic similarity matrix. The etiological link between two diseases is expressed by a disease ontology score ranging from 0 to 1. For instance, the ontology score between two neurological diseases will be higher than the score between a respiratory and a neurological disease. As a result, we can determine which CVDs have the most robust associations.

Construction and analysis of disease association weighted network
The disease semantic similarity matrix provides relationship scores between diseases, which were used to construct a weighted disease association network. To increase the stringency of the network, relationships with a DO score < 0.3 were excluded. The weighted disease association network was visualized in Cytoscape 3.6. The Cytoscape network analyser was employed for topological analysis of the DO-based cardiovascular disease network. These topological features were used to perform hierarchical clustering and principal component analysis in order to arrange the disorders into appropriate clusters. The 'pheatmap' library was used to create a heatmap with split cell line spacing (Kolde 2019). Dendrograms were plotted using the 'pvclust' library in R, which implements bootstrap resampling algorithms to generate probabilities (p-values) for each cluster (Suzuki and Shimodaira 2006). It reports two kinds of p-values: approximately unbiased (AU) and bootstrap probability (BP). The AU p-value is determined using multiscale bootstrap resampling and is less biased than the BP value produced by conventional bootstrap resampling. Principal component analysis (PCA), an approach for reducing the dimensionality of multivariate data without losing significant information, was used to summarise the network topological properties. For this purpose we employed 'factoextra', an R library that extracts and visualises the results of multivariate data analyses (Mundt 2020). The overall methodology is depicted in Fig. 1.
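The thresholding step can be illustrated with a short Python sketch that turns a small similarity matrix into a weighted edge list and drops pairs scoring below 0.3. This is not the pipeline used in the study, which relied on DOSE in R and on Cytoscape. The scores below are invented for illustration only, `build_edges` and `node_degree` are hypothetical helper names, and keeping a score of exactly 0.3 is an assumption.

```python
from itertools import combinations

# Hypothetical similarity scores, invented for illustration (not study data).
diseases = ["myocardial infarction", "atrial fibrillation",
            "atherosclerosis", "diabetes"]
similarity = [
    [1.00, 0.62, 0.55, 0.12],
    [0.62, 1.00, 0.41, 0.08],
    [0.55, 0.41, 1.00, 0.25],
    [0.12, 0.08, 0.25, 1.00],
]


def build_edges(names, matrix, threshold=0.3):
    """Return weighted edges (a, b, score), excluding pairs scoring below threshold."""
    edges = []
    for i, j in combinations(range(len(names)), 2):
        score = matrix[i][j]
        if score >= threshold:  # assumption: a score of exactly 0.3 is kept
            edges.append((names[i], names[j], score))
    return edges


def node_degree(names, edges):
    """Count how many retained edges touch each disease."""
    counts = {name: 0 for name in names}
    for a, b, _ in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


edges = build_edges(diseases, similarity)
degrees = node_degree(diseases, edges)
for a, b, score in edges:
    print(f"{a} -- {b} (weight {score:.2f})")
print(degrees)
```

In this toy input, diabetes keeps no edges after filtering, which mirrors how weakly connected diseases fall out of a thresholded network. Degree is only one of the topological properties a network analyser reports.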

Google Trends RSV analysis
With the exception of a few outliers, the cardiovascular disorders found in the scientific literature were divided into separate categories. Thereafter, Google Trends (GT), a public search tool that provides downloadable search trend data termed relative search volume (RSV), was used to determine the seasonal trend of diseases in a particular geographic location. GT reports search volume on a scale of 0 to 100. The cardiovascular diseases inferred through ontology and topological categorization were submitted to the GT search bar, with all disease terms of a given group submitted one by one. We obtained monthly RSV data for all group diseases from Jan-2004 to Dec-2020. The data were downloaded as .csv (comma-separated values) files. First, the data were plotted in Excel as simple line charts to compare the magnitude of RSV. Some diseases showed extremely low GT search volume and were therefore not considered suitable for further analysis. Initially, we deduced two comorbid disease groups from the DO score-based weighted network analysis; similarities in the seasonal cycles of the diseases might reinforce this finding. To validate this, we applied the Mann-Kendall (MK) test to identify significant trends in the disease RSV. In addition, the Cox-Stuart test, a method for trend analysis based on the binomial distribution, was performed using the 'trend' package in R (Pohlert 2020) to cross-check the output of the MK test, and the results were further verified by seasonal decomposition using LOESS and TBATS. Trend tests provide a better understanding of long-term direction, while cyclic patterns in data can uncover events that repeat within a given period of time. Autocorrelation analysis was therefore performed using the 'stats' library in R to gain insight into cyclic events in the disease RSV datasets (Team 2020). Because autocorrelation is limited in its ability to quantify the time window of a periodic event, periodicity analysis was also carried out, and the 'TSA' library was used to perform the fast Fourier transform (FFT) (Ripley 2020).
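To make the two trend tests concrete, the following pure-Python sketch computes a Mann-Kendall test and a Cox-Stuart test on a synthetic monthly series. It is an illustration under stated assumptions, not the study's R code: the series is generated rather than downloaded from GT, the Mann-Kendall variance omits the tie correction, and the Cox-Stuart test is written in its classic paired sign-test form, which may differ from the variant implemented in the R package. The function names are hypothetical.

```python
import math


def mann_kendall(series):
    """Return (tau, Z, two-sided p) for a monotonic trend; no tie correction."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return tau, z, p


def cox_stuart(series):
    """Return (increases, decreases, two-sided p) from a paired sign test."""
    n = len(series)
    half = n // 2
    first = series[:half]
    second = series[n - half:]  # the middle value is skipped when n is odd
    pos = sum(1 for a, b in zip(first, second) if b > a)
    neg = sum(1 for a, b in zip(first, second) if b < a)
    m = pos + neg  # tied pairs are discarded
    if m == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    return pos, neg, min(1.0, 2 * tail)


# Synthetic series: 204 months (Jan-2004 to Dec-2020), declining with a yearly cycle.
months = 204
rsv = [80 - 0.1 * t + 5 * math.cos(2 * math.pi * t / 12) for t in range(months)]

tau, z, mk_p = mann_kendall(rsv)
pos, neg, cs_p = cox_stuart(rsv)
print(f"Mann-Kendall: tau={tau:.3f}, Z={z:.2f}, p={mk_p:.3g}")
print(f"Cox-Stuart: {pos} increases, {neg} decreases, p={cs_p:.3g}")
```

A negative tau and Z indicate a declining trend, and a small p-value in both tests suggests that the decline is unlikely to be due to chance. For strongly seasonal data, a seasonal variant of the Mann-Kendall test may be more appropriate than this plain version.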

Comorbid patterns in cardiovascular diseases
PubMed returned 4519 PMIDs when the keyword 'cardiovascular diseases' was queried using the advanced search option. The search was limited to human studies and English-language articles. The list of PMIDs was submitted to PubTator (a web-based text-mining tool), which returned an .xml file containing the annotated bio-entities for each submitted article. We manually curated the data and removed redundant diseases. In total, 852 diseases mentioned in the articles were identified and annotated. Diseases with a frequency greater than 100 were considered suitable for the analysis. At this point we had obtained 27 CVD terms from literature mining, but disease ontology IDs (DOIDs) were not assigned to eight of them. Disease ontology analysis was therefore performed for the remaining 19 CVDs using the DOSE package in R. All DOIDs were submitted through DOSE library functions, and a disease ontology matrix of DO scores was retrieved. The observed ontology scores among the DOIDs ranged from 0 to 1. A higher DO score indicates greater etiological overlap between two diseases, and vice versa. Lower DO scores were ignored to avoid complex associations among diseases, so a DO score of > 0.3 was taken as the threshold indicating a substantial etiological relationship between cardiovascular diseases. A simplot with a colour gradient of DO scores is provided for reference (Figure S1). A weighted network based on the higher DO scores among the cardiovascular diseases was constructed in Cytoscape, and network analysis was performed using network analyser, a built-in Cytoscape plug-in (Figure S2) (Shannon et al., 2003).
A matrix of network topological properties, namely average shortest path length, betweenness centrality, closeness centrality, clustering coefficient, degree, eccentricity, neighbourhood connectivity, number of directed edges, radiality, stress, and topological coefficient, was obtained and used for clustering analysis. Based on the findings of the heatmap, dendrograms (2D clustering), and principal component analysis (factor analysis), we divided the diseases into two primary groups (Fig. 2a-b, Figure S3). Group 1 diseases (atrial fibrillation, atherosclerosis, and myocardial infarction) and Group 2 diseases (hypertension, heart failure, renal disease, and cerebrovascular disease) might each be in a comorbid relationship, while heart disease, cardiomyopathy, and diabetes were identified as outliers and not considered suitable for further study. Ultimately, we noted that myocardial infarction, along with atrial fibrillation and atherosclerosis, being so central to the cardiovascular system, would have the most overlaps with other CVDs. The text mining, ontology, and clustering analyses revealed comorbidity in two groups of cardiovascular diseases, which was further verified through real-time corroboration using public trend analysis, namely Google Trends analysis.

Temporal patterns in cardiovascular diseases
GT was used to collect monthly worldwide relative search volume (RSV) data for the two major cardiovascular disease groups from January 2004 to December 2020. The RSV data were plotted as line charts. In group 1, atrial fibrillation (blue) had the highest average RSV (> 60), followed by myocardial infarction (orange) and atherosclerosis (grey). Similarly, in group 2, hypertension (blue) had the highest RSV (> 50), followed by heart failure (orange), while cerebrovascular disease (yellow) and renal disease (grey) had the lowest RSV (Fig. 3). However, overall search frequency has declined somewhat from 2007 onwards for Group 1 diseases and from 2008 onwards for Group 2 diseases.
Further, the Mann-Kendall p-values for both disease groups were found to be significantly less than 0.05. In the Cox-Stuart test, atrial fibrillation and atherosclerosis had negligible p-values. In the Mann-Kendall test, the Z and tau values suggest negative trends in all diseases of both groups, with the exception of heart failure, which shows a positive trend. The complete trend findings are given in Table 1. The outcomes of the trend analysis were verified by seasonal decomposition analysis (Figure S4, Figure S5). Seasonal decomposition clearly showed that renal and cerebrovascular disease have weak seasonal patterns. A cyclic pattern in the data is revealed by an autocorrelation plot; such a pattern indicates the repetition of events at a fixed interval of time. In our autocorrelation study, the RSV data of both disease groups exhibited significant cyclic repetition, although atherosclerosis in group 1 and cerebrovascular disease in group 2 exhibited only modest cyclic repetition (Fig. 4). Finally, we applied the 'TSA' package in R to detect periodicity using the Fourier transform approach. Periodicity describes how frequently a pattern repeats within a data set. With the exception of cerebrovascular disease, we observed 6-month seasonal cycles in all group diseases (Fig. 5).
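The way autocorrelation and a periodogram expose a 6-month cycle can be sketched in pure Python on a synthetic series. The assumptions are that the data are generated rather than taken from GT, that the discrete Fourier transform is computed directly instead of with a fast algorithm (the same quantity, only slower), and that the dotted threshold of an autocorrelation plot is approximated by 1.96 divided by the square root of the series length. The function names are hypothetical and do not correspond to the R packages used in the study.

```python
import cmath
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def periodogram(series):
    """Return {k: power} for Fourier frequencies k/n cycles per sample, k = 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power[k] = abs(coeff) ** 2 / n
    return power


def dominant_period(series):
    """Period (in samples) of the frequency with the largest spectral power."""
    power = periodogram(series)
    k_best = max(power, key=power.get)
    return len(series) / k_best


# Synthetic monthly series: 204 points (Jan-2004 to Dec-2020) with a 6-month cycle.
months = 204
rsv = [50 + 10 * math.sin(2 * math.pi * t / 6) for t in range(months)]

band = 1.96 / math.sqrt(months)  # approximate 95% bound under white noise
acf6 = autocorrelation(rsv, 6)
acf3 = autocorrelation(rsv, 3)
period = dominant_period(rsv)
print(f"ACF at lag 6: {acf6:.3f}, at lag 3: {acf3:.3f}, bound: {band:.3f}")
print(f"Dominant period: {period:.1f} months")
```

The autocorrelation is strongly positive at a lag of one full cycle and strongly negative at half a cycle, which shows that a repetition exists. The periodogram peak then identifies the length of the cycle directly, which is why the two analyses complement each other.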

Discussion
The enormous complexity of present-day data makes it challenging to find the genetic intersection of diseases that share common pathologies, symptoms, molecular risk factors, biomarkers, and therapeutics. Myocardial infarction is a multifactorial cardiovascular disorder with aetiological overlaps extending beyond cardiovascular diseases, and all of these cardio-pathological conditions therefore carry a risk of causing myocardial infarction. In terms of comorbidity, most of these conditions may be further removed from myocardial infarction than those more closely related to it. Comorbidity screening of these diseases would therefore reduce the stochastic data to a simpler form. The extent to which comorbidity is involved in the progression of cardiovascular diseases, and how these superimposed etiological phenomena may be addressed through disease temporal patterns, is a fascinating area of health and medicine. The associations among cardiovascular and associated diseases were compiled by curating biomedical terms from the published literature using text mining. The associations between diseases were then quantified using disease semantic similarity scores and visualized collectively as a weighted disease association network. The network was reduced by increasing the stringency of the edge weights. Finally, topological analysis of the network revealed two clusters containing the seven most comorbid cardiovascular diseases. Myocardial infarction shows high comorbidity with atrial fibrillation, atherosclerosis, and cardiomyopathy and, to a lesser extent, with hypertension, diabetes, and heart failure. These classifications were further validated through temporal pattern analysis using Google Trends. The temporal patterns led us to discover that five diseases (Group 1: myocardial infarction, atrial fibrillation, and atherosclerosis; Group 2: hypertension and heart failure) follow seasonal patterns.
The strengths, drawbacks, and implications of each result are described below.
Text mining, followed by ontology and time series analyses, yielded two groups of cardiovascular diseases. Both groups revealed strong etiological similarities as well as coordinated search patterns within their respective groups. The magnitude of GT RSV is the only major difference we observed among the cardiovascular diseases. These diseases were named seasonal cardiovascular comorbid diseases (SCCD). First, it is worth noting that RSV intensities among group 1 diseases are closer to one another than those of group 2 diseases. Second, group 1 has a considerably larger RSV than group 2. Because SCCD group 1 was found to be more robust than SCCD group 2 across the various analyses, it was considered the more severe group. With this in mind, our discussion focuses on atrial fibrillation, myocardial infarction, and atherosclerosis.
Several studies have already reported temporal search behaviour for diseases, which is likely to have some clinical relevance. To examine this, we looked closely at the GT RSV of CVDs and made some interesting observations. The majority of studies show 'winter peaks' in CVD-related hospitalizations, and death rates are generally higher in winter than in summer (Stewart et al., 2017). In the GT RSV plot, atrial fibrillation, myocardial infarction, and atherosclerosis show a marked dip in summer and a peak in winter, as already reported in clinical studies (Robinson et al., 1992, Censi et al., 2017, Nagarajan et al., 2017). In contrast, hypertension shows two significant dips, in August and December, and peaks in October-November and March. This pattern of two dips and two peaks is due to the seasonal nature of hypertension, i.e. summer and winter hypertension. Lastly, heart failure shows an upward trend from January of each year and a downward trend from May, as reported by Gallerani et al. (2011).
To our knowledge, no study has used GT to examine the development of cardiovascular diseases in patients with SCCD, i.e. myocardial infarction, atrial fibrillation, and atherosclerosis. Notably, all three diseases impede normal heart function by reducing blood flow, a major symptom of cardiovascular disease. SCCD therefore appear to share the emergence of a heart condition as the disease progresses. Additionally, genetic composition may be one of the primary variables affecting an individual's vulnerability during seasonal variations, by evoking complicated pathophysiological networks.
Considering the incidence rate and the risk of fatality associated with myocardial infarction, early detection is important for reducing the risk of occurrence. It would also help in appropriate disease management and timely medication. Myocardial infarction is a multifactorial CVD arising from the presence of one or more risk factors, and it has a large portfolio of CVDs and associated diseases with aetiological overlaps. Deciphering the associations between myocardial infarction and other associated diseases was therefore much needed, and ranking these diseases/pathologies with a severity sensitivity index was challenging. Their comorbidity was validated through seasonal, cyclic, and periodicity pattern analysis.

Conclusion
Identifying the SCCD pattern would help governments design health policies for managing and preventing the annual intensity of CVDs. A number of studies have already reported the importance and reliability of Google Trends in predicting and forecasting seasonally sensitive outbreaks. For example, in a 2017 study by Tkachenko et al., GT analysis was able to provide a clear early warning of diabetes (Tkachenko et al., 2017). Similarly, an investigation of Zika and Chikungunya surveillance in Venezuela was conducted using GT, and a positive correlation was found between search trends and the actual data released by health officials (Strauss et al., 2020). An ARIMA model applied to GT data to forecast Zika virus outbreaks gave promising results (Teng et al., 2017). Recently, novel coronavirus (COVID-19) trends have also been analysed using Google search volume, with favourable results that led the respective governments to formulate policies accordingly (Sahanic et al., 2020, Venkatesh and Gandhi 2020, Rovetta 2021). Further research in this area might aid healthcare professionals in developing season-based methods to reduce the burden of CVDs, and in assessing the effectiveness of medication started in different seasons of the year to reduce seasonal outbursts.

Table 1
Table 1 is available in the Supplementary Files section.

Figure 1
Flowchart for data collection, network/matrix construction, and analysis, showing the tools, outputs, and selection criteria. Arrows indicate the flow of the work design.

Figure 4
Autocorrelation of comorbid CVDs for worldwide monthly RSV from Jan-2004 to Dec-2020, showing a repetitive pattern above the dotted threshold line.

Figure 5
Spectral density obtained from periodicity analysis of worldwide GT RSV for CVDs, carried out using the fast Fourier transform.

Supplementary Files
This is a list of supplementary files associated with this preprint.