Potential Role of a Three-Gene Signature for Predicting Diagnosis in Patients With Myocardial Infarction


Objective: Increasing evidence indicates that gene expression is associated with the development and progression of myocardial infarction (MI). In this study, we evaluated the diagnostic value of feature genes in MI based on data from the Gene Expression Omnibus (GEO) database.
Methods: We used data from the GEO database (GSE66360) to identify significant differentially expressed genes (DEGs) between patients with MI and healthy controls. Univariable logistic regression, the least absolute shrinkage and selection operator (LASSO), the SignalP 3.0 server and multivariable logistic regression were used to identify genes with potential value for diagnosing MI. Receiver operating characteristic (ROC) curve analysis, the area under the curve (AUC) and the C-index were used to estimate the diagnostic value of these genes in patients with MI. Validation was conducted in another six independent data sets (GSE141512, GSE24519, GSE34198, GSE48060, GSE60993, and GSE109048). A meta-analysis was then performed to evaluate the diagnostic value of the genes in MI.
Results: A total of 44 DEGs were selected from GSE66360. Functional enrichment and KEGG analyses showed that the DEGs were involved in several inflammation-related biological processes and pathways. A three-gene signature consisting of CCL20, IL1R2 and ITLN1 effectively distinguished patients with MI from healthy controls (AUC and C-index were both 0.975). The three-gene signature was validated in 7 independent cohorts, and the diagnostic meta-analysis showed that the pooled sensitivity, specificity and ROC curve AUC for MI were 0.82 (95% CI: 0.68-0.90), 0.91 (95% CI: 0.81-0.96) and 0.94 (95% CI: 0.91-0.96), respectively.
Conclusion: These results suggest that the three-gene signature might serve as a novel candidate biomarker for distinguishing MI from healthy controls. However, more well-designed cohort studies are needed to confirm the diagnostic value of the three-gene signature in clinical practice.


Introduction
Myocardial infarction (MI), also known as a heart attack, is one of the leading causes of hospital admission and mortality worldwide (WANG and JING 2018). Early prevention, screening, monitoring, diagnosis and treatment may reduce the incidence and mortality of MI. However, research advances in effective treatment for MI are still lacking, so the best strategy focuses on early diagnosis aimed at managing the underlying etiologies and complications of MI. Although cardiac troponin T (cTnT) and creatine kinase MB (CK-MB) are useful diagnostic tools for MI, their relatively low diagnostic accuracy limits their application (DE WINTER et al. 1995;LIU et al. 2018;ZHAO et al. 2019). Previous studies also showed that the relatively low level of cTnT in healthy human serum is challenging to detect (CHRISTENSON et al. 2000;CHAN and NG 2010). The concentration of CK-MB in the blood decreases gradually and returns to almost normal levels 36-72 h after the onset of acute MI (CHRISTENSON et al. 2000;RAKOWSKI et al. 2014). Furthermore, molecular markers are critical for the research and clinical treatment of cardiovascular diseases (PARK et al. 2015;CHEN et al. 2019;GOBBI et al. 2019). Therefore, promising novel molecular markers are urgently needed; they would enhance our understanding of MI initiation and progression and promote early detection of MI.
The National Center for Biotechnology Information developed the Gene Expression Omnibus (GEO) database, which consolidates available transcriptomic data to expand the scope of biomedical research. The rapid development of gene microarray technology has provided an efficient alternative for screening genetic alterations at the genome level, which helps to confirm the differentially expressed genes (DEGs) and functional pathways involved in the progression of MI. However, it is challenging to obtain reliable results from a single independent microarray analysis. Many studies have used microarray analysis to identify novel molecular markers for diagnosing MI and to explore the underlying mechanisms of MI (ZDENEK VALENTA 2012;PARK et al. 2015;MUSE et al. 2017;CHEN et al. 2019;GOBBI et al. 2019).
Therefore, in the current study, DEGs between patients with MI and healthy controls were identified, followed by univariable logistic regression, the LASSO, the SignalP 3.0 server and multivariable logistic regression to construct a robust MI diagnosis-related gene signature. ROC curve analysis, the area under the curve (AUC) and the C-index were used to estimate the diagnostic value of the signature in patients with MI.
Subsequently, the diagnosis-related gene signature was validated in 7 independent validation data sets. Furthermore, the accuracy of the signature in discriminating MI from healthy controls was explored by meta-analysis of all data sets.

Data mining based on the GEO database
Microarray data available up to December 2019 were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The following search term was used: myocardial infarction. Microarray data sets were considered eligible if they had a case-control design and reported gene expression profiling of patients with MI and healthy controls. Exclusion criteria were as follows: (1) duplicate microarray data, (2) lack of a case-control design, (3) non-human data, (4) a sample size of fewer than 12 (OBUCHOWSKI and MCCLISH 1997). According to these criteria, seven GEO data sets were identified and included (see Table 1). Figure 1 shows a flow diagram of the selection of GEO data sets for this study. For the included data sets, the normalised gene expression profiling data were downloaded from the GEO database. First, DEGs between MI and control samples were identified with the edgeR package in R, using thresholds of false discovery rate (FDR) < 0.05 and |log fold change (logFC)| > 2. Then, DEGs that were statistically significant in univariable logistic regression were entered into the least absolute shrinkage and selection operator (LASSO) to obtain the top-ranked diagnostic genes for MI. Afterwards, to find genes that could become clinically detectable serum biomarkers, the optimal diagnostic genes were analysed with the SignalP 3.0 server (http://www.cbs.dtu.dk/services/SignalP-3.0/) (BENDTSEN et al. 2004;EMANUELSSON et al. 2007). Lastly, multivariable logistic regression was used to build a diagnosis-related gene signature from the genes that the SignalP 3.0 analysis indicated as detectable in peripheral blood. Receiver operating characteristic (ROC) curve analysis and the area under the curve (AUC) were used to estimate the diagnostic value of the diagnosis-related gene signature in patients with MI and controls.
Moreover, Harrell's C-index was calculated to quantify the discrimination performance of the diagnosis-related gene signature. A P-value < 0.05 was defined as statistically significant.
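The relationship between the AUC and Harrell's C-index for a case-control outcome can be illustrated with a minimal Python sketch. This is not the authors' R workflow; the function names and the toy scores are hypothetical and only serve to show that, for a binary outcome, both measures count the same concordant case-control pairs, which is why the two values coincide.

```python
from itertools import product


def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0
        elif case_score == control_score:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


def harrell_c_binary(scores, labels):
    """Harrell's C for a binary outcome: concordance over all usable
    pairs, i.e. pairs of samples with different outcomes."""
    concordant = 0.0
    usable = 0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                continue  # pairs with the same outcome are not usable
            usable += 1
            case, control = (i, j) if labels[i] > labels[j] else (j, i)
            if scores[case] > scores[control]:
                concordant += 1.0
            elif scores[case] == scores[control]:
                concordant += 0.5
    if usable == 0:
        raise ValueError("need at least one case and one control")
    return concordant / usable


# Hypothetical signature scores (1 = MI, 0 = control); not study data.
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(toy_scores, toy_labels))
print(harrell_c_binary(toy_scores, toy_labels))
```

Both functions enumerate pairs explicitly, which is quadratic in the number of samples; that is acceptable for data sets of the size used here but a rank-based formula would be preferable for large cohorts.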

Functional and pathway enrichment analysis
Functional analysis of the DEGs was performed using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses with the clusterProfiler and org.Hs.eg.db packages (YU et al. 2012). GO terms and KEGG pathways with p < 0.05 were considered statistically significant.
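Over-representation analyses of GO terms and KEGG pathways are commonly based on a hypergeometric test; the text does not state the exact test or options used, so the following Python sketch is only an illustration of that general idea. The function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of genes in the background set
    n_in_term:    background genes annotated to the term or pathway
    n_selected:   number of selected genes (for example, DEGs)
    n_overlap:    selected genes annotated to the term or pathway
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts for illustration only: 20 background genes, 5 in the term,
# 4 selected genes, 2 of which fall in the term.
print(enrichment_p_value(20, 5, 4, 2))
```

A term would be reported as enriched when this probability falls below the chosen cut-off (p < 0.05 in this study), usually after correction for multiple testing.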

Validation of a diagnosis-related gene signature
Six data sets, GSE141512, GSE24519, GSE34198, GSE48060, GSE60993 and GSE109048, were used as the validation sets. To validate whether the candidate genes have diagnostic value in patients with MI, we also measured ROC, AUC and C-index in the validation sets.

Meta-analysis
The sensitivity and specificity of each included data set were calculated using the constructed diagnosis-related gene signature model. True positives, false negatives, false positives and true negatives were then tabulated for patients with MI and controls in each included data set. Next, a meta-analysis was performed to obtain the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), the bivariate summary receiver operating characteristic (SROC) curve and the area under the curve (AUC), which indicated the overall diagnostic value of the signature in distinguishing patients with MI from controls. Statistical heterogeneity among the data sets was assessed using Cochran's Q statistic and the I² test. I² values of 25%, 50% and 75% were taken to indicate low, medium and high heterogeneity, respectively. In addition, Fagan's nomogram was used to assess the clinical utility of the diagnosis-related gene signature. Meta-regression analysis was performed to investigate the effects of potential factors on diagnostic ability for MI. Publication bias among the included data sets was assessed using Deeks' regression test of funnel plot asymmetry. All statistical analyses were conducted using STATA 14.0 (Stata Corp, College Station, TX, USA).
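The per-data-set quantities named above follow directly from a 2x2 table, and I² follows from Cochran's Q. The Python sketch below shows these standard definitions only; it does not reproduce the bivariate pooling performed in STATA, and the function names and toy counts are hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PLR, NLR and DOR from one 2x2 table.

    Raises ValueError when a cell is zero, because the ratios are then
    undefined without a continuity correction (not applied here).
    """
    if min(tp, fn, fp, tn) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1.0 - specificity)  # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity  # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)              # equals plr / nlr
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


def i_squared(q, n_datasets):
    """I^2 (in percent) from Cochran's Q, with n_datasets - 1 degrees of
    freedom; negative values are truncated to zero."""
    df = n_datasets - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Toy counts for illustration only; these are not study data.
print(diagnostic_metrics(tp=40, fn=10, fp=5, tn=45))
print(i_squared(q=12.0, n_datasets=7))
```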

Identification of a diagnosis-related gene signature for MI
A total of 44 DEGs were identified from the gene profiling data of the discovery group and subjected to univariable logistic regression analysis (Fig. 2A). Among them, 8 DEGs were selected by the LASSO method for further investigation in the SignalP 3.0 server (Fig. 2B). Based on signal peptide probability, we identified 3 DEGs: CCL20, IL1R2 and ITLN1. These 3 DEGs were then analysed by multivariable logistic regression, which showed that CCL20, IL1R2 and ITLN1 remained significantly associated with MI (Fig. 2D). Next, we used CCL20, IL1R2 and ITLN1 to construct a diagnosis-related gene signature for distinguishing patients with MI from healthy controls. To assess the discrimination ability of the signature, ROC analysis was conducted. The sensitivity, specificity and AUC were 0.918, 0.980 and 0.975, respectively, suggesting high predictive efficacy of the diagnosis-related gene signature for MI. Moreover, the C-index of 0.975 for the 3 DEGs also indicated good discrimination.

Validation of the three-gene signature in six independent cohorts
The three-gene signature was regarded as a candidate biomarker for diagnosing MI, and its robustness was assessed in a validation cohort consisting of the remaining GSE141512, GSE24519, GSE34198, GSE48060, GSE60993 and GSE109048 data sets. However, the AUC results showed that the predictive power of the three-gene signature varied across the validation data sets. Four data sets showed good accuracy in predicting MI (AUC = 0.78 in GSE48060, AUC = 0.978 in GSE24519, AUC = 0.882 in GSE60993 and AUC = 0.867 in GSE109048), but the remaining two data sets showed weak predictive power (AUC = 0.639 in GSE141512 and AUC = 0.652 in GSE34198). The sensitivity and specificity for the validation cohort are shown in Table 2 and were consistent with the AUC results. The C-index values for the six data sets were likewise similar to the AUC values (Table 2).

Functional annotation
Analysis of the three-gene signature by GO categories and KEGG pathways is crucial for understanding its biological function. In this study, the top enriched GO terms for biological process (BP) were as follows: cellular response to interleukin-1, response to interleukin-1 and negative regulation of interleukin-1 secretion; and for molecular function (MF): RAGE receptor binding, Toll-like receptor binding and carbohydrate-binding (Table 3). Functional enrichment analysis showed that the top 20 KEGG pathways included the chemokine signaling pathway, IL-17 signaling pathway and TNF signaling pathway (Table 4).
[Table 4 row: Human T-cell leukemia virus 1 infection, 0.054, IL1R2]

Meta-analysis for diagnosis
A total of 7 data sets were included in the meta-analysis to determine the diagnostic value of the three-gene signature. As shown in Fig. 4, the pooled sensitivity and specificity estimates for the three-gene signature were 0.80 (95% CI: 0.66-0.90) and 0.90 (95% CI: 0.80-0.96), respectively. The PLR (8.4) indicated moderate informational value for the three-gene signature, but the NLR (0.22) indicated minimal informational value. Figure 4D displays the likelihood ratio scattergram used to investigate diagnostic value; the result fell in the right lower quadrant, indicating that the three-gene signature was useful for confirming the presence of MI (when positive) but not for excluding it (when negative). The DOR and area under the ROC curve were 39 (95% CI: 9-159) and 0.93 (95% CI: 0.90-0.95), respectively, which indicated that the three-gene signature has good discriminatory ability for MI. Figure 4C depicts the use of Fagan's nomogram for calculating post-test probabilities; a positive result with the three-gene signature increased the probability of MI from 57% to 92%, and a negative result decreased it to 22%.
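The post-test probabilities read from Fagan's nomogram can be checked with the odds form of Bayes' theorem. The Python sketch below uses the pre-test probability (57%), PLR (8.4) and NLR (0.22) reported above; the function name is hypothetical, and because the reported likelihood ratios are rounded, the computed values are expected to match the reported 92% and 22% only approximately.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability:
    post-test odds = pre-test odds x likelihood ratio."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Values taken from the meta-analysis results reported in the text.
after_positive = post_test_probability(0.57, 8.4)   # positive signature result
after_negative = post_test_probability(0.57, 0.22)  # negative signature result
print(round(after_positive, 3), round(after_negative, 3))
```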
Significant heterogeneity was observed (81.54% for sensitivity and 58.99% for specificity) among the 7 included data sets. To identify the source of heterogeneity, we examined a threshold effect, publication bias, a bivariate box plot and meta-regression. The Spearman correlation analysis (correlation coefficient = -0.714, p = 0.071) revealed no threshold effect for the three-gene signature in distinguishing patients with MI from healthy controls. Deeks' funnel plot asymmetry test demonstrated no potential publication bias in our data sets (t = -0.30; p-value = 0.77) (Fig. 5A). The bivariate box plot revealed that the central location included 6 data sets, with one data set as the outlier, suggesting a low degree of indirect heterogeneity (Fig. 5B). Then, meta-regression was performed to analyze sample size, location, source of the tissue, median distribution and platforms. The major sources of heterogeneity for specificity were the source of the tissue and median distribution. However, the potential sources of heterogeneity for sensitivity were not confirmed. The meta-regression results are shown in Fig. 6.

Discussion
In the present study, CCL20, IL1R2 and ITLN1, which were more highly expressed in patients with MI than in healthy controls, were used to construct a model that showed excellent diagnostic performance for patients in the 7 data sets. The additional diagnostic meta-analysis demonstrated that the three-gene signature performed well in predicting the diagnosis of MI. In this research, the ROC area of the three-gene signature was 0.93, which indicated that the three genes might be considered candidate therapeutic targets for patients with MI. Interestingly, SignalP 3.0 analysis indicated that CCL20, IL1R2 and ITLN1 have the characteristics of secretory molecules. Highly expressed CCL20, IL1R2 and ITLN1 might therefore be measurable in the blood and serve as early diagnostic biomarkers for MI.
Additionally, serum levels of CCL20 tended to be higher in patients with MI than in healthy controls, but the difference was not significant. A likely reason is that the sample size was too small to reach statistical significance. However, one previous study demonstrated that serum levels of CCL20 were significantly higher in patients with ischemic heart disease, which included acute MI, stable angina and unstable angina (SAFA et al. 2016). Moreover, a previous study implied that T-cell death-associated gene 8 (TDAG8) negatively regulates the transcription of the chemokine Ccl20: CCL20 expression was increased in TDAG8 KO mice, and suppressing CCL20 contributed to survival rate and cardiac function (Nagasaka et al. 2017). It should be noted that CCL20 expression increases after mitogen-activated protein kinase is activated by IL-17 signalling. When CCL20 binds to the CCR6 receptor, it plays an essential role in leukocyte chemoattraction and in recruiting γδT cells to the site of inflammation, thereby worsening cardiac function (YAN et al. 2012;CHANG et al. 2018).
IL1R2 is involved in the process of coronary atherosclerosis. For example, Lian et al. reported that IL1R2 was regulated by miR-383-3p, which prevented inflammatory damage to coronary artery endothelial cells by inhibiting the activation of an inflammasome signalling pathway (LIAN et al. 2018). IL1R2 exists in two forms, a membrane-bound protein and a soluble protein (soluble IL-1 receptor 2); the latter was significantly associated with left ventricular remodelling in patients with ST-elevation MI (ORREM et al. 2018).
Omentin-1, also referred to as ITLN1, is a novel adipokine related to glucose metabolism, inflammation and atherosclerosis (MENZEL et al. 2016).
There are several shortcomings that should be considered in our study. The main limitation is that the study used data with small sample sizes from several published data sets. Our findings need to be validated in other data sets and clinical trials to determine whether CCL20, IL1R2 and ITLN1 can serve as biomarkers for MI. Moreover, the three-gene signature was based only on in silico methods, and only a fraction of human genes was included in the analysis, so the diagnostic genes might not represent all candidate genes potentially associated with MI. Finally, the mechanisms through which the three-gene signature modulates the progression of MI require further investigation. Despite these drawbacks, this study provides a potentially powerful diagnostic marker for MI.

Conclusions
In summary, the three-gene signature combining CCL20, IL1R2 and ITLN1 was significantly associated with the diagnosis of MI and could provide potential therapeutic targets and novel therapeutic strategies for MI.
Figure 1. Flow chart of microarray data sets selection.