Identification of tissue inhibitor of metalloproteinase 1 (TIMP1) as a potential biomarker for the diagnosis, pathogenesis and prognosis of colorectal cancer via integrated bioinformatic analysis

Background: Colorectal cancer (CRC) is a common malignant tumor of the digestive system. Because CRC produces few clinical symptoms, it is crucial to screen for potential biomarkers for its diagnosis, pathogenesis, and prognosis. We therefore used bioinformatic analysis to identify biomarkers associated with the occurrence and progression of CRC and to explore molecular mechanisms relevant to its diagnosis and treatment.

Methods: Two independent gene expression profile datasets of colonic neoplasms (GSE44076 and GSE37182), comprising 182 tumor tissues and 236 normal tissues, were collected from the public GEO database. Differentially expressed genes (DEGs) between CRC and non-CRC colonic samples were identified with the GEO2R online tool. Hub genes were then selected through Gene Ontology (GO) enrichment analysis, KEGG pathway enrichment analysis, and protein-protein interaction (PPI) network analysis of the DEGs. Finally, the associations of the hub genes with CRC were evaluated by survival analysis and receiver operating characteristic (ROC) curve analysis.

Results: Sixty-one shared DEGs were identified in CRC samples, including 44 upregulated and 17 downregulated genes. Four genes (MYC, TIMP1, MMP7, and COL1A1) were considered hub genes because they had high connectivity degree scores in the PPI network. More importantly, increased TIMP1 expression was significantly correlated with reduced survival time in patients with colorectal cancer.

Conclusion: This bioinformatic analysis suggests that TIMP1 may be a potential biomarker for the diagnosis and prognosis of CRC and a candidate target for molecular therapy.

tumors worldwide. CRC ranks third in global incidence and second in global mortality [1]. In 2020, an estimated 147,950 people in the U.S. were diagnosed with CRC and 53,200 died of it [2]. The incidence of CRC has recently increased in Asia and is strongly correlated with lifestyle factors such as obesity, lack of exercise, alcohol consumption, smoking, and poor diet [1]. At present, clinical screening and diagnosis of colorectal cancer rely primarily on colonoscopy, fecal immunochemical testing, and fecal occult blood testing. Conventional therapy for colorectal cancer is surgery combined with chemotherapy and radiotherapy. However, 28% of patients with colorectal cancer still have an unfavorable prognosis [3,4]. In addition, hospitalization of CRC patients places a considerable economic burden on families and society. Therefore, to diagnose CRC effectively, reduce its mortality, and prolong survival, it is urgent to identify new biomarkers and to further explore the pathogenesis of this cancer.

Microarrays have been widely used to identify new potential biomarkers and to perform molecular diagnosis of cancers [5], while bioinformatic analysis helps identify more precise biomarkers by collecting and integrating data from multiple projects [6]. Because these two technologies can integrate and analyze vast quantities of data, they help researchers determine differentially expressed genes, comprehensively identify hub genes, and discover new clues to the pathogenesis and diagnosis of colorectal cancer [7].
The purpose of this study was to identify potential biomarkers related to the diagnosis, pathogenesis, and prognosis of colonic neoplasms by integrating DEG analysis, GO low-expression genes) were recognized from GSE44076, and 140 DEGs (106 high-expression and 34 low-expression) were extracted from GSE37182 using the GEO2R online tool, with thresholds of |logFC| > 2 (logFC, log2 fold change) and adjusted P-value < 0.01. The distribution of DEGs is shown in heatmaps (Figure 1).

Next, the overexpressed and downregulated DEGs shared by the two datasets were illustrated with Venn diagrams. Sixty-one shared DEGs were identified, including 44 overexpressed and 17 downregulated genes in CRC samples (Figure 2 and Table 2).

Figure 1 (caption, partially recovered): … indicates genes with low expression, while red indicates genes with high expression. Each row represents one gene probe, and each column represents one sample.

Table 2. Shared DEGs between the selected datasets in this study

Shared DEGs | Gene symbols
Upregulated (44 items) | COL1A1, COL1A2, TIMP1, REG1A, GDF15, TACSTD2, PLAU, CDC25B, AXIN2, ETV4, TRIB3, TESC, SOX9, MMP11, PHLDA1, NFE2L3, FOXQ1, MMP3, UBE2C, MMP12, CLDN2, TOP1MT, DACH1, CLDN1, MTHFD1L, ARID3A, TRIP13, TGFBI, NKD2, SERPINB5, SLC7A5, CDH3, ASCL2, PSAT1, LY6G6D, CTHRC1

GO terms were selected on the basis of P-values < 0.05. Among the GO terms, 14 were biological process (BP) terms, mostly associated with one-carbon metabolism, negative regulation of the canonical Wnt signaling pathway, and collagen fibril organization; 7 were cellular component (CC) terms, primarily related to the extracellular matrix, the extracellular exosome, and the lateral plasma membrane; and 3 were molecular function (MF) terms, mainly involving guanylate cyclase activator activity, carbonate dehydratase activity, and metalloendopeptidase activity.

In the KEGG pathway enrichment analysis, 5 pathways were indicated based on having

To determine the relationships among the shared DEGs in CRC, we analyzed the protein-protein interaction (PPI) network (Figure 4) normal samples from the TCGA database. As shown in Figure 5, overexpression of TIMP1 (HR = 1.7, P = 0.036) was significantly correlated with unfavorable overall survival in CRC patients. However, upregulation of COL1A1 (HR = 1.6, P = 0.075), MMP7 (HR = 1.1, P = 0.064), and MYC (HR = 1.1, P = 0.82) was not significantly associated with overall survival in colon cancer patients. These results highlight the importance of TIMP1 in CRC and suggest that it could serve as a biomarker for predicting the survival time or survival rate of patients with colonic neoplasms.
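The survival comparison above reports hazard ratios and P-values, but this section does not state which software or grouping rule produced them. As a hedged illustration only, the sketch below implements a standard two-group log-rank test in plain Python, assuming patients have already been split into hypothetical high- and low-TIMP1 expression groups. The hazard ratio it returns is the simple observed/expected (O/E) approximation, not a Cox proportional hazards estimate.

```python
import math
from typing import List, Tuple


def logrank_test(times_a: List[float], events_a: List[int],
                 times_b: List[float], events_b: List[int]) -> Tuple[float, float, float]:
    """Two-group log-rank test (group a = e.g. high TIMP1, group b = low TIMP1).

    events_*: 1 = death observed at that time, 0 = censored.
    Returns (chi2, p_value, approx_hr), where approx_hr = (O_a/E_a) / (O_b/E_b),
    the observed/expected approximation to the hazard ratio (not a Cox estimate).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        # Patients still at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        # Deaths occurring exactly at time t in each group.
        d_a = sum(1 for ti, ei, g in data if g == 0 and ti == t and ei == 1)
        d_b = sum(1 for ti, ei, g in data if g == 1 and ti == t and ei == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this time point.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    total_deaths = sum(events_a) + sum(events_b)
    observed_b = total_deaths - observed_a
    expected_b = total_deaths - expected_a
    if expected_a > 0 and expected_b > 0 and observed_b > 0:
        approx_hr = (observed_a / expected_a) / (observed_b / expected_b)
    else:
        approx_hr = float("nan")
    return chi2, p_value, approx_hr
```

With identical follow-up in both groups the test returns chi2 = 0 and P = 1; if every death in group a occurs before any death in group b, the approximate HR exceeds 1, mirroring the direction reported for high TIMP1 expression. All inputs here are hypothetical.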

ROC analysis of TIMP1 in CRC
To determine whether TIMP1 is valuable for diagnosing CRC, we performed ROC analysis using the GSE44076 and GSE37182 datasets. Two gene probes (11715359_a_at and 11715360_x_at) correspond to TIMP1 in GSE44076, and one probe (ILMN_1711566) corresponds to TIMP1 in GSE37182. The AUC was 94.44% for 11715359_a_at and 96.54% for 11715360_x_at (Figure 6), and 99.11% for ILMN_1711566; all exceeded 80%. Hence, TIMP1 could be regarded as a potential diagnostic biomarker for CRC in clinical research.

Discussion
Colorectal cancer (CRC) is a common cancer of the gastrointestinal tract, ranking third in global incidence and second in global mortality [1]. Although considerable progress has been made in colorectal cancer screening in recent years, no significant breakthroughs have been made in the early detection of tumors. We believe that CRC mortality would be significantly reduced if most patients were diagnosed at an early stage of the disease. Hence, it is pivotal to screen for potential tumor biomarkers and to improve the tumor detection rate.
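The AUC values reported for the TIMP1 probes in the ROC analysis are areas under the ROC curve expressed as percentages. A minimal sketch of how such a value can be computed, assuming tumor and normal expression values for a single probe and treating higher expression as indicating tumor: the AUC equals the probability that a randomly chosen tumor sample has a higher value than a randomly chosen normal sample (the Mann-Whitney formulation). The function name and the toy values are hypothetical; this section does not name the tool used for the ROC analysis.

```python
from typing import Sequence


def roc_auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Area under the ROC curve for separating tumor from normal samples.

    Equals the fraction of (tumor, normal) pairs in which the tumor value is
    higher, counting ties as one half (the Mann-Whitney U statistic divided
    by n_tumor * n_normal).
    """
    if not tumor or not normal:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for x in tumor:
        for y in normal:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(tumor) * len(normal))


# Hypothetical expression values for one probe (not data from this study).
auc = roc_auc([8.1, 9.4, 7.9, 10.2], [6.5, 7.0, 8.0, 6.1])
print(f"AUC = {auc:.2%}")  # prints: AUC = 93.75%
```

The pairwise loop is quadratic, which is unproblematic for datasets of a few hundred samples; a rank-based implementation gives the same value faster for larger cohorts.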

The heatmaps showed distinct clusters of high-expression and low-expression genes in CRC samples, indicating differences in gene expression between tumor and normal samples. We identified 61 shared DEGs between CRC and normal samples from two independent gene expression profiles in the GEO database, including 44 highly expressed genes and 17 genes with low expression.
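The DEG selection and intersection steps described earlier (|logFC| > 2 and adjusted P < 0.01 in each dataset, then genes shared by both datasets in the same direction) can be sketched as follows. This is an illustration under stated assumptions, not the GEO2R implementation: each dataset is assumed to be a hypothetical dictionary mapping a gene symbol to (logFC, raw P-value) with one value per gene (real microarrays may have several probes per gene), and raw P-values are adjusted with the Benjamini-Hochberg procedure, which is GEO2R's default adjustment.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: gene symbol -> (logFC, raw P-value), one entry per gene.
DegTable = Dict[str, Tuple[float, float]]


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(table: DegTable, lfc_cut: float = 2.0,
                p_cut: float = 0.01) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) genes with |logFC| > lfc_cut and adj. P < p_cut."""
    genes = list(table)
    adj = bh_adjust([table[g][1] for g in genes])
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, q in zip(genes, adj):
        lfc = table[gene][0]
        if q < p_cut and abs(lfc) > lfc_cut:
            (up if lfc > 0 else down).add(gene)
    return up, down


def shared_degs(a: DegTable, b: DegTable) -> Tuple[Set[str], Set[str]]:
    """Genes regulated in the same direction in both datasets (the Venn overlap)."""
    up_a, down_a = select_degs(a)
    up_b, down_b = select_degs(b)
    return up_a & up_b, down_a & down_b


# Hypothetical toy datasets (gene names and values are placeholders).
ds1 = {"GENE1": (3.0, 1e-8), "GENE2": (2.5, 1e-6), "GENE3": (-4.0, 1e-9), "GENE4": (0.1, 0.5)}
ds2 = {"GENE1": (2.8, 1e-7), "GENE2": (1.0, 1e-6), "GENE3": (-3.5, 1e-8), "GENE4": (0.0, 0.9)}
up, down = shared_degs(ds1, ds2)
```

In the toy data, GENE2 passes the thresholds in the first dataset but not the second, so only GENE1 (up) and GENE3 (down) end up in the shared lists, which is how the 61 shared DEGs are a subset of each dataset's DEGs.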

The functions of the DEGs were investigated through GO enrichment analysis, covering biological process (BP), cellular component (CC), and molecular function (MF). One-carbon metabolism, negative regulation of the canonical Wnt signaling pathway, and collagen fibril organization were the three main BP terms. The Wnt signaling pathway regulates cell movement, and its activity is dysregulated in various cancers [15], such as breast cancer, thyroid cancer, and colon cancer [16]. The Wnt signaling pathway not only regulates the tumor microenvironment (TME) but can also be an important target for inhibiting tumor growth [17]. The basement membrane underlies epithelial and endothelial cells and is composed of collagenous and noncollagenous proteins. Type IV collagen is present in essentially all basement membranes, and loss of type IV collagen α-chains is significantly associated with tumor invasion, a hallmark of malignant tumors [18].

The extracellular matrix, extracellular exosome, and lateral plasma membrane were the three main CC terms. The extracellular matrix (ECM) is the structural framework of organs and tissues. Carcinoma-associated fibroblasts (CAFs) can promote tumor cell proliferation and migration by degrading the ECM [8]. Therefore, ECM remodeling in tumor tissues could serve as a potential diagnostic and therapeutic target [9,10].

Guanylate cyclase activator activity, carbonate dehydratase activity, and metalloendopeptidase activity were the three main MF terms. Guanylin is the most commonly described downregulated gene product in sporadic CRC [11], and guanylate cyclase 2C (GUCY2C), a tumor suppressor, is a transmembrane receptor expressed on the luminal surface of the intestinal epithelium.
Activated GUCY2C catalyzes the synthesis of cyclic guanosine monophosphate (cGMP), which triggers signaling cascades that maintain intestinal epithelial homeostasis [12]; loss of GUCY2C or cGMP could therefore cause intestinal transport dysfunction and tumorigenesis [13]. Tumor development is also shaped by the tumor microenvironment. Rapid tumor growth outstrips the vascular supply, leaving the tumor area insufficiently supplied with oxygen. Carbonic anhydrase IX (CAIX), a carbonate dehydratase, is strongly induced by cellular hypoxia. Once activated, CAIX can promote cell migration and increase tumor cell infiltration. Various metalloendopeptidases are highly expressed in tumor tissues and play essential roles in the dissemination and metastasis of tumor cells [14].
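GO term enrichment of the kind discussed above is commonly assessed with a one-sided hypergeometric (Fisher's exact) test: given N annotated background genes, K of which carry a term, how surprising is it that k of the n submitted DEGs carry it? The sketch below computes that upper-tail P-value. The counts in any call are hypothetical, and the enrichment tool used in this study may apply a modified version of this test.

```python
from math import comb


def go_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P-value, P(X >= k).

    N: background genes with any annotation
    K: background genes annotated to the GO term
    n: DEGs submitted (e.g. the shared DEGs)
    k: DEGs annotated to the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("invalid counts")
    # Sum the probabilities of observing k, k+1, ... annotated DEGs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical example: 5 of 61 DEGs annotated to a term covering 40 of 20,000 genes.
p = go_enrichment_p(5, 61, 40, 20000)
```

Exact integer arithmetic via `math.comb` (Python 3.8+) avoids overflow for realistic background sizes; terms would then be kept if P < 0.05, the threshold stated for this study.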

Furthermore, PPI network analysis of the shared DEGs identified four genes with high connectivity degree scores as hub genes: MYC, TIMP1, MMP7, and COL1A1. Survival analysis then showed that upregulated TIMP1 was significantly correlated with poor survival in patients with colorectal cancer, and ROC curve analysis showed that TIMP1 had high diagnostic accuracy for CRC. Thus, we believe that TIMP1 could be a potential biomarker for the diagnosis and treatment of early-stage colon cancer in clinical research.
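Hub genes were selected by connectivity degree, that is, the number of interaction partners a protein has in the PPI network. A minimal sketch, assuming the network is given as a hypothetical list of undirected gene-gene edges (for example, exported from a PPI database): duplicate edges and self-loops are ignored, and genes are ranked by degree. The cutoff of four genes only mirrors the four hub genes reported here; the original selection threshold is not stated in this section.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hub_genes_by_degree(edges: Iterable[Tuple[str, str]],
                        top_n: int = 4) -> List[Tuple[str, int]]:
    """Rank genes by connectivity degree in an undirected PPI network."""
    seen = set()
    degree: Counter = Counter()
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        key = frozenset((a, b))
        if key in seen:
            continue  # count each undirected edge once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical edges (placeholders, not the network from this study).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("E", "E")]
hubs = hub_genes_by_degree(edges, top_n=2)
```

Degree is the simplest centrality measure; network tools also offer alternatives such as betweenness or MCC, which can rank genes differently.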

The tissue inhibitor of metalloproteinases (TIMP) family has four members: TIMP1, TIMP2, TIMP3, and TIMP4. The TIMP-1 gene is located on chromosome Xp11.3-p11.23, and its protein is found in plasma and the extracellular matrix [19,20]. Compared with healthy people, patients with primary rectal cancer and colon cancer have significantly increased plasma levels of TIMP-1 [20,21]. Moreover, TIMP-1 is highly expressed not only in colorectal cancer [23][24][25] but also in lung cancer [24], breast cancer [26,27], prostate cancer [28], and some other types of tumors, which indicates that TIMP-1 could be a biomarker for patients with early-stage tumors. The expression level of TIMP-1 is related to tumor TNM stage, survival rate, distant metastasis rate, and recurrence rate. Compared with other tumor markers (MMP-9, CEA, and CA19-9), TIMP-1 has higher diagnostic sensitivity, and this sensitivity can be further improved by combining TIMP-1 with MMP-9 or CEA [20,29].

TIMP-1 can increase the risk of tumorigenesis by promoting excessive cell proliferation and chromosomal abnormalities. TIMP-1 also affects tissue gelatinase activity and the stability of collagen fibrils in the tumor matrix [30]. Furthermore, carcinoma-associated fibroblasts (CAFs) have been reported to play an important role in TIMP-1 overexpression and carcinogenesis: overexpressed TIMP-1 promotes the accumulation of CAFs as well as the migration and growth of tumor cells, whereas TIMP-1 inhibitors could achieve anticancer effects by blocking CAF function [31].
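The statement that combining TIMP-1 with MMP-9 or CEA improves diagnostic sensitivity [20,29] follows from a simple rule: if a sample is called positive when either marker is positive, every case detected by TIMP-1 alone is still detected, so sensitivity cannot decrease, although specificity may. The sketch below illustrates this arithmetic with entirely hypothetical marker calls and an assumed "either-positive" combination rule; it does not reproduce the data or the combination method of the cited studies.

```python
from typing import List, Sequence


def sensitivity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """True-positive rate: detected cancers / all cancers."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    return tp / sum(1 for t in truth if t)


def specificity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """True-negative rate: correctly negative controls / all controls."""
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    return tn / sum(1 for t in truth if not t)


def either_positive(*marker_calls: Sequence[bool]) -> List[bool]:
    """Combined call: positive if any marker is positive for that sample."""
    return [any(sample) for sample in zip(*marker_calls)]


# Hypothetical calls for 5 cancers followed by 5 controls (illustrative only).
truth = [True] * 5 + [False] * 5
timp1 = [True, True, True, False, False, False, False, False, False, True]
cea = [True, False, False, True, False, False, True, False, False, False]
combined = either_positive(timp1, cea)
```

In this toy example, sensitivity rises from 0.6 (TIMP-1 alone) to 0.8 (combined), while specificity falls from 0.8 to 0.6, which is why combined panels are usually judged on both measures.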

Upregulated TIMP-1 may have different functions in vivo and in vitro. TIMP-1 deficiency could limit the growth and metastasis of CRC in vivo [20,32], whereas high expression of TIMP-1 stimulates cell apoptosis in vitro: TIMP-1 interacts with CD63 and integrin β1 (ITGβ1) on the cell membrane to induce apoptosis in human breast epithelial MCF10A cells [33,34].

Glycosylated TIMP-1 plays a dual regulatory role in cancer progression and metastasis: it may inhibit tumor progression [35] but may also stimulate tumor cell growth and malignant progression [36]. In the early stages of cancer, the expression level of glycosylated TIMP-1 is proportional to the tumor growth rate, whereas in late-stage malignant tumors, upregulation of glycosylated TIMP-1 delays tumor growth [37]. These findings indicate that the level of glycosylated TIMP-1 could also be used as a biomarker for tumor staging and prognosis.

Conclusion
In this study, bioinformatic analysis identified a hub gene (TIMP-1) that correlates with unfavorable prognosis in patients with colorectal cancer. However, our findings are based solely on data analysis; further clinical trials and basic research on TIMP-1 in CRC are needed to determine the potential mechanisms of TIMP-1 in the early diagnosis, pathogenesis, and prognosis of colorectal cancer.

Availability of data and materials
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
Ethics review was not necessary for this study because we examined published data from public GEO datasets. Because this is a bioinformatic analysis, no consent was needed from patients.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no conflicts of interest.