Systematic Proteomics Analysis of Colorectal Cancer: to Recognize Potential Biomarkers and Remedial Target

Colorectal cancer (CRC) is a highly prevalent malignancy of the digestive system. Despite extensive investigation, the precise mechanisms involved remain unclear, and more research is needed to understand them better. In this investigation, a systems biology method was applied to provide comprehensive insight by building a multilayer network to identify novel biomarkers and potential drug targets. To identify the proteins affected in this disease, an existing protein expression profile of CRC was obtained from ProteomeXchange. In the first step, the quality of the dataset was assessed, and differentially expressed proteins (DEPs) were then identified using FDR<0.05 as the threshold for statistical significance. Subsequently, all identified DEPs were subjected to enrichment analysis to find the biomedical phenomena involved. Next, lncRNAs and miRNAs related to the DEPs were predicted as regulatory layers via Cytoscape to build an integrative network. The findings of this study indicated that the identified DEPs were mainly associated with two biomedical terms, extracellular matrix organization and mitochondrial-related metabolic processes. Additionally, several hub molecules were identified and can be considered potential biomarkers. Further research should concentrate on investigating their roles in CRC to understand more deeply the vaguely understood mechanisms involved.


Introduction
Colorectal cancer (CRC) is a well-known non-communicable disease that ranks third among causes of cancer death. Despite extensive investigation, its incidence is increasing even in developed countries, which could stem from our limited knowledge of the exact pathogenic mechanisms underlying the disorder (1,2). Given the complicated nature of cancer, focusing on a single molecule or even a single biomedical phenomenon is not sufficient to decipher the exact pathogenesis of CRC (3) (4), and because of this complexity our understanding remains incomplete. Introducing novel therapeutic targets or early detection biomarkers is therefore a potential way to shed light on CRC pathogenicity (5) (6). Systems biology is a promising approach that can provide holistic insight and suggest potential therapeutic strategies or non-invasive biomarkers for the early detection of such complicated disorders (7). This technique, together with high-throughput data, is now used extensively to identify novel clues in the disease (8)(9)(10). However, far too little attention has been paid to building a holistic view through the construction of a multilayer network focused on the regulatory layers in colorectal cancer.
Therefore, the objective of this study is to re-analyze a proteomics dataset comparing the protein expression profiles of tumor and non-tumor tissue, in order to build a multilayer network that includes differentially expressed proteins and their related regulators, such as microRNAs and lncRNAs. The aim is to provide a comprehensive view by finding the molecular signature, introducing novel drug targets and biomarkers, and identifying the pivotal pathways involved in the pathogenesis of CRC.

Method And Materials
Data acquisition
A protein expression profile (PXD019504) of colorectal cancer patients, produced by Tanaka et al. (11) using a label-free approach, was obtained from the ProteomeXchange database (http://proteomecentral.proteomexchange.org) (12). This dataset included nine tumor tissues from colorectal cancer and nine non-tumor tissues.

Protein identification process
To translate the peptide mass spectra into identified proteins, MaxQuant software integrated with the Andromeda search engine was used with the following parameters: methionine oxidation and N-terminal acetylation were chosen as variable modifications and cysteine carbamidomethylation as a fixed modification; a minimum ratio count of two was set to quantify the identified proteins; the Homo sapiens proteome was downloaded as the source of theoretical spectra; and PSM and protein FDR <0.05 was considered the significance threshold for peptide and protein identification (13,14).

Differentially expressed proteins detection
To identify differentially expressed proteins among all identified proteins, the Perseus application was used. After contaminant and reverse proteins were filtered out of the dataset, quality control assessment was carried out using principal component analysis (PCA). In the next step, filtering for three valid values was carried out based on the existing groups of the dataset. To find DEPs, a two-sample t-test was performed, and a permutation-based FDR<0.05 with 250 randomizations was considered the significance threshold (15).
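
The statistical step above can be illustrated with a minimal Python sketch. It is not the Perseus implementation: Perseus derives its permutation-based FDR with its own procedure, whereas this sketch, as a simplifying assumption, computes a per-protein permutation p-value from Welch's t statistic and then applies the Benjamini-Hochberg adjustment. The function names (`welch_t`, `permutation_p`, `bh_adjust`) and the toy intensities are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    diff = mean(a) - mean(b)
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:  # both groups constant: avoid division by zero
        if diff == 0:
            return 0.0
        return float("inf") if diff > 0 else float("-inf")
    return diff / se2 ** 0.5


def permutation_p(a, b, n_perm=250, seed=0):
    """Two-sided permutation p-value: share of label shuffles whose |t|
    is at least as large as the observed |t| (with a +1 correction)."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy log-intensities (hypothetical): nine tumor vs nine non-tumor samples.
tumor = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
normal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
p_changed = permutation_p(tumor, normal)
p_unchanged = permutation_p([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

In a real analysis, `permutation_p` would be applied to every quantified protein and the resulting list passed to `bh_adjust`; proteins with an adjusted value below 0.05 would be the candidates analogous to the DEPs described here.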

GO and pathway enrichment analysis of DEPs
To gain conceptual insight into the DEPs based on their altered biological functions, GO and pathway enrichment analyses were performed in Cytoscape, a well-known, user-friendly software platform (16).
The ClueGO plugin is a major bioinformatics tool that was used in this study to annotate proteins according to biological process, molecular function, and cellular component. REVIGO was used to condense the results into parent terms. The Reactome database was used to identify the biomedical phenomena related to the DEPs. In these procedures, an adjusted p-value<0.05 was chosen as the significance cut-off (17) (18).
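
As an illustration of what an over-representation test of this kind computes, the sketch below evaluates a right-tailed hypergeometric p-value for a single term. This is an assumption made for illustration: the exact test and correction settings used in ClueGO are not stated here, and the function name and the counts are hypothetical. The resulting p-values would still need multiple-testing adjustment before a cut-off such as adj. p-value<0.05 is applied.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Right-tailed hypergeometric p-value P(X >= k).

    N: proteins in the background, K: background proteins annotated to the term,
    n: DEPs tested, k: DEPs annotated to the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 20 background proteins, 5 in the term, 5 DEPs.
p_all_in_term = enrichment_p(k=5, n=5, K=5, N=20)   # every DEP is in the term
p_none_in_term = enrichment_p(k=0, n=5, K=5, N=20)  # no overlap required
```

A small p-value means the overlap between the DEP list and the term is larger than expected if DEPs were drawn at random from the background.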

Construction of regulatory multilayer network
In the first step, a protein-protein interaction (PPI) map was constructed from high-scoring interactions in the STRING database. The miRTarBase and miRNet databases were then used, with a significance cut-off, to enrich the constructed PPI map with predicted miRNAs and lncRNAs, respectively (19)(20)(21). A multilayer network comprising DEPs, miRNAs, and lncRNAs was then constructed and visualized in Cytoscape. The constructed network was evaluated using two common graph-theory measures, topological parameters and a high clustering coefficient, computed by network analysis and the MCODE plugin, respectively (22). The criteria for module selection were as follows: MCODE score ≥ 2, degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and k-core = 2.
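
The two graph measures used to evaluate the network, node degree and the local clustering coefficient, can be sketched in a few lines of Python. This is a toy illustration rather than the Cytoscape or MCODE implementation, and the node labels and edges are hypothetical placeholders, not interactions from this study.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency map from a list of (node, node) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of direct neighbours of a node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Hypothetical toy network: a triangle plus one pendant node.
edges = [("protein_A", "protein_B"), ("protein_B", "miRNA_C"),
         ("protein_A", "miRNA_C"), ("miRNA_C", "lncRNA_D")]
graph = build_graph(edges)
hub = max(graph, key=lambda n: degree(graph, n))  # highest-degree node
```

Ranking nodes by `degree` within each layer corresponds to the hub selection described above, while densely connected regions with high clustering coefficients are the kind of structure that module detection looks for.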

Results
Dataset selection and quality assessment
To select colorectal cancer data, two main keywords (colorectal and colon cancer) were used to choose an appropriate dataset; finally, a protein expression profile (PXD019504) generated by Tanaka et al. was obtained from the ProteomeXchange database. The selected dataset comprises 18 rectal tissue samples, nine tumor and nine non-tumor, analyzed with a label-free approach. In the first step, to transform the RAW data into a protein table, MaxQuant software integrated with the Andromeda search engine was applied with the parameters mentioned earlier, based on the target-decoy strategy. Quality control is an indispensable step in omics data analysis (23), so the quality of the dataset was assessed via PCA, a well-known unsupervised approach. Clear clustering of cases and controls is the best sign of satisfactory dataset quality. One tumor sample showed unsatisfactory separation and was omitted as an outlier (Figure 1a). To confirm the quality of the samples, hierarchical clustering was performed, which showed that the quality of the dataset was satisfactory (Figure 1b).
Afterward, all identified proteins were analyzed in Perseus using FDR<0.05 to identify the differentially expressed proteins (DEPs). At the selected significance threshold, 1211 DEPs were detected, as illustrated by the volcano plot (Figure 2).

Enrichment analyses of DEPs
To explore the potential biological functions of the detected DEPs, GO enrichment analysis was carried out using the ClueGO plugin. The results indicate that the alterations in biological process were considerably related to metabolic processes, more specifically mitochondrial-related pathways, and to extracellular matrix organization. Moreover, enriched parent terms in the molecular function category highlighted cell adhesion molecule binding, oxidoreductase activity, and NAD binding as the most important altered functions associated with the DEPs. In addition, changes in the cellular component category were mainly attributed to extracellular exosome, actin cytoskeleton structures such as lamellipodia, and focal adhesion. In the Reactome pathway analysis, metabolic pathways, especially mitochondrial-related pathways such as "TCA cycle" and "Respiratory electron transport," as well as amino acid metabolism, were the significantly enriched terms, in line with the GO findings above. Remarkably, in agreement with the GO findings, extracellular matrix organization is one of the main enriched terms based on the DEPs (Figure 3, Figure 4).

The analysis of constructed multilayer network
To detect the key drivers among the added regulatory molecules, miRNAs and lncRNAs, that affect the DEPs, a multilayer network comprising the DEPs and the predicted miRNAs and lncRNAs was constructed and evaluated. Based on graph theory, more specifically the degree parameter, the hub molecules of each layer were identified separately.
Because lncRNAs affect miRNA expression, and given the crucial role of lncRNAs in transcriptional and post-translational regulation, we were interested in finding the central molecules of this regulatory layer. We therefore identified the top nodes of the lncRNA layer among 150 curated lncRNAs, which are shown in Table 1. Among all identified lncRNAs, KCNQ1OT1 was detected as the hub molecule with the most interactions with miRNAs; NEAT1 and XIST were the other top-ranked predicted lncRNAs by degree.
In addition to the degree criterion, a high clustering coefficient is another indispensable graph-theory measure that can be used to identify the central drivers of biological networks. Based on this criterion, two modules were recognized in the constructed multilayer network, and their related Reactome pathway enrichments were identified (Figure 5). "Mitochondrial related pathways" and "respiratory electron transport" are the essential altered pathways associated with the detected modules.

Discussion
With the advancement of high-throughput approaches and the growth of basic biological knowledge, re-analyzing pre-existing datasets of all sorts of large-scale biological data can be a valuable way to gain comprehensive insight into the mechanisms underlying complex disorders. Furthermore, as a current trend in biology, constructing a multilayer network can be of valuable assistance in forming the best picture of a disease's pathogenesis (8). In the present study, we re-analyzed a proteomic dataset. After ensuring the quality of the selected dataset, an integrated network was constructed, composed of essential regulatory elements such as predicted miRNAs and lncRNAs alongside the calculated DEPs. In the next step, using well-known systems biology approaches, we identified pivotal biomedical phenomena and key molecules in each added layer that probably have a significant role in colorectal cancer development. According to the enrichment analysis, the identified DEPs were mostly enriched in extracellular matrix organization and metabolic processes, specifically mitochondrial-related activities such as the respiratory electron chain and the TCA cycle. In agreement with these findings, the hub molecules identified among the DEPs can be divided into two classes, which underscore the alteration of "extracellular organization" and "mitochondrial dysfunction." Among them, the high-degree proteins involved in extracellular matrix organization included ITGB1, ACTG1, and PLEC. A body of evidence has pointed to the pivotal role of extracellular modification through the alteration of ITGB1 and PLEC in CRC pathogenesis (24).
Moreover, it has been shown that ITGB1 is an important linker between the actin cytoskeleton and the extracellular matrix (25,26). Notably, our findings also revealed ACTG1 as a central protein; it is one of the prominent elements of the actin cytoskeleton and is known as a key player in cancer development. Taken together, in line with previous investigations and considering the unclear pathogenic mechanisms of colorectal cancer (27), the mediator elements of the extracellular matrix and actin cytoskeleton could be a missing piece of colorectal cancer pathogenesis, and further work is needed to establish the relationship between cytoskeleton modification and CRC prognosis.
Interestingly, in agreement with the key role of extracellular matrix organization in the progression of the disease, cell adhesion molecule binding, the molecular function term with the highest number of child terms, underscores the importance of these biomedical phenomena. Additionally, ranked by number of child terms, the enriched cellular component terms were dominated by actin cytoskeleton related terms such as "lamellipodium" and "focal adhesion," which can be described as decisive elements in cancer cell migration and metastasis and have previously been explored in a number of studies as pivotal factors and potential therapeutic targets (28). All in all, during cancer progression, cell adhesion disorganization is the foremost alteration in metastasis and cellular transformation. Despite this valuable finding, the exact mechanisms of the extracellular matrix elements are still unclear, and deciphering the role of the elements involved requires extensive research.
Consistent with previous studies, our results highlighted mitochondrial dysfunction and related metabolic processes such as "TCA cycle" and "Respiratory chain electron transport" as other terms that were significantly enriched for the identified DEPs. In this regard, most of the detected central proteins were MRPS isoforms, which are critical for mitochondrial function. Remarkably, in agreement with these results, a number of recent investigations have focused on the key role of mitochondrial dysfunction in the pathogenesis of colorectal cancer (29,30). Additionally, in support of the central findings, the enrichment analysis results underscored the indispensable role of mitochondrial dysfunction and its related metabolic processes in the progression of the disorder. Given that cancer is a process with high energy demand, such enriched terms and proteins were to be expected. Moreover, besides the enriched terms for mitochondrial-related processes, we observed other metabolic processes such as "Branched-chain amino acids metabolism" and "Fatty acid metabolism," which have been noted in preceding colorectal cancer investigations. In this regard, the role of dietary intake of branched-chain amino acids in colorectal cancer occurrence has recently been highlighted (31,32).
The involvement of the biomedical phenomena mentioned above probably contributes to a deeper understanding of the pathogenic mechanisms underlying colorectal cancer. In the next step of our analysis, we constructed a multilayer network, whose main purpose was to identify the central molecules among the added regulatory layers of miRNAs and lncRNAs. The key regulatory elements identified are listed in Table 1.
miRNAs form one of the enriched regulatory layers and play a major role in the pathology of colorectal cancer (33). Given their regulatory effect on protein expression, they can be chosen as non-invasive biomarkers, as indicated in earlier investigations of early detection markers for complex disorders such as colorectal cancer (34,35).
The top-ranked miRNAs include miR-16-5p, miR-26b-5p, miR-92a-3p, and miR-1-3p; the other detected hub miRNAs are listed in Table 1. Remarkably, the majority of the hub miRNAs have been reported as key players in CRC progression. In a recent study, miR-16-5p was identified as a decisive element in tumorigenesis through targeting SMAD3 in CRC (36). Additionally, Yang Li et al. indicated that miR-26b-5p is a suppressive factor that directly affects FUT4 expression, which in turn regulates the metastasis of cancer cells (37).
Given that all the identified central molecules have previously been shown to play an important part in the progression of CRC, this regulatory layer deserves particular attention in complicated diseases of this kind, as a source of potential detection biomarkers and potential therapeutic targets.
Another predicted regulatory layer of the constructed network consists of lncRNAs, which have been shown to be an outstanding biomolecular layer that plays a vital role in pre- and post-translational regulation through its impact on proteins and on other layers such as miRNAs (38). Despite the investigations conducted on the regulatory role of lncRNAs in the progression of colorectal cancer, focusing on single molecules in this layer is not sufficient to decipher the exact mechanisms underlying CRC. To address this bottleneck, in this study we imported predicted lncRNAs into the constructed network based on their regulatory effect on the miRNAs, to gain holistic insight without bias toward a specific lncRNA. Through network analysis, we identified the top-ranked lncRNAs by degree; notably, the majority of the detected hub lncRNAs have previously been examined in the progression of CRC. For example, the critical role of KCNQ1OT1 in the development of colorectal cancer was shown to involve the induction of aerobic glycolysis, which points to its proliferative role (39). Interestingly, the pivotal role of NEAT1 was previously explored as a decisive element in the metastasis of cancer cells (40).
Additionally, it has been reported as a diagnostic biomarker, which provides literature validation of our findings (41). In agreement with previous investigations, all the identified lncRNAs play an important part in the development of colorectal cancer through binding to miRNAs and proteins (42,43). Notably, some of the central lncRNAs, such as HELLPAR and DNAAF4-CCPG1, have not yet been studied for their involvement in CRC.
Furthermore, some of them have recently been identified in colorectal cancer as DElncRNAs, such as PAX8-AS1, whose role is still unclear, and further investigation is needed to decipher its function in the progression of CRC. Notably, among the identified and discussed lncRNAs, PCBP1-AS1 is another regulatory element that has recently been noted in other cancers, owing to its regulatory effect on PRL-3, a main prognostic marker of colorectal cancer that contributes to promoting cell migration and invasion (44). Accordingly, PCBP1-AS1 could be considered a decisive element in the disease.

Conclusion
In conclusion, based on a holistic approach, this study adds evidence for the involvement of metabolic processes, especially mitochondrial dysfunction and its related pathways, in the pathogenesis of CRC. From the constructed multilayer network, which comprises DEPs, miRNAs, and lncRNAs, several hub molecules were introduced in each layer, such as hub DEPs (ITGB1, ACTG1, ...), hub miRNAs (miR-16-5p, miR-26b-5p, ...), and hub lncRNAs (KCNQ1OT1, NEAT1); these agree with previous investigations reporting them as key factors in the pathogenesis of CRC. In addition, we reported some lncRNAs, such as HELLPAR and DNAAF4-CCPG1, that have not been explored in CRC development and could be considered potential non-invasive biomarkers of the disease. Taken together, all of these findings, and more specifically the introduced biomarkers, should be investigated in subsequent studies to clarify their role in the still unclear pathological mechanisms of CRC.

Gene ontology enrichment analysis: GO analysis in three categories was performed on the identified DEPs, and the results were summarized as parent terms. The x-axis shows the number of child terms and the y-axis shows the parent terms. Each category is depicted in a different color. A false discovery rate of 0.05 was considered the significance threshold.

Pathway enrichment analysis was performed on DEPs: the involvement of essential biomedical phenomena was identified based on the altered proteins. The amount of alteration in each pathway is depicted on the x-axis, and the y-axis shows the selected statistically significant threshold.