Construction of weighted gene co-expression network
The weighted gene co-expression network analysis was performed on the clinical sample genes of liver metastasis patients obtained from GEO public data sets and hospitals to show the construction process of the network. First of all, the data needs to be processed simply. In order to make the calculation results more accurate, 17628 genes were obtained by selecting the genes with 75% of the median absolute deviation of the standardized expression, and the deletion values were detected. Second, select the threshold. The higher the square of the correlation coefficient, the closer the network is to the distribution without network scale. When , the degree of fit is the highest, so the corresponding soft threshold is selected. It can also be seen that the average connectivity of the network increases with the decrease of the soft threshold, and the soft threshold is selected to be as small as possible, and the final soft threshold is 5 ( as determined in Fig. 1 ).
Thirdly, the weighted gene co-expression network is constructed by determining the soft threshold, and the co-expression modules are divided by dynamic cutting and module merging, as shown in Fig. 2. Finally, 44 gene modules are obtained. The upper part of the figure is the clustering tree of genes, and the following is the module based on similarity clustering. It can be seen that many genes exist in many blue and midnight blue modules.
In order to make the research results more accurate, the intrinsic genes were extracted from each gene module, used to calculate the adjacency between different gene modules, and the heat map was drawn to get Fig. 3. It can be seen from the figure that the 44 gene modules are divided into two parts, namely, the upper left corner and the lower right corner. Their internal adjacency is high, and there are few unrelated parts between gene modules, indicating that the correlation between gene modules is relatively good, which is of research significance.According to the Eq.(1) and Eq.(3) of co-expression similarity, it can be seen that the greater the cor value of the gene module and the gene module and other gene modules, the two modules have better research significance, that is, the two modules are strongly related. The and in the formula respectively represent the module gene. When i=j, it represents the similarity between the gene modules. When i≠j, it represents the similarity between two different gene modules.
Fourthly, the association between 44 gene modules and clinical traits was calculated. Since the public dataset and a small amount of clinical data were currently used, the total data were divided into three parts (liver metastasis of colon cancer, primary colon cancer, and normal colon tissue). The phenotypic data association diagram analysis was performed on the data of these three parts. The left column is colon cancer liver metastasis, the middle column is primary colon cancer, and the right column is normal colon tissue.
It can be seen from Fig. 4 that the positive correlation module of normal colon tissue is negatively correlated with the other two situations. There are many positive correlation common gene modules between liver metastasis and primary, and there are some positive correlation gene modules only related to liver metastasis. On the basis of this result, the genes related to colorectal cancer liver metastasis were selected for subsequent analysis.
GO functional enrichment analysis
Five gene modules that were only positively correlated with liver metastasis of colon cancer were selected and enriched based on the functional annotation of GO database as shown in Figure 5.In this figure, the abscissa represents the gene ratio, and the ordinate represents the biological process enriched by this type of gene. The larger the dot in the figure, the greater the number of genes enriched in the process.Specific enrichment results:In Fig. 5-(a), the genes of yellow-green module are mainly enriched in the regulation of lymphocyte activation, the regulation of cytokine production, the negative regulation of immune system process, T cell activation, neutrophil activation involved in immune response, neutrophil activation, regulation of cell activation, regulation of leukocyte activation and so on.In Fig. 5-(b), genes of brown modules are mainly enriched in small molecule catabolism, organic acid catabolism, carboxylic acid catabolism, cofactor catabolism, fatty acid catabolism, organic acid biosynthesis, carboxylic acid biosynthesis, steroid catabolism and so on.In Fig. 5-(c), the genes of orange4 modules are mainly enriched in transmembrane receptor protein serine and threonine kinase signaling pathways, ossification, negative regulation of transmembrane receptor protein fence / threonine kinase signaling pathway, negative regulation of cell growth factor stimulation, transforming growth factor beta receptor signaling pathway, ossification regulation and other biological processes.In Fig. 5-(d), the genes of the floral white module are mainly enriched in the main biological processes such as the regulation of actin filament tissue, extracellular structure tissue, regulation of tissue remodeling, renal epithelial develop-ment, renal tubular development and so on.In Fig. 5-(e), the genes of midnight blue module are mainly enriched in biological processes such as RNA splicing, ncRNA treatment, neurotransmitter transport, antigen processing through MHC class 1, presentation of exogenous peptides, and ectodermal cell differentiation.
High Adjacent Subnetwork Data Mining
Network diagram is a commonly used visualization method in bioinformatics to show the correlation direction and correlation degree between different nodes.In enrichment analysis, network maps are often used to express the relationship between the function and the genes estimated to function [35].For protein protein interaction network, network subgraphs are often used to express the intensity of association, interaction and type of interaction between encoded genes.According to the above analysis, we can judge the importance of the gene node in the whole network by the number of connections or the strength of connections between one node and other related nodes [36], and then dig out the potential high-expression genes throughnetwork.
The 2197 genes in all the gene modules that were only positively correlated with liver metastasis of colon cancer were summarized. Since the correlation between genes in the fourth module was relatively low, it was impossible to make a high adjacency network, so the remaining 697 genes were selected.MCODE is used to mine high-adjacent sub-networks in gene co-expression networks (Fig. 6), and functional annotation of these sub-networks (Table 1) Only high-adjacent sub-networks are found in this paper. Because the high-adjacent value of MCODE5 is unstable, MCODE5 is not analyzed.The corresponding relationship between different color blocks and subnet numbers is shown in the lower right corner.The functional annotations of these sub-networks include class A/1 (rhodopsin-like receptors), peptide ligand binding receptors, Gα signaling results, positive regulation of nitric oxide synthase biosynthesis, and DTX3L-PARP9-STAT1 complex , Cell response to gamma interferon, cell adhesion molecules, immune response activation signal transduction, immune response regulation signal pathway, activation of immune response, CD28 family co-stimulation, antigen processing and foreign peptide antigen MHC class I presentation, GRB2: SOS provides connections with integrin MAPK1 signaling and other metabolic processes. Among them, STAT1and MAPK1 are highly related to cancer, and the others are related to metabolism and the micro-environment. STAT1,as an important protein connecting the signal transduction between cell membrane receptor and effector, plays a very important role. Studies have shown that this gene can play a role mainly by promoting cell apoptosis [37], negatively regulating cell cycle, inhibiting normal cell proliferation, inhibiting tumor angiogenesis, and weakening tumor migration and invasion [38].
As an important transmitter from the cell surface to the nucleus, MAPK1 gene plays an important role. It can not only regulate cell differentiation and growth, but also adapt to environmental stress, inflammation and other major cellular physiological processes. According to the mechanism of the signaling pathway, it can be understood that any protein functional problems in the signaling pathway can lead to serious diseases, and these diseases are generally associated with tumors [39].
From the previous analysis of the construction of gene co-expression network, we can find the gene modules related to liver metastasis, and then through experiments, we can get the genes that are only related to liver metastasis. At the same time, the biological function of these genes can be analyzed to know the timing of these genes. It has biological significance, so our research is meaningful.Through gene co-expression analysis and mining of potential high-expression genes, we can perform auxiliary diagnosis and prognosis evaluation of liver metastasis of colorectal cancer, and formulate corresponding targeted treatment schemes for patients.
Table 1 Function enrichment of sub-networks identified by MCODE software
Sub-network number
|
Function Note Number
|
Description of function annotation
|
Log10(P)
|
MCODE_1
|
R-HSA-373076
|
Class A/1 (Rhodopsin-like receptors)
|
-27.0
|
MCODE_1
|
R-HSA-375276
|
Peptide ligand-binding receptors
|
-26.9
|
MCODE_1
|
R-HSA-418594
|
G alpha (i) signalling events
|
-25.8
|
MCODE_2
|
GO:0051770
|
positive regulation of nitric-oxide synthase biosynthetic process
|
-10.5
|
MCODE_2
|
CORUM:7385
|
DTX3L-PARP9-STAT1 complex
|
-10.2
|
MCODE_2
|
GO:0071346
|
cellular response to interferon-gamma
|
-10.2
|
MCODE_3
|
Ko05323
|
Rheumatoid arthritis
|
-9.6
|
MCODE_3
|
Hsa05323
|
Rheumatoid arthritis
|
-9.4
|
MCODE_3
|
ko04514
|
Cell adhesion molecules (CAMs)
|
-8.6
|
MCODE_4
|
GO:0002757
|
immune response-activating signal transduction
|
-3.6
|
MCODE_4
|
GO:0002764
|
immune response-regulating signaling pathway
|
-3.5
|
MCODE_4
|
GO:0002253
|
activation of immune response
|
-3.5
|
MCODE_5
|
R-HSA-389948
|
PD-1 signaling
|
-15.6
|
MCODE_5
|
R-HSA-388841
|
Costimulation by the CD28 family
|
-13.1
|
MCODE_5
|
Hsa04514
|
Cell adhesion molecules
|
-11.4
|
MCODE_7
|
GO:0002479
|
HALLMARK COAGULATION
|
-7.7
|
MCODE_7
|
GO:0042590
|
Complement cascade
|
-7.6
|
MCODE_7
|
R-HSA-1236974
|
ER-Phagosome pathway
|
-7.6
|
At the same time, we found that KRAS, APC, FBXW7, PIK3CA, TP53were related to colorectal cancer liver metastasis from the clinical cases of our hospital. So we used these five genes and STAT1 and MAPK1 to do the adjacent self-network, and found that there is a correlation between each other, which has biological significance(Fig. 7).
For the genes associated with colorectal cancer liver metastasis (KRAS, NRAS, BRAF, PIK3CA) [40], a comparison of related genes in this experimental data set can be found(Fig. 8).The abscissa of each subgraph represents eighteen samples, the ordinate represents the gene value, the blue circle represents the data of colorectal cancer liver metastasis, the red box represents the primary liver metastasis, and the green triangle represents the gene value of normal colon tissue.Among the four discovered gene proteins, the changes in transcriptome data of colorectal liver metastasis relative to primary and normal colon tissues are not obvious.The transcriptome data of MAPK1 and STAT1 changed significantly compared with the genes related to liver metastasis of colorectal cancer. Therefore, the MAPK1 and STAT1 obtained in this study have good research significance.