Expression of the SMAD protein family in pan-cancer
The mRNA expression of SMAD protein family members in human cancers was analyzed using the Oncomine online database. Under the selected criteria for comparing cancer and normal tissues, there were 442, 458, 453, 459, 459, 448, 456, and 389 independent studies in the database covering SMAD1 to SMAD9 (Figure 1). Interestingly, although a few SMAD genes showed increased expression in specific cancers, SMAD family members showed decreased expression in most cancers. In detail, SMAD1 expression was increased in brain and CNS cancer and in lymphoma; SMAD5 expression was increased in brain and CNS cancer, colorectal cancer, and kidney cancer; SMAD6 expression was increased in esophageal cancer; and SMAD9 expression was increased in brain and CNS cancer. In the remaining cancer data (Table 1), most SMAD family members showed decreased expression, with testicular cancer as the exception, and no significant expression difference was found in the other cancer types.
To further characterize the expression differences of the SMAD protein family between cancer and normal tissues, the TCGA and GTEx databases were analyzed jointly across 29 cancers, and a heatmap was drawn (Figure 2). Red boxes mark expression differences that are statistically significant. The expression of each gene in individual cancers is shown in Figures S1-S8. These results were combined with those from the Oncomine database and assessed with a t test. Compared with normal tissues, the expression of SMAD1, SMAD4, SMAD5, and SMAD7 differed significantly in Brain Lower Grade Glioma, and SMAD9 expression differed significantly in Breast Invasive Carcinoma. Significant differences were also found for SMAD1 in Acute Myeloid Leukemia; for SMAD6, SMAD7, and SMAD9 in lung adenocarcinoma (LUAD); for SMAD1 and SMAD7 in Lymphoid Neoplasm Diffuse Large B-cell Lymphoma; for SMAD6 in Prostate Adenocarcinoma; and for SMAD1 and SMAD7 in Testicular Germ Cell Tumors.
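The tumor-versus-normal comparison described above can be illustrated with a minimal Python sketch of a two-sample t statistic. The text does not say which t test variant was used, so the unequal-variance (Welch) form here is an assumption, and the function name `welch_t` and the toy values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(tumor, normal):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Assumption: unequal variances between groups (Welch's variant);
    the source only says that "a t test" was used.
    """
    n1, n2 = len(tumor), len(normal)
    v1, v2 = variance(tumor), variance(normal)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                     # squared standard error
    t = (mean(tumor) - mean(normal)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Toy expression values (hypothetical, not taken from TCGA or GTEx)
t_stat, dof = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A negative statistic indicates lower mean expression in the tumor group; converting it to a p value requires the t distribution, which is left to a statistics library.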
Prognostic analysis of SMAD protein family
To determine the prognostic value of the genes selected above, Kaplan-Meier survival analysis was conducted using the clinical information in the TCGA database. In LUAD, SMAD6 (logrank p = 0.65, p(HR) = 0.66) showed no clear correlation with overall survival (Figure 3A). Similarly, in all the other cancers analyzed, the differentially expressed SMAD family genes showed the same negative result as SMAD6. In LUAD, however, both SMAD7 (logrank p = 0.0099, p(HR) = 0.01) and SMAD9 (logrank p = 0.0017, p(HR) = 0.0019) were significantly associated with overall survival (Figure 3B, C). The prognosis of the SMAD7 and SMAD9 high-expression groups was significantly better than that of the low-expression groups.
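For readers unfamiliar with the method, the following is a minimal Python sketch of the Kaplan-Meier estimator and a two-group log-rank test of the kind reported above. It is not the authors' pipeline; the function names and the toy data are hypothetical, and the p value uses the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i]
    and 0 if the observation was censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends exactly at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Return the log-rank chi-square statistic (1 df) and its p value."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n       # observed minus expected in A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))           # chi-square tail, 1 df
    return chi2, p


# Toy data (hypothetical): four patients, the second one censored
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
# curve == [(1, 0.75), (3, 0.375), (4, 0.0)]
```

In this sketch the two groups passed to `logrank` would be the high- and low-expression patients of a given gene.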
Clinical features of SMAD7 and SMAD9
To evaluate the clinical characteristics of SMAD7 and SMAD9, we extracted their expression data from TCGA for different types of lung cancer. The results agreed with those of the combined TCGA and GTEx analysis (Figure 4A). In LUAD, the expression of SMAD7 (p = 1.76E-12) and SMAD9 (p = 1.64E-12) was reduced compared with normal tissues. However, analysis by stage, gender, and age (Figure 4B, C, D) showed that SMAD7 expression was not associated with any of these variables. For SMAD9 (Figure 4E, F, G), although expression differed between Stage 1 and Stage 3 (p = 1.07E-2), there was no consistent trend across stages; thus, SMAD9 expression appears to be independent of stage. On the other hand, SMAD9 expression was slightly higher in women than in men (p = 2.07E-2). In addition, SMAD9 expression was higher in young patients, but this result should be interpreted with caution because of the small number of young patients (n = 12).
Immunohistochemical image validation screening results
To verify the differential expression of SMAD7 and SMAD9 in LUAD, we extracted relevant IHC images from the Human Protein Atlas. In normal tissues, the expression intensity of SMAD7 was mainly strong and that of SMAD9 was medium (Figure 5A, C), whereas in LUAD the expression of both SMAD7 and SMAD9 was reduced (Figure 5B, D).
Differentially expressed genes (DEGs) in TCGA
The LUAD data in TCGA were divided into high-expression and low-expression groups according to the median expression of the target gene, and DEGs between the two groups were screened with the limma package in R. Based on this grouping, a total of 12 DEGs for SMAD7 and 57 DEGs for SMAD9 were identified from the TCGA database (Figure 6A, B).
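The median-based grouping can be sketched in a few lines of Python. This is only an illustration of the split itself, not of the limma analysis; the function name, the sample identifiers, and the rule that values equal to the median fall into the low group are assumptions not stated in the text.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into (high, low) groups by the median expression.

    expression maps a sample identifier to the expression value of the
    target gene. Assumption: ties with the median go to the low group.
    """
    cutoff = median(expression.values())
    high = [s for s, v in expression.items() if v > cutoff]
    low = [s for s, v in expression.items() if v <= cutoff]
    return high, low


# Hypothetical sample identifiers and values
high, low = split_by_median({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0})
# high == ["s3", "s4"], low == ["s1", "s2"]
```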
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of DEGs
The clusterProfiler, org.Hs.eg.db, enrichplot, and ggplot2 packages in R were used to analyze the functions of the DEGs in LUAD. In the GO enrichment, SMAD7 was mainly associated with the regulation of blood vessel, and SMAD9 with zymogen activation (Figure 7A). In the KEGG enrichment, SMAD7 was mainly associated with pathways such as Protein digestion and absorption, and SMAD9 with pathways such as the NOD-like receptor signaling pathway (Figure 7B).
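Over-representation analyses of this kind are commonly based on a hypergeometric test. The sketch below shows that calculation in Python for a single term; it is a simplified illustration rather than the R workflow above, and the function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_term, n_degs, n_overlap):
    """Hypergeometric upper-tail p value for one GO/KEGG term.

    n_universe: number of background genes
    n_term:     background genes annotated to the term
    n_degs:     number of DEGs tested
    n_overlap:  DEGs annotated to the term
    Returns P(X >= n_overlap) when n_degs genes are drawn at random.
    """
    total = comb(n_universe, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 5 in the term,
# 2 DEGs, both of them in the term
p = enrichment_p(10, 5, 2, 2)
```

In practice the p values for all tested terms are then adjusted for multiple testing, a step this sketch omits.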
Gene set enrichment analysis (GSEA) of the TCGA
Although the DEGs were used for GO and KEGG enrichment, that analysis only screens for differentially expressed genes and does not take into account the magnitude or direction of the expression changes. Therefore, the functions of SMAD7 and SMAD9 in LUAD were further analyzed using GSEA. In the GO and KEGG results, gene sets positively associated with SMAD7 were mainly enriched in processes such as CELLULAR RESPIRATION and RNA DEGRADATION, whereas gene sets negatively associated with SMAD7 were mainly enriched in processes such as REGULATION OF CELLULAR RESPONSE and LEUKOCYTE TRANSENDOTHELIAL MIGRATION (Figure 8A, B). Gene sets positively associated with SMAD9 were mainly enriched in processes such as CHRONIC INFLAMMATORY RESPONSE and GALACTOSE METABOLISM, whereas gene sets negatively associated with SMAD9 were mainly enriched in processes such as LUNG ALVEOLUS DEVELOPMENT and GNRH SIGNALING PATHWAY (Figure 9A, B). These enrichment results help clarify how SMAD7 and SMAD9 participate in the regulation of LUAD.
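The idea behind GSEA, that the position of a gene set within a ranked gene list carries information about direction, can be illustrated with a running-sum enrichment score. The Python sketch below uses the unweighted form of the statistic as a simplifying assumption (standard GSEA additionally weights hits by the ranking metric and estimates significance by permutation); the function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most positively to most negatively
                  associated with the phenotype
    gene_set:     collection of genes in the set of interest
    Returns the running-sum value with the largest absolute deviation
    from zero; a positive score means the set clusters near the top.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a hit, step down on a miss
        running += 1 / n_hit if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical gene labels
top_score = enrichment_score(["a", "b", "c", "d"], {"a", "b"})
bottom_score = enrichment_score(["a", "b", "c", "d"], {"c", "d"})
```

Here `top_score` is positive and `bottom_score` is negative, which mirrors the positive and negative enrichment reported for SMAD7 and SMAD9.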
Protein-protein interaction (PPI) network of DEGs
The PPI network helps to further explore the molecular mechanisms of SMAD7 and SMAD9 in LUAD. The STRING network tool was used to analyze the identified DEGs. For SMAD7 (Figure 10A), after the disconnected nodes were hidden, the PPI network of the DEGs consisted of 12 nodes and 15 edges. The top 5 predicted functional partners were FGG, COL1A2, F2, SERPINC1, and LOX. These were mainly involved in Platelet Aggregation, Common Pathway of Fibrin Clot Formation (FDR = 1.05E-10), Integrin cell surface interactions (FDR = 1.14E-9), Extracellular matrix organization (FDR = 2.58E-9), and GRB2:SOS provides linkage to MAPK signaling for Integrins (FDR = 4.79E-6), through which they may regulate the occurrence and development of LUAD. For SMAD9 (Figure 10B), the PPI network of the DEGs consisted of 54 nodes and 238 edges. The top 5 predicted functional partners were FGB, HGF, F2, TRAF2, and OASL, which were mainly involved in immune system (FDR = 1.05E-7), interferon alpha/beta signaling (FDR = 2.22E-7), cytokine signaling in immune system (FDR = 5.45E-7), and interferon signaling (FDR = 6.62E-6). These findings suggest that SMAD7 and SMAD9 mainly regulate the occurrence and development of LUAD through these pathways.
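The node and edge counts reported above depend on hiding proteins that have no interactions. A minimal Python sketch of that filtering step is given below; it does not query STRING, and the function name and the placeholder protein labels are hypothetical.

```python
def connected_nodes(nodes, edges):
    """Return {node: degree} for nodes that have at least one edge.

    nodes: iterable of protein identifiers
    edges: iterable of (protein_a, protein_b) interaction pairs
    Nodes without any interaction (degree 0) are dropped, which mimics
    hiding disconnected nodes in a network view.
    """
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n: d for n, d in degree.items() if d > 0}


# Placeholder labels, not real STRING results
kept = connected_nodes(["P1", "P2", "P3", "P4"], [("P1", "P2"), ("P2", "P3")])
# kept == {"P1": 1, "P2": 2, "P3": 1}; P4 is hidden
```

Sorting the returned dictionary by degree gives a simple way to look for hub proteins in such a network.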
Identification of the expression of SMAD7 and SMAD9 in GSE43767
GSE43767 is a microarray dataset of normal and LUAD lung tissue. It includes 15 normal lung tissue samples and lung tissue from 69 LUAD patients. The samples were analyzed using the limma package in R. The results (Figure 11) showed that the expression of SMAD7 and SMAD9 was significantly reduced in cancer patients.
SMAD7 is an independent prognostic factor for LUAD
Clinical information for LUAD patients was extracted from the TCGA database, and samples with missing clinical data were removed. The survival and survminer packages in R were then used to fit Cox proportional hazards models incorporating the expression of SMAD7 and SMAD9. For SMAD7 (Table 2), the results of both the univariate Cox regression analysis (p = 1.42E-2) and the multivariate Cox regression analysis (p = 4.88E-4) were statistically significant (Figure 12A), indicating that SMAD7 can be used as an independent prognostic factor for LUAD. In contrast, SMAD9 could not be used as an independent prognostic factor (p = 0.086) (Figure 12B).
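To make the Cox model concrete, the following Python sketch fits a proportional hazards model with a single covariate by Newton-Raphson on the partial likelihood. It is a simplified illustration and not the R analysis described above: it handles only the univariate case, treats tied event times with the Breslow approximation, and its function name and toy data are hypothetical. The multivariate model that underlies the claim of independence is not reproduced.

```python
import math


def cox_univariate(times, events, x, n_iter=25):
    """Fit a one-covariate Cox model; return (beta, hazard ratio, se).

    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    x:      covariate values (for example 1 = high expression, 0 = low)
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        grad = 0.0
        info = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue
            # Risk set: subjects still under observation at time t_i
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            grad += x_i - s1 / s0                  # score contribution
            info += s2 / s0 - (s1 / s0) ** 2       # information contribution
        if info == 0:
            raise ValueError("covariate does not vary within the risk sets")
        step = grad / info
        beta += step
        if abs(step) < 1e-9:
            break
    se = 1 / math.sqrt(info)
    return beta, math.exp(beta), se


# Toy data (hypothetical): subjects with x = 1 tend to fail earlier
beta, hazard_ratio, se = cox_univariate(
    [1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 1, 0, 1, 0, 0]
)
```

A hazard ratio above 1 means the group coded as 1 has the higher risk; a Wald test on `beta / se` gives an approximate p value.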