DEGs, DMGs and gene mutations
A total of 1170 (433 upregulated and 737 downregulated), 1522 (603 upregulated and 919 downregulated) and 1437 (675 upregulated and 762 downregulated) DEGs were obtained in LS, CRC and endometrial carcinoma (EC) patients, respectively, and were designated DEGs-L, DEGs-C and DEGs-E. Figure 1A-C shows the overall gene expression and the heatmaps of the top 50 DEGs ranked by P value after cluster analysis. Cluster analysis groups objects so that those in the same group (a cluster) are more similar to each other than to those in other groups. In all 3 sets, the expression of the DEGs differed substantially between tumour tissue and normal tissue. Among the 3 sets of DEGs, 653 genes were specific to LS and CRC, 35 were specific to LS and EC, and 252 were shared by LS, CRC and EC (Figure 2A).
By analysing the methylation data, we acquired differentially methylated genes (DMGs). A total of 13085 DMGs in LS (DMGs-L), 18315 in CRC (DMGs-C) and 18801 in EC (DMGs-E) were obtained. There were 376 DMGs specific to LS and CRC, 454 specific to LS and EC, and 12136 shared by all three diseases (Figure 2B). As shown in Figure 2A-B, the number of specific DEGs for LS-CRC was much higher than that for LS-EC, whereas the number of specific DMGs for LS-EC was slightly higher than that for LS-CRC, suggesting that the pathogenic processes connecting LS to CRC and to EC might differ.
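The overlap counts reported for the three DEG and DMG sets correspond to the disjoint regions of a three-set Venn diagram. The following is a minimal sketch of that partition, assuming each set is simply a collection of gene identifiers; the function name `venn_partition` and the toy identifiers are illustrative and do not come from the study.

```python
def venn_partition(a, b, c):
    """Split three gene sets into the seven disjoint regions of a Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,   # e.g. genes specific to LS and CRC
        "a_c_only": (a & c) - b,   # e.g. genes specific to LS and EC
        "b_c_only": (b & c) - a,
        "a_b_c": a & b & c,        # genes shared by all three sets
    }


# Toy identifiers for illustration only; these are not study results.
ls = {"g1", "g2", "g3", "g4"}
crc = {"g2", "g3", "g5"}
ec = {"g3", "g4", "g6"}

regions = venn_partition(ls, crc, ec)
counts = {name: len(genes) for name, genes in regions.items()}
print(counts)
```

In this sketch, "specific to LS and CRC" is read as membership in both sets and absence from the third, which is the interpretation the reported counts appear to follow.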
In the analysis of the mutation data, we counted the number and types of mutations in patients with each of the three diseases (Figure 2C, D). In all three diseases, missense mutations were the most common, followed by frameshift-del and nonsense mutations (Figure 2C). The mutation types differed in that CRC had a higher proportion of frameshift-del mutations than EC, whereas EC had a higher percentage of nonsense mutations than CRC, suggesting that CRC and EC may arise and develop through different types of mutations.
We further counted the mutations whose frequency was greater than the median (the median mutation frequency was 1 for LS and CRC and 2 for EC). There were 111 high-frequency mutations in LS, 4563 in CRC and 2883 in EC. Genes mutated in both LS and CRC were selected, and those also mutated in EC were excluded. This yielded 19 specific high-frequency mutated genes for LS and CRC (Mut-LC). Similarly, 11 specific mutated genes were obtained for LS and EC (Mut-LE) (Figure 2D). The number of specific mutated genes was small in both groups.
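The screening described above (keep entries above the median frequency, intersect two diseases, exclude the third) can be sketched as follows. This is an illustration under stated assumptions: the inputs are taken to be per-gene mutation counts, the helper names `high_frequency_genes` and `shared_excluding` are hypothetical, and the toy counts are not study data.

```python
from statistics import median


def high_frequency_genes(mutation_counts):
    """Return genes whose mutation count is strictly above the median count."""
    cutoff = median(mutation_counts.values())
    return {gene for gene, n in mutation_counts.items() if n > cutoff}


def shared_excluding(first, second, excluded):
    """Genes present in both `first` and `second` but absent from `excluded`."""
    return (first & second) - excluded


# Toy per-gene mutation counts (assumed input format, not study data).
ls_counts = {"g1": 1, "g2": 3, "g3": 2, "g4": 1}
crc_counts = {"g2": 5, "g3": 1, "g5": 4, "g6": 1}
ec_counts = {"g3": 4, "g5": 1, "g7": 2}

ls_high = high_frequency_genes(ls_counts)
crc_high = high_frequency_genes(crc_counts)
ec_high = high_frequency_genes(ec_counts)

mut_lc = shared_excluding(ls_high, crc_high, ec_high)  # analogue of Mut-LC
mut_le = shared_excluding(ls_high, ec_high, crc_high)  # analogue of Mut-LE
print(mut_lc, mut_le)
```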
Identification of DEGs and DMGs highly related to LS pathogenic genes
A total of 460 DEGs in both DEGs-L and DEGs-C (DEGs-LC) were obtained by analysing the correlation between DEGs and the 5 LS pathogenic genes, as well as 24 DEGs in DEGs-L and DEGs-E (DEGs-LE). In the three groups of DMGs, we obtained 15 specific DMGs (DMGs-LC) of LS and CRC and 64 specific DMGs (DMGs-LE) of LS and EC, which were highly correlated with 5 LS pathogenic genes.
Then, DEGs-LC, DMGs-LC and Mut-LC were combined, as were DEGs-LE, DMGs-LE and Mut-LE, and duplicate genes were removed. Ultimately, 494 specific genes for LS and CRC (SGs-LC) were obtained, including PTCHD1, SYT4, and COPDA1, and 99 specific genes for LS and EC (SGs-LE) were obtained, including CDC20B, SLC10A4, and LY6K (Supplementary Table 1). The top 20 SGs-LC and SGs-LE ranked by P value are listed in Table 1. The numbers of SGs-LC and SGs-LE differed considerably. We therefore speculate that LS might develop into CRC in more complex ways.
The enriched GO terms and pathways
After enrichment analysis, SGs-LC were found to be enriched in 7 KEGG pathways and 106 GO terms, including 60 biological processes (BPs), 22 cellular components (CCs), and 24 molecular functions (MFs). The main enriched BPs were feeding behaviour, collagen catabolic process, synapse organization and regulation of appetite. Enriched CCs included extracellular space, plasma membrane and anchoring component of membrane. Enriched MFs included hormone activity, neuropeptide hormone activity, serotonin-activated cation-selective channel activity and bile acid transmembrane transporter activity. Figure 3A shows the top 20 GO terms ranked by P value. The significantly enriched KEGG pathways for SGs-LC included serotonergic synapse, maturity onset diabetes of the young, PPAR signalling pathway and arrhythmogenic right ventricular cardiomyopathy (ARVC) (Table 2).
SGs-LE were enriched in 12 GO terms, including 5 BPs, 6 CCs and 1 MF. The GO terms are shown in Figure 3B, among which peroxisome, mitochondrion, protein transport, cellular response to DNA damage stimulus and membrane had P values less than 0.05. The only KEGG pathway identified as enriched among the SGs-LE was the peroxisome pathway; however, the enrichment was not statistically significant (P = 0.050, Table 2). The enriched GO terms and pathways differed greatly between SGs-LC and SGs-LE, and SGs-LC were enriched in more pathways, indicating that LS might develop into CRC through more pathways, consistent with our earlier speculation.
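For readers unfamiliar with how an enrichment P value of this kind arises, the following sketch computes a generic over-representation P value from the hypergeometric distribution. The enrichment tool and exact statistic used in this study are not stated in this section, so this is an assumed, textbook formulation rather than the method actually applied; the function name and all example numbers except the gene-list size of 494 are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric P value: P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term or pathway
    selected:   size of the gene list being tested
    overlap:    genes in the list that carry the annotation
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 494-gene list tested against a term annotated
# to 100 of 20000 background genes, with 10 genes in common.
p = enrichment_p_value(population=20000, annotated=100, selected=494, overlap=10)
print(f"P = {p:.3g}")
```

Tools commonly apply a multiple-testing correction on top of such raw P values; that step is omitted here.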
Protein-protein interaction networks
After STRING analysis, 663 interaction pairs between 280 proteins were obtained for SGs-LC. These pairs formed 12 clusters and contained 66 genes (Figure 4A). SNAP25, SST, GCG and GABRG2 were involved in the most pairs. Table 3A shows the top 20 genes with the highest degrees in the network. A total of 24 interactions were obtained for SGs-LE. In the whole network, there were 3 clusters containing 11 genes with degree ≥ 2 (Figure 4B), of which KIF20A and NUF2 had the highest degrees (Table 3B). These genes could be potential key genes in the development of LS into CRC or EC.
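The degree of a gene in such a network is the number of distinct interaction partners it has. A minimal sketch of ranking nodes by degree from a list of interaction pairs is given below, assuming an undirected network supplied as pairs of identifiers; the function names and the placeholder nodes `p1` to `p4` are illustrative and do not represent real interactions.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct partners per node in an undirected interaction list."""
    # Treat (u, v) and (v, u) as the same pair and ignore self-interactions.
    unique_pairs = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    degrees = Counter()
    for pair in unique_pairs:
        for node in pair:
            degrees[node] += 1
    return degrees


def top_nodes(degrees, k):
    """Return the k highest-degree nodes, ties broken alphabetically."""
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:k]


# Placeholder interaction pairs (not real protein interactions).
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p3", "p2")]
degrees = node_degrees(edges)
print(top_nodes(degrees, 2))
```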
Correlation between gene expression level and survival
Among SGs-LC, the genes for which survival rates differed significantly between LS patients with high and low expression levels were ELAVL3 (P = 0.013), ALPI (P = 0.020), GCGR (P = 0.020), HS6ST3 (P = 0.032), CNGB1 (P = 0.033) and RORB (P = 0.047) (Figure 5A-F). The expression levels of CA10 (P = 0.008), HTR4 (P = 0.028), NRAP (P = 0.031), CLDN19 (P = 0.041), COL18A1 (P = 0.047), SMKR1 (P = 0.039) and TPH (P = 0.0027) were significantly correlated with the survival rates of CRC patients (Figure 5G-M). Among SGs-LE, no gene showed a significant difference in survival rates between LS patients with high and low expression. In EC, the genes with significantly different survival rates between patients with high and low expression levels were CDC45 (P = 0.015), WDR31 (P = 0.024) and UQCRQ (P = 0.037) (Figure 5N-P). These genes might be key genes for the prognosis of patients with LS, CRC and EC.
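A comparison of survival between high- and low-expression groups is commonly made with a log-rank test after splitting patients at the median expression level. This section does not state which cutoff or test was used, so the sketch below is an assumption-laden illustration of that common approach, not the study's procedure; the function names and toy data are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """Split sample IDs into (high, low) groups at the median expression."""
    cutoff = median(expression.values())
    high = {s for s, x in expression.items() if x > cutoff}
    low = set(expression) - high
    return high, low


def logrank_p_value(group1, group2):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs, where event is 1 for an
    observed death and 0 for a censored observation.
    """
    pooled = list(group1) + list(group2)
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for time, _ in group1 if time >= t)  # at risk in group 1
        n2 = sum(1 for time, _ in group2 if time >= t)  # at risk in group 2
        d1 = sum(1 for time, e in group1 if time == t and e)
        d2 = sum(1 for time, e in group2 if time == t and e)
        n, d = n1 + n2, d1 + d2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Toy follow-up data (time, event); not patient data.
early = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
late = [(6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
print(logrank_p_value(early, late))
```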
Verification of the specific genes
MSI is caused by a defect in one of the MMR genes and is strongly related to tumorigenesis. The MMR genes (MLH1, MSH2, MSH6, and PMS2) are pathogenic genes in LS. Therefore, we first analysed the mutations in SGs-LC and SGs-LE in CRC and EC patients with MSI-H and MSS. COL11A1 is associated with malignancy in colorectal cancer. In this study, COL11A1 was identified in SGs-LC and exhibited one missense mutation, three frameshift mutations and one intron mutation in MSS CRC patients. In addition, the mutation profile of COL11A1 was very similar to that of MSH6, as both contained intron, missense and frameshift mutations. Therefore, COL11A1 may be a key gene for distinguishing MSI-H from MSS CRC patients.
Finally, we analysed the expression of SGs-LC and SGs-LE in normal and tumour samples in LS-, CRC-, and EC-related pathways through GSEA of GO gene sets. The expression of SGs-LE was not significantly different between normal and tumour samples. The differentially expressed genes in SGs-LC were enriched in 4 GO terms: HP_clinical_course, GOMF_ion_transmembrane_transporter_activity, GOBP_neurogenesis and GOBP_neuron_differentiation (Figure 6A-D).