Construction of WGCNA co-expression modules
We divided the RCCC samples into cancer and normal groups according to the TCGA annotations, and principal component analysis (PCA) showed that the samples separated into two distinct clusters (Fig. 1A). We then analyzed differentially expressed genes between the two groups and identified 6303 up-regulated and 3563 down-regulated genes (|Log2 Fold Change| > 1, p-value < 0.001). The Log2 of the enrichment ratio and the -Log10 of the adjusted p-value were visualized in a volcano plot (Fig. 1B). Based on fold change, we selected the 20000 genes with the largest absolute change among the differentially expressed genes and performed weighted gene co-expression network analysis (WGCNA) separately in cancer and normal samples. According to the average connectivity of the network, we selected an appropriate soft threshold (beta) for the cancer samples and the normal samples, respectively (Figure S1A and C), and examined the node connectivity of the scale-free network under the selected soft threshold (Figure S1B and D). Modules were constructed with a step-by-step algorithm, and sub-modules with a dissimilarity of less than 0.3 were merged (Fig. 1C and S2A). Finally, 11 co-expression modules were obtained in each of the cancer and normal groups (Fig. 1D and S2B).
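The filtering rule described above can be illustrated with a minimal sketch. The thresholds come from the text; the gene names, the toy statistics and the helper names are hypothetical, and this is not the pipeline used in the study.

```python
from typing import Dict, List, Tuple

# Thresholds taken from the text: |log2 fold change| > 1 and p-value < 0.001.
LFC_CUTOFF = 1.0
P_CUTOFF = 0.001


def classify_gene(log2fc: float, pvalue: float) -> str:
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if pvalue < P_CUTOFF and log2fc > LFC_CUTOFF:
        return "up"
    if pvalue < P_CUTOFF and log2fc < -LFC_CUTOFF:
        return "down"
    return "ns"


def top_by_abs_fold_change(stats: Dict[str, Tuple[float, float]], n: int) -> List[str]:
    """Return up to n differentially expressed genes, largest |log2FC| first."""
    deg = [g for g, (lfc, p) in stats.items() if classify_gene(lfc, p) != "ns"]
    return sorted(deg, key=lambda g: abs(stats[g][0]), reverse=True)[:n]


# Hypothetical toy statistics: gene -> (log2 fold change, p-value).
toy_stats = {
    "GENE_A": (2.5, 1e-6),
    "GENE_B": (-3.1, 1e-8),
    "GENE_C": (0.4, 1e-9),   # significant p-value but small fold change
    "GENE_D": (1.8, 0.02),   # large fold change but p-value too high
}
labels = {g: classify_gene(lfc, p) for g, (lfc, p) in toy_stats.items()}
selected = top_by_abs_fold_change(toy_stats, 2)
```

Ranking by absolute fold change after filtering mirrors the order of the steps in the text: genes are first required to pass both thresholds and only then ranked for input to WGCNA.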
We compared the WGCNA results of the two groups and found overlapping genes between modules derived from the cancer and normal data. To identify meaningful genes, we performed a cross analysis of the two groups. After excluding C_Mod1 and N_Mod1, which contain genes that could not be clustered, we found that N_Mod2 intersected significantly with several cancer modules (C_M5, C_M4, C_M11, C_M1, C_M2, C_M3), suggesting that the expression pattern of these overlapping genes may have changed in cancer (Fig. 1E). Next, we analyzed the correlation between these six cancer modules and patient survival time. Each correlation coefficient represents the relationship between a gene module and a clinical trait, with values decreasing from red to blue. Most notably, the green module (C_Mod4) correlated significantly with patient survival time (Fig. 1F). We then analyzed the correlation between individual genes in the green module and patient survival (PFI time and OS time), and the results showed that most genes in this module correlated strongly with patient survival (Fig. 1G and H).
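The text does not state how the significance of the module intersections was assessed. One common choice for this kind of cross analysis is a hypergeometric (one-sided Fisher) test on the overlap between two gene sets; the sketch below assumes that choice, and the function name, gene identifiers and universe size are hypothetical.

```python
from math import comb


def overlap_p_value(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: number of genes that could appear in either module
    size_a:   number of genes in the first module
    size_b:   number of genes in the second module
    overlap:  number of genes shared by the two modules
    """
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe - size_a, size_b - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy modules drawn from a universe of 10 genes.
normal_module = {"G1", "G2", "G3", "G4", "G5"}
cancer_module = {"G1", "G2", "G3", "G4", "G5"}
shared = normal_module & cancer_module
p_toy = overlap_p_value(10, len(normal_module), len(cancer_module), len(shared))
```

In a real comparison, the universe would be the set of genes entered into both WGCNA runs, and the resulting p-values would normally be corrected for the number of module pairs tested.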
Renal clear cell carcinoma patients with low expression of LINC00472 have poor prognosis
After determining the module most closely related to patient survival, we analyzed the expression of the lncRNAs in this module and obtained five lncRNAs (LINC00472, LINC00152, LINC00271, LINC01503, LINC01510) with significant differences in expression (Fig. 2A-E). Next, we analyzed the relationship between the expression of these lncRNAs and patient survival. As shown in Fig. 2F-J and S2A-E, LINC00472, LINC00152, LINC00271 and LINC01510 were strongly associated with patient survival time. LINC00271 is highly expressed in testis and has relatively low expression in kidney, whereas LINC00472 and LINC01510 are highly expressed in renal tissue. Previous studies have shown that LINC01510 and LINC00152 play important roles in RCCC, indicating that our bioinformatics analysis is reliable. However, the function of LINC00472 in RCCC remains unclear, so we next studied its function and mechanism. We selected a set of RCCC and adjacent tissue samples from the GEO database (GSE40435) and analyzed the expression of LINC00472. The results again showed that LINC00472 was significantly decreased in cancer tissues compared with adjacent tissues (Fig. 2K). In addition, the level of LINC00472 decreased gradually as cancer stage advanced (Fig. 2L). Based on these bioinformatics results, we speculated that loss of LINC00472 may promote the development of RCCC.
Knockdown of LINC00472 expression increased cell proliferation and migration
We transfected HK-2, 769-P and Caki-1 cells with the LINC00472 lentiviral interference plasmid or the control plasmid and used RT-qPCR to measure the expression of LINC00472. The results showed that the expression of LINC00472 was significantly decreased in cells transfected with the lentiviral interference plasmid (Fig. 3A-C). After puromycin selection, we obtained the HK-2 shLINC00472, 769-P shLINC00472 and Caki-1 shLINC00472 cell lines with stably low LINC00472 expression, as well as their corresponding control cell lines.
Subsequently, we assessed the growth of the LINC00472 low-expression cell lines and the control cell lines using the cell counting kit-8 (CCK-8) assay and cell counting. The CCK-8 results indicated that inhibition of LINC00472 significantly increased cell proliferation (Fig. 3D-F), and the cell counting results were consistent with those of CCK-8 (Fig. 3G and H). Moreover, we examined the levels of proliferation-related proteins by western blot. The level of p21, which inhibits cell proliferation, was significantly decreased in the 769-P shLINC00472 cell line compared with the control cell line, whereas the level of Cyclin D1, which promotes cell proliferation, was significantly increased (Fig. 3I). Single-cell clone formation assays showed that inhibition of LINC00472 markedly enhanced the clone-forming ability of HK-2 and 769-P cells (Fig. 3J and K). Furthermore, wound healing and Transwell assays demonstrated that knockdown of LINC00472 significantly increased cell migration in Caki-1 cells (Fig. 3L and M) and 769-P cells (Figure S4A and B). Together, these results indicate that inhibition of LINC00472 promotes cell proliferation and migration in RCCC cells.
Construction of LINC00472 knockdown and overexpression stable cell lines by CRISPR-Cas9
Next, we used CRISPR-Cas9 to construct cell lines with stable knockdown and overexpression of LINC00472. Unlike the knockout of protein-coding genes, a frameshift mutation caused by cleavage at a single site is inefficient for lncRNA knockdown.(22) Moreover, because LINC00472 is long, we could not remove it completely. Therefore, we decided to reduce the expression of LINC00472 by removing the sequence around its transcription start site (TSS) to impair its transcription. We designed two cleavage sites, one in the promoter region upstream of the TSS and the other in the first exon downstream of the TSS (Fig. 4A). After monoclonal screening and sequencing, we obtained the successfully edited cell line HK-2 KO-PE and the control cell line HK-2 KO-NC. RT-qPCR results showed that the expression of LINC00472 in HK-2 KO-PE cells was indeed downregulated (Fig. 4B). To obtain a cell line with stable overexpression of LINC00472, we inserted a CMV enhancer sequence into the promoter region (Fig. 4C). Similarly, after monoclonal screening and sequencing, we obtained the cell line with the successful insertion, HK-2 KI-CMV, and the control cell line HK-2 KI-CTR, and RT-qPCR results indicated that LINC00472 was successfully overexpressed in HK-2 KI-CMV cells (Fig. 4D). Inserting or deleting a sequence within a gene carries a considerable risk of altering the expression of adjacent genes (Fig. 4E). To determine whether our strategy affected the expression of genes near LINC00472, we measured the RNA level of the adjacent gene LINC01626 by RT-qPCR. The results showed that the level of LINC01626 was unchanged in both the HK-2 KO-PE and HK-2 KI-CMV cell lines compared with their control cell lines (Fig. 4F and G).
Overexpression of LINC00472 inhibited cell proliferation and enhanced intercellular connectivity
By inserting a CMV sequence, we obtained a cell line with high expression of LINC00472. We then observed a striking change in cell morphology: compared with the control cell line, the LINC00472-overexpressing cells no longer grew in a dispersed manner but grew in clusters (Fig. 5A). In addition, western blot detection of proliferation-related proteins and single-cell clone formation assays indicated that overexpression of LINC00472 significantly inhibited cell proliferation and reduced clone-forming ability (Fig. 5B and C). Because the cells grew in clusters after LINC00472 overexpression, we speculated that LINC00472 enhances intercellular connectivity. We therefore examined the expression and distribution of CDH1 (also named E-cadherin), which can be used as a marker of intercellular adhesion. Immunofluorescence results indicated that the expression of CDH1 in HK-2 KI-CMV cells was significantly increased and that the distribution of CDH1 among cells was more aggregated compared with the control group; in contrast, the expression of CDH1 was significantly decreased in HK-2 KO-PE cells compared with the control group (Fig. 5D). Moreover, we measured the mRNA and protein levels of CDH1 by RT-qPCR and western blot. The level of CDH1 in the LINC00472 high-expression cell line was significantly higher than that in the control group, whereas the opposite was observed in the LINC00472 low-expression cell line (Fig. 5E-G). In addition, we used a monolayer cell permeability assay to further assess the degree of intercellular connectivity. Measurement of FITC-Dextran fluorescence intensity showed that the cell permeability of the LINC00472 high-expression group was significantly lower than that of the control group, whereas the cell permeability of the LINC00472 low-expression group was markedly higher than that of the control group (Fig. 5H and I).
The above experiments demonstrated that overexpression of LINC00472 inhibits cell proliferation and significantly increases intercellular connectivity.
LINC00472 is highly correlated with extracellular matrix and cell metastasis related pathways
To further analyze the function of LINC00472 in cells, we performed RNA sequencing (RNA-seq) on the two pairs of cell lines constructed by CRISPR-Cas9 (HK-2 KI-CTR and HK-2 KI-CMV, HK-2 KO-NC and HK-2 KO-PE). The main term obtained from GO enrichment analysis of the RNA-seq results was extracellular matrix (Fig. 6A). Reactome enrichment analysis also showed that the altered pathways were mainly related to extracellular matrix organization and the cell membrane, including activation of the matrix metalloproteinase (MMP) family and integrin cell surface interactions (Fig. 6B). Remarkably, most of the functions and pathways obtained by enrichment analysis are related to cell metastasis.
Next, we divided the TCGA RCCC samples into high-expression and low-expression groups according to the median expression level of LINC00472 and then analyzed the expression levels of MMP family members, integrin subunit family members and cadherin family members in the two groups. As shown in Fig. 6C, the expression of most MMP family members was higher in the LINC00472 low-expression group than in the LINC00472 high-expression group. Proteins of the MMP family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction and tissue remodeling, as well as in cancer processes, especially cell metastasis. From these results, we infer that low expression of LINC00472 enhances the degradation of extracellular matrix by the MMP family and thereby promotes cell metastasis. In addition, the analysis of cadherin family and integrin subunit family proteins showed that the expression of most members of the two families was higher in the LINC00472 high-expression group than in the LINC00472 low-expression group (Fig. 6D and E). These analyses further support the view that high expression of LINC00472 enhances cell adhesion and connection, thus reducing the capacity for cell metastasis.
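The median split described above can be sketched as follows. The sample identifiers, expression values and the gene labelled as an MMP member are hypothetical, and assigning samples exactly at the median to the low group is an assumption, since the text does not specify how ties were handled.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def split_by_median(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split samples into (high, low) groups around the median expression.

    Samples exactly at the median go to the low group (an assumption).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v <= cutoff)
    return high, low


# Hypothetical LINC00472 expression per sample.
linc00472 = {"S1": 1.0, "S2": 5.0, "S3": 2.0, "S4": 8.0}
# Hypothetical expression of one MMP family member in the same samples.
mmp_gene = {"S1": 9.0, "S2": 4.0, "S3": 7.0, "S4": 2.0}

high_group, low_group = split_by_median(linc00472)
mmp_mean_high = mean(mmp_gene[s] for s in high_group)
mmp_mean_low = mean(mmp_gene[s] for s in low_group)
```

In this toy data the MMP gene is higher in the low-expression group, which is the direction reported in Fig. 6C; a real comparison would also need a statistical test across all samples.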
In addition, we identified the differentially expressed genes between the high-expression and low-expression groups and analyzed their functional enrichment. GO and KEGG analysis showed that the differentially expressed genes were again mainly enriched in the regulation of extracellular matrix and cell adhesion (Figure S5A-C). These results are highly consistent with those of the RNA-seq analysis. GSEA further showed that high expression of LINC00472 promoted the extracellular membrane-bounded organelle term and inhibited the proliferation of epithelial cells, whereas low expression of LINC00472 reduced cell junction assembly and increased the regulation of cellular component movement (Fig. 6F-I). These analyses indicate that high expression of LINC00472 enhances adhesion and connection between cells, which is consistent with the results of the cell experiments above.
ITGB8 is a target gene regulated by LINC00472
We analyzed the RNA sequencing data of HK-2 KO-NC/KO-PE and HK-2 KI-CTR/KI-CMV and found 317 genes with the same trend in expression level (Fig. 7A). Through protein-protein interaction (PPI) analysis of the 317 genes in the STRING database, we found a sub-network located at the hub of the network, suggesting that these network node molecules may play an important role in the function of LINC00472 (Fig. 7B). Next, we measured the expression of these network node molecules by RT-qPCR. The results showed that the expression of the integrin family member ITGB8 was significantly decreased after LINC00472 was inhibited (Fig. 7C and D) and markedly increased when LINC00472 was overexpressed (Fig. 7E). In addition, western blot and immunofluorescence results showed that the ITGB8 level was significantly up-regulated in the LINC00472 high-expression cell line and significantly decreased in the LINC00472 low-expression cell line (Fig. 7F-H). Therefore, we speculated that ITGB8 may be a target gene regulated by LINC00472. We analyzed the correlation between LINC00472 and ITGB8 in RCCC and normal tissues on the GEPIA website. The Spearman correlation coefficient between LINC00472 and ITGB8 was 0.4, indicating a moderate positive correlation between them (Fig. 7I). Furthermore, we analyzed the relationship between the expression of ITGB8 and the prognosis of RCCC patients. As shown in Fig. 7J, the prognosis of patients with low expression of ITGB8 was significantly worse than that of patients with high expression of ITGB8, showing the same trend as LINC00472. This raised the question of how LINC00472 regulates the expression of ITGB8.
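For readers unfamiliar with the statistic reported by GEPIA, the Spearman coefficient is the Pearson correlation computed on ranks. The sketch below is a plain-Python illustration with hypothetical expression values; it is not the GEPIA implementation, and the function names are chosen here for illustration only.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired expression values for two genes across four samples.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [10.0, 20.0, 25.0, 100.0]
rho = spearman(gene_x, gene_y)
```

Because only ranks are used, the coefficient is insensitive to monotonic transformations of the expression values, such as the log scaling commonly applied to expression data.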
LINC00472 interacts with P300 to regulate histone modification
As mentioned in the introduction, lncRNAs can regulate gene expression, and their main modes of action include participating in the regulation of histone modification and affecting the binding of miRNAs to target genes. Notably, the ChIP-Seq database shows two clearly enriched histone modification peaks at the ITGB8 transcription start site (TSS), namely H3K27ac and H3K4me3 (Fig. 8A, Figure S6A and C). We therefore designed six pairs of primers upstream and downstream of the ITGB8 TSS and measured the levels of H3K27ac and H3K4me3 by chromatin immunoprecipitation. The results showed that inhibition of LINC00472 significantly reduced the level of H3K27ac near the ITGB8 TSS (Fig. 8E and Figure S6D), whereas the level of H3K4me3 did not change (Fig. 8C and Figure S5B). Conversely, overexpression of LINC00472 markedly increased the level of H3K27ac near the ITGB8 TSS but not that of H3K4me3 (Fig. 8D and F). These results demonstrated that LINC00472 can regulate the level of H3K27ac near the ITGB8 TSS. Therefore, we speculated that LINC00472 is involved in the regulation of histone acetylation.
Subsequently, we analyzed the distribution of LINC00472 by RNA fluorescence in situ hybridization (FISH) and found that it was distributed in both the nucleus and the cytoplasm, which makes its participation in the regulation of histone modification spatially possible. We examined the total level of H3K27ac in HK-2 and 769-P cells by western blot. The level of H3K27ac in the LINC00472 low-expression cell lines (HK-2 shLINC00472 and 769-P shLINC00472) was significantly reduced compared with the control cell lines (HK-2 shNC and 769-P shNC); in contrast, the level of H3K27ac in the LINC00472 high-expression cell line (HK-2 KI-CMV) was markedly increased compared with the control cell line (HK-2 KI-CTR) (Fig. 8J). These results indicated that knockdown or overexpression of LINC00472 changed the total level of H3K27ac in cells. RPISeq provides two different methods (random forest, RF; support vector machine, SVM) to evaluate the probability of a protein binding to an RNA. Using the algorithms provided by RPISeq (http://pridb.gdcb.iastate.edu/RPISeq/) and the lncPro database (http://bioinfo.bjmu.edu.cn/lncpro/), we analyzed the probability of interaction between proteins involved in histone modification and LINC00472. The results showed that LINC00472 had the highest predicted association with P300 (Fig. 8H), a histone acetyltransferase that regulates transcription via chromatin remodeling and is important in cell proliferation and differentiation.(23) Notably, RNA immunoprecipitation results confirmed an interaction between LINC00472 and P300 (Fig. 8I). By analyzing the sequence and structure of LINC00472 and P300, we predicted the possible interaction sites between them (Fig. 8K). In addition, RNA FISH and immunofluorescence showed spatial co-localization of LINC00472 and P300 (Fig. 8L).
According to the above results, we conclude that LINC00472 regulates histone acetylation levels by interacting with P300.