A global database reveals the essential genes for cervical cancer through a comprehensive analysis of GEO, TCGA, and GTEx
According to World Health Organization (WHO) data on global cancer incidence and mortality and statistics from the Global Cancer Observatory database, cervical cancer (CC) is the fourth most prevalent cancer in women, making it a substantial public health concern. It is also among the most prevalent cancers in middle-aged women in most countries [2, 48]. Hence, developing novel treatment strategies for cervical cancer is vital for improving the overall prognosis of patients.
To investigate the crucial genes involved in cervical cancer, we retrieved transcriptome data for 31 diverse tissue types across genders from the GTEx database (Fig. 1). Differential analysis of the transcriptomic data from TCGA and GTEx identified 1804 genes that were differentially expressed in cervical cancer compared to the adjacent normal cervical tissues (Fig. 2A). In addition, using the GSE63514 dataset, we obtained 510 genes that were differentially expressed in cervical cancer compared to normal cervical tissues (Fig. 2B).
Using the GSE192804 dataset, we also identified 1702 genes that were differentially expressed in cervical cancer compared to normal cervical tissue (Fig. 2C). In summary, we acquired transcriptome data from the TCGA and GTEx databases, retrieved two microarray datasets from the GEO database, and identified the differentially expressed genes separately for each dataset.
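The filtering step behind a differential analysis of this kind can be sketched as follows. This is only an illustration: the gene names and statistics are invented placeholders, and the cutoffs (|log2 fold change| > 1, adjusted p < 0.05) are assumed conventional values, since the thresholds actually used are not stated here.

```python
# Hypothetical per-gene statistics: (gene, log2 fold change, adjusted p-value).
# All names and numbers are made up for illustration only.
results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.7, 0.020),
    ("GENE_C", 0.4, 0.003),   # significant, but the effect is too small
    ("GENE_D", -3.1, 0.200),  # large effect, but not significant
]

# Assumed cutoffs; the source does not state the thresholds it used.
LOG2FC_CUTOFF = 1.0
PADJ_CUTOFF = 0.05


def select_degs(rows, log2fc_cutoff=LOG2FC_CUTOFF, padj_cutoff=PADJ_CUTOFF):
    """Return (up, down) lists of genes that pass both cutoffs."""
    up, down = [], []
    for gene, log2fc, padj in rows:
        # Skip genes that fail either the significance or the effect-size cutoff.
        if padj >= padj_cutoff or abs(log2fc) <= log2fc_cutoff:
            continue
        (up if log2fc > 0 else down).append(gene)
    return up, down


up_genes, down_genes = select_degs(results)
print("up-regulated:", up_genes)
print("down-regulated:", down_genes)
```

Applying the same function to each dataset separately would give one list of differentially expressed genes per dataset, which is the form needed for the later intersection step.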
WGCNA uncovers genes characteristic of cervical cancer: A thorough investigation of modules associated with the disease
To identify disease-related genes closely associated with CC, we performed a weighted gene co-expression network analysis (WGCNA) using data from the TCGA and GTEx databases. We obtained nine gene modules: black, blue, brown, green, grey, magenta, red, turquoise, and yellow. Among these modules, the turquoise module exhibited the highest proportion of gene importance (Fig. 3A). Correlation analysis between the module genes and CC revealed a negative correlation between the turquoise module genes and CC, suggesting that the turquoise module genes may exert inhibitory effects on cervical cancer (Fig. 3B). We then intersected the 988 disease-associated genes extracted from the turquoise module with the differentially expressed genes identified from the GEO, TCGA, and GTEx databases. This analysis identified six genes shared across all of these gene sets (Fig. 3C).
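Two steps in this paragraph lend themselves to a short sketch: correlating a module with disease status, and intersecting the module genes with the per-dataset gene lists. The example below is a simplified stand-in rather than the WGCNA implementation; the eigengene values, sample labels, and gene names are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical module eigengene (one summary value per sample) and
# disease status (1 = tumour, 0 = normal). Values are invented.
eigengene = [0.8, 0.6, 0.7, -0.5, -0.9, -0.7]
is_tumour = [0, 0, 0, 1, 1, 1]

r = pearson(eigengene, is_tumour)
print(f"module-trait correlation: {r:.3f}")  # negative: lower in tumours

# Hypothetical gene sets standing in for the module genes and the
# differentially expressed genes from each dataset.
module_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
deg_tcga_gtex = {"GENE_A", "GENE_B", "GENE_E"}
deg_gse63514 = {"GENE_A", "GENE_B", "GENE_C"}
deg_gse192804 = {"GENE_A", "GENE_B", "GENE_F"}

# Keep only genes present in every set.
shared = module_genes & deg_tcga_gtex & deg_gse63514 & deg_gse192804
print("shared genes:", sorted(shared))
```

A negative coefficient here only says that the module summary is lower in tumour samples; on its own it does not establish that the module genes inhibit the disease.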
We then performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses on the six overlapping genes. GO enrichment analysis showed that the differentially expressed genes in the CC samples were primarily enriched in biological processes including tissue development, extracellular structure organization, and wound healing (Fig. 4A). KEGG enrichment analysis showed that they were primarily enriched in signaling pathways including ECM-receptor interaction, Human papillomavirus infection, and Focal adhesion (Fig. 4B). Together, these results indicate that the WGCNA co-expression and differential analyses identified six genes closely associated with cervical cancer.
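Enrichment analyses of this kind are commonly based on an over-representation test, which asks how likely it is to see at least the observed overlap between a gene list and a pathway by chance. The sketch below computes that hypergeometric tail probability. It is an assumption that a test of this form was used, as the enrichment tool is not named here, and the counts in the example are invented.

```python
from math import comb


def enrichment_p_value(population, pathway_size, list_size, overlap):
    """P(X >= overlap) when drawing list_size genes without replacement
    from a population in which pathway_size genes belong to the pathway."""
    total = comb(population, list_size)
    max_overlap = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated to the pathway,
# 6 genes in the query list, 3 of which are in the pathway.
p = enrichment_p_value(population=20, pathway_size=5, list_size=6, overlap=3)
print(f"enrichment p-value: {p:.4f}")
```

In practice each GO term or KEGG pathway yields one such p-value, and the resulting set is usually corrected for multiple testing before terms are reported.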
Machine learning-based screening of disease genes using LASSO and random forest algorithms
Subsequently, the expression levels of the six genes in the GSE192804 dataset were extracted, and a multivariate Cox analysis with LASSO regression was conducted. This analysis identified five disease-associated genes, namely ACOX2, IL-24, SPP1, CRYAB, and ANKRD22 (Fig. 5A-B). In addition, we used the random forest algorithm to assess gene importance, which identified IL-24 as a disease-associated gene (Fig. 5C). Finally, intersecting the two results yielded a single gene associated with cervical cancer: IL-24 (Fig. 5D). These results identify IL-24 as the candidate gene associated with cervical cancer.
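The way LASSO performs gene selection can be illustrated with a small coordinate-descent sketch. Note the substitution: the analysis above combines LASSO with a Cox model, whereas this example uses ordinary linear regression with an L1 penalty because it is shorter to write out. The data, gene names, penalty value, and the random forest shortlist are all hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical expression matrix (4 samples x 3 genes) with orthogonal columns,
# and an outcome driven mainly by the first gene.
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
lasso_selected = {g for g, b in zip(genes, beta) if b != 0.0}

# Hypothetical shortlist from a separate importance ranking (e.g. random forest).
rf_selected = {"GENE_A", "GENE_C"}

print("LASSO coefficients:", beta)
print("genes kept by both methods:", sorted(lasso_selected & rf_selected))
```

The L1 penalty sets weak coefficients to exactly zero, which is why LASSO can serve as a gene filter; intersecting its output with a second method's shortlist keeps only the genes on which both agree.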
hIL-24: A comprehensive study of the newly discovered multi-functional anti-cancer protein and its interaction with Siha cells
To further elucidate the impact of human IL-24 (hIL-24) on cervical cancer, an hIL-24 overexpression plasmid (designated pcDNA3.1 (+)-hIL-24) was constructed and transfected into Siha cervical cancer cells to evaluate its effects on their biological functions. The MTT assay showed that the average optical density (OD) value of Siha cells in the pcDNA3.1 (+)-hIL-24 plasmid group decreased to 1.0127, lower than in the transfection reagent and empty vector groups. This finding highlights the crucial role of hIL-24 in inhibiting the growth of Siha cells (Fig. 6A; Table 1).
Furthermore, cell migration experiments demonstrated an inhibitory effect of hIL-24 on the migration ability of Siha cells. The average number of migrated cells in the pcDNA3.1 (+)-hIL-24 plasmid group was 105, markedly lower than in the other groups (Fig. 6B; Fig. 7A; Table 2). Invasion assays further confirmed that overexpression of hIL-24 substantially reduced the invasiveness of Siha cells. Cells transfected with the pcDNA3.1 (+)-hIL-24 plasmid showed the lowest invasive ability, with an average of 90.5 invading cells (Fig. 6C; Fig. 7B; Table 3).
Finally, flow cytometry analysis revealed an increase in the apoptotic rate of Siha cells upon hIL-24 overexpression. The pcDNA3.1 (+)-hIL-24 plasmid group showed an apoptotic rate of 12.81% (Fig. 6D; Fig. 7C; Table 4). Taken together, these data suggest that hIL-24 inhibits the growth, migration, and invasion of Siha cells and promotes their apoptosis, indicating the potential of hIL-24 in the treatment of cervical cancer.