Identification and validation of aberrant regulation of miR-455-3p/TTK, miR-140-5p/TTK and miR-133b/CDCA8 in small-cell lung cancer via bioinformatics analysis

Background: Small-cell lung cancer (SCLC) remains one of the most malignant forms of lung cancer, but little bioinformatics data on SCLC are available. This study explored the molecular targets of SCLC by evaluating differentially expressed genes (DEGs) and differentially expressed microRNAs (miRNAs) (DEMs). Methods: Five mRNA expression profiles and two miRNA expression profiles were downloaded from Gene Expression Omnibus (GEO). R software was used to identify the DEGs and DEMs between SCLC and normal samples. The DEGs were subjected to functional enrichment analyses and were used to construct protein-protein interaction (PPI) networks. DEM targets were then predicted and intersected with the DEGs. Furthermore, the hub genes of SCLC among the overlapping DEGs were analyzed in Oncomine. Finally, the expression of the DEM-hub gene pairs was verified in tissues by RT-qPCR and Western blotting. Results: In total, 236 common DEGs and 104 common DEMs were identified. Functional enrichment analysis showed that the DEGs were primarily enriched in 'cell cycle', 'DNA replication' and 'oocyte meiosis'. Twenty hub genes and five modules were identified from the PPI network. Furthermore, 6732 target genes of the DEMs were predicted. After intersection with the DEGs, 54 genes and 153 miRNA-mRNA pairs were identified as aberrantly regulated in SCLC. MiR-455-3p/TTK, miR-140-5p/TTK and miR-133b/CDCA8 were identified as DEM-hub gene pairs. Oncomine analysis confirmed the overexpression of TTK and CDCA8 in SCLC. Further validation demonstrated that TTK and CDCA8 levels in SCLC tissue samples were markedly increased relative to normal controls, while miR-455-3p, miR-140-5p, and miR-133b levels were lower in SCLC samples than in controls. Conclusions: Our results revealed key miRNA-mRNA pairs associated with SCLC, providing new insights into potential disease targets.


Background
Small-cell lung cancer (SCLC) is among the most malignant lung cancer subtypes, accounting for approximately 20% of lung cancers [1]. Although initially sensitive to chemotherapy and radiotherapy, SCLC often becomes resistant to these therapies and undergoes systemic metastasis [2]. In addition, few targeted drugs for SCLC are available in the clinic. Thus, the efficacy of SCLC treatment has not improved substantially over the past few decades [3]. Owing to the unique carcinogenic mechanism of SCLC and the difficulty in obtaining samples, studies of its molecular mechanisms and the resulting bioinformatics data are also limited. Under these circumstances, further identification of the molecular mechanisms is essential to improve the treatment of SCLC patients.
As next-generation sequencing technology advances rapidly, microarrays have been widely used in the study of tumor genes, molecular targets of antitumor drug therapy and prognosis monitoring [4]. The integrated Gene Expression Omnibus (GEO) database is an extensive public source of gene expression data, providing many cancer gene expression profile datasets, including SCLC. In addition, recent studies have demonstrated that aberrant microRNA (miRNA) and mRNA regulation may lead to tumorigenesis [5,6]. MiRNAs, as noncoding endogenous regulatory RNAs, bind to target mRNA 3'-untranslated regions (UTRs) to inhibit translation [7]. MiRNAs play an indispensable role in regulating proliferation, survival, development, apoptosis, pathogenesis resistance, and tumorigenesis [8][9][10]. Several reports have found that miRNAs are related to the pathogenesis of SCLC. For instance, miR-25 has been revealed to be overexpressed as a carcinogenic regulator in SCLC by targeting cyclin E2 [11]. MiR-34b-3p and miR-27a-5p significantly inhibited the progression of SCLC by regulating their target genes [12]. Since the regulatory system between mRNAs and miRNAs plays a complex role in biological functions [13], comprehensively analyzing differentially expressed genes (DEGs) as well as miRNAs (DEMs) in multiple datasets from GEO will shed light on their potentially crucial molecular mechanisms.
Although a previous study explored DEGs and DEMs in SCLC [14], its sample size was small, and only one dataset each for DEGs and DEMs was analyzed, which might have led to false-positive results. In addition, to our knowledge, no bioinformatics studies of SCLC have validated their results in tissue. In the present study, which had a larger sample size, the RobustRankAggreg package was used to eliminate batch differences as a means of improving precision. Furthermore, the overlapping genes between DEM-target genes and DEGs were extracted, and the most critical miRNA-mRNA pairs were selected for validation in tissue.

Expression profile data
The GEO (http://www.ncbi.nlm.nih.gov/geo) database was used to collect expression data of genes and miRNAs. A total of five transcription profiles, GSE1037, GSE6044, GSE11969, GSE43346, and GSE108055, and two noncoding profiles, GSE19945 and GSE74190, were downloaded. The characteristics of all the mRNA and miRNA datasets are presented in Table 1.

Screening DEGs and DEMs
R (v 3.5.1; https://www.r-project.org/) with the limma package was used to screen relevant DEGs and DEMs. Original data were downloaded if the standardized data were not available. Log2 conversion was performed for values that were not described as logarithms. The affy package was utilized to read CEL file expression data, whereas limma was employed to obtain the DEGs and DEMs from each dataset. The common DEGs were identified by integrating the DEGs from the five mRNA datasets using the RobustRankAggreg package. The common DEMs were obtained by intersecting the two miRNA datasets.
The included DEGs and DEMs met the criteria of adjusted P < 0.05 and |log2 fold change| > 1.0.
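The two thresholds above can be illustrated with a minimal Python sketch. It is not the authors' R/limma code: the `GeneStat` record, the `select_de` helper, and the gene names and values are hypothetical, and the sketch assumes each gene already has a log2 fold change and an adjusted P value.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    """Per-gene summary statistics (hypothetical record type)."""
    gene: str
    log2_fc: float  # log2 fold change, tumor versus normal
    adj_p: float    # multiple-testing-adjusted P value


def select_de(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes passing both cutoffs into up- and downregulated lists."""
    up, down = [], []
    for s in stats:
        if s.adj_p < p_cutoff and abs(s.log2_fc) > lfc_cutoff:
            (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Made-up values for illustration only.
example = [
    GeneStat("GENE_A", 2.3, 0.001),   # passes, upregulated
    GeneStat("GENE_B", -1.8, 0.01),   # passes, downregulated
    GeneStat("GENE_C", 0.6, 0.0001),  # fails the fold-change cutoff
    GeneStat("GENE_D", 3.0, 0.20),    # fails the adjusted-P cutoff
]
up, down = select_de(example)
```

Both conditions must hold at once, so a gene with a large fold change but a non-significant adjusted P value is excluded, as is a highly significant gene with a small fold change.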

Enrichment analysis
GO and KEGG enrichment analyses were performed with the clusterProfiler package of R software. GO enrichment analysis is widely used to investigate gene function, with GO covering three main ontologies: molecular function (MF), cellular component (CC), and biological process (BP). KEGG is a database widely used for systematically analyzing high-level gene functions. The cutoffs for significance were set to p < 0.05 and q < 0.05.
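Over-representation analyses of this kind are commonly based on a hypergeometric test, although the text does not state which test was applied. The sketch below computes only the raw one-sided P value for a single term and does not compute the multiple-testing adjustment or the q value. The function name `enrichment_p` and all counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P value, P(X >= k).

    N: number of background genes
    K: background genes annotated to the term
    n: genes in the query list (for example, the DEGs)
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return float(Fraction(tail, total))


# Hypothetical counts: 8 of 20 query genes hit a term that annotates
# 50 of 1000 background genes.
p = enrichment_p(k=8, n=20, K=50, N=1000)
```

Exact integer arithmetic is used before converting to a float, which avoids rounding problems with the large binomial coefficients.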

Protein-protein interaction (PPI) network construction
The STRING tool (https://string-db.org/) was utilized to assess interactions among the DEGs and identify hub genes. Interactions with an interaction score >0.9 were selected. We excluded disconnected nodes in the network. Significant network modules were screened with Cytoscape (v3.6.1; http://www.cytoscape.org/) software with the MCODE plug-in (degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and max. depth = 100). Functional enrichment analysis was also performed for the genes in the significant modules.
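The score filter and degree-based hub ranking can be sketched in Python as follows. This is an illustration rather than the STRING or Cytoscape implementation: the `hub_genes` helper and the toy edge list are hypothetical, and each interaction is assumed to appear only once in the list.

```python
from collections import Counter


def hub_genes(edges, min_score=0.9, top_n=20):
    """Rank genes by node degree after filtering low-confidence edges.

    edges: iterable of (gene_a, gene_b, score) tuples, one per interaction.
    Genes with no retained edge never enter the counter, which drops
    disconnected nodes.
    """
    degree = Counter()
    for a, b, score in edges:
        if score > min_score and a != b:
            degree[a] += 1
            degree[b] += 1
    # Sort by descending degree, then by name for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy network with placeholder gene names and scores.
toy_edges = [
    ("G1", "G2", 0.95),
    ("G1", "G3", 0.99),
    ("G1", "G4", 0.92),
    ("G2", "G3", 0.91),
    ("G4", "G5", 0.50),  # below the score cutoff, so G5 is dropped
]
ranked = hub_genes(toy_edges, top_n=2)
```

Ties in degree are broken alphabetically here only to make the output deterministic; the text does not say how ties were handled.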

DEM-DEG pair identification
FunRich software (version 3.1.3; www.cytoscape.org) was used to predict DEM target genes [15]. The overlapping genes between DEGs and the predicted genes of the counter-regulated DEMs with the same regulatory alterations were extracted using the Venny online tool (version 2.1; http://bioinfogp.cnb.csic.es). Cytoscape software was utilized to illustrate and visualize the miRNA-mRNA regulatory network.
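One reading of this step is that a miRNA-mRNA pair is kept when the predicted target is itself a DEG and the miRNA and the gene change in opposite directions. The Python sketch below follows that reading, which is an assumption about the procedure. It does not reproduce FunRich or Venny; the `aberrant_pairs` helper and all miRNA and gene names are placeholders.

```python
def aberrant_pairs(dem_direction, targets, deg_direction):
    """Return (miRNA, gene) pairs with opposite expression changes.

    dem_direction: miRNA -> "up" or "down" (the DEMs)
    targets:       miRNA -> set of predicted target genes
    deg_direction: gene  -> "up" or "down" (the DEGs)
    """
    pairs = []
    for mirna, genes in targets.items():
        m_dir = dem_direction.get(mirna)
        if m_dir is None:  # miRNA is not differentially expressed
            continue
        for gene in sorted(genes):
            g_dir = deg_direction.get(gene)
            if g_dir is not None and g_dir != m_dir:
                pairs.append((mirna, gene))
    return pairs


# Placeholder inputs for illustration only.
dem_direction = {"miR-a": "down", "miR-b": "down", "miR-c": "up"}
targets = {
    "miR-a": {"GENE1", "GENE2"},
    "miR-b": {"GENE1"},
    "miR-c": {"GENE1", "GENE3"},
    "miR-d": {"GENE1"},
}
deg_direction = {"GENE1": "up", "GENE3": "down"}

pairs = aberrant_pairs(dem_direction, targets, deg_direction)
overlap_genes = sorted({gene for _, gene in pairs})
```

A single gene can appear in several pairs, which is why the number of pairs can exceed the number of overlapping genes.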

Cross-database validation via Oncomine analysis
Oncomine (oncomine.org) is considered to be the world's largest oncogene chip database and comprehensive data-mining platform [16]. The hub genes among the overlapping genes between the DEGs and DEM-target genes were selected for verification in the Oncomine database. Oncomine expression values (preprocessed, log2-normalized and median-centered) for normal and SCLC tissues were obtained. GraphPad Prism software (version 8.2; GraphPad, Inc.) was used to create box plots, and data were compared by Student's t-tests with P < 0.05 as the significance threshold.

SCLC patient samples
The Ethics Committee of The First Hospital of Jilin University (Changchun, China) approved the present study with the approval document number 2019-279. In total, 5 pairs of tumor and paracancerous tissues were obtained from SCLC patients at The First Hospital of Jilin University between January 2019 and January 2020. All patients provided informed consent before sample collection.

RT-qPCR
TRIzol (Invitrogen; Thermo Fisher Scientific, Inc., Waltham, MA, USA) was used to extract RNA from the samples according to the manufacturer's instructions. Reverse transcription (RT) of miRNA was conducted using a PrimeScript RT Kit (Thermo Fisher Scientific, Inc.), and a SuperScript II cDNA Conversion Kit (Thermo Fisher Scientific, Inc.) was used to prepare cDNA. RT-qPCR amplification of the cDNA was performed subsequently. GAPDH and U6 snRNA were used to normalize mRNA and miRNA expression, respectively. SYBR Green Realtime PCR Master Mix was used for RT-qPCR with appropriate primers (Table 2) using an ABI 7500 Fast Real-Time PCR System. The 2^-ΔΔCt method was used to assess relative gene expression.
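The 2^-ΔΔCt calculation can be written out as a short Python sketch. The function name and the Ct values are hypothetical, and the sketch assumes roughly 100% amplification efficiency, the usual premise of this method.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target in tumor versus normal tissue by 2^-ΔΔCt.

    The reference is the normalizer (GAPDH for mRNA, U6 snRNA for miRNA).
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor      # ΔCt in tumor
    delta_normal = ct_target_normal - ct_ref_normal   # ΔCt in normal
    delta_delta = delta_tumor - delta_normal          # ΔΔCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in tumor tissue while the reference gene is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)  # ΔΔCt = -2, so fold = 4.0
```

A value above 1 indicates higher expression in tumor tissue than in the paired control, and a value below 1 indicates lower expression.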

Western blotting
RIPA buffer (Beyotime, Shanghai, China) was used to extract protein, and a BCA kit (Beyotime) was used to measure protein concentrations. Equal amounts of protein were separated by SDS-PAGE and transferred onto PVDF membranes. The membranes were then blocked with 5% nonfat dry milk and incubated with primary antibodies against TTK (ab187520; 1:2,000; Abcam, Cambridge, UK) and CDCA8 (ab74473; 1:1,000; Abcam) at 4 °C overnight. Appropriate HRP-conjugated secondary antibodies were then used to probe the blots. Enhanced chemiluminescence (ECL; Thermo Fisher Scientific, Inc.) was then employed to detect proteins. The relative intensity was calculated with ImageJ software by normalizing to GAPDH (5174; 1:1000; CST, Boston, Massachusetts, USA).

'condensed chromosome', and 'cyclin-dependent protein kinase holoenzyme complex'. Regarding the MFs, the DEGs were enriched in 'cyclin-dependent protein serine/threonine kinase regulator activity', 'long-chain fatty acid binding', 'tubulin binding', 'water channel activity' and 'water transmembrane transporter activity'. The top 10 significant functions of the DEGs in the BP, CC, and MF categories were determined (Fig. 3a-c).

PPI network analysis
STRING was employed to construct the PPI network, which contained 154 nodes and 695 edges (Fig. 4a). Based on the Cytoscape analysis, the twenty DEGs with the highest node degree were identified as hub genes (Fig. 4b). The MCODE plug-in was used to identify five modules, among which the most significant module comprised 46 nodes (Fig. 4c) and was significantly enriched in pathways including 'oocyte meiosis', 'progesterone-mediated oocyte maturation', 'cell cycle', and 'p53 signaling pathway' (Fig. 4d).

MiRNA-mRNA regulatory network
To further explore the miRNA-mRNA regulatory network in SCLC, a total of 6732 DEM-target genes were predicted by FunRich software. After intersection with the common DEGs, 54 genes and 153 miRNA-mRNA pairs were finally identified as aberrantly regulated in SCLC (Fig. 5a). Importantly, two hub genes (TTK and CDCA8) of SCLC were found among the intersecting genes, and miR-455-3p/TTK, miR-140-5p/TTK and miR-133b/CDCA8 were identified as DEM-hub gene pairs. The predicted binding sites of the three DEM-hub gene pairs are shown in Fig. 5b.

Validation of expression in the Oncomine database
TTK and CDCA8 were selected as hub genes among the overlapping genes for cross-validation in the Oncomine database. The Oncomine analysis indicated that TTK and CDCA8 were overexpressed in multiple cancers, with significantly increased expression (P < 0.05) in lung cancer samples in six and ten datasets, respectively (Fig. 6a). We further queried and downloaded the SCLC datasets containing TTK or CDCA8 in the Oncomine database. The analysis indicated that TTK and CDCA8 levels were elevated in SCLC tissues relative to normal tissues (P < 0.001; Fig. 6b and c).

RT-qPCR and Western blotting verification results
Two mRNAs (TTK and CDCA8) and three miRNAs (miR-455-3p, miR-140-5p and miR-133b) were verified in tissue samples. TTK and CDCA8 were markedly upregulated in SCLC tissues versus controls, while miR-455-3p, miR-140-5p, and miR-133b were markedly reduced in SCLC tissues relative to controls (Fig. 7a). The validation results of Western blotting were consistent with those of RT-qPCR, confirming that TTK and CDCA8 levels in SCLC tissues were increased relative to those in normal tissues at the protein level (Fig. 7b).

Discussion
SCLC is a highly malignant lung cancer that exhibits a low degree of differentiation, grows rapidly, has high vascularity, and undergoes early extensive dissemination, with extremely poor associated survival [17]. Patients with extensive disease survive only eight to thirteen months, with a two-year survival rate of approximately 5% [18]. The first-line therapy for SCLC has not changed for decades in the clinic, and an effective therapeutic option for this recurrence-prone disease is still lacking [18,19]. Although numerous studies have indicated that several molecules regulate the progression of SCLC, the underlying carcinogenesis mechanism remains unclear.
Herein, through the integrated analysis of SCLC, we detected 236 common DEGs and 104 common DEMs and conducted functional enrichment analysis of the former, revealing these DEGs to be significantly enriched in multiple signaling pathways, such as 'cell cycle', 'DNA replication', 'human T-cell leukemia virus 1 infection', 'oocyte meiosis', and 'p53 signaling pathway'. The PPI network analysis yielded five modules and twenty hub genes, which were considered to be the key genes for the development of SCLC. Subsequent integrated analysis of the DEM-DEG regulatory pairs revealed a total of 54 overlapping genes between the DEM-target genes and the DEGs. In particular, TTK and CDCA8 were identified as the hub genes among the 54 overlapping genes and were targeted by miR-455-3p, miR-140-5p, and miR-133b.
Thr/Tyr kinase (TTK) phosphorylates serine, tyrosine, and threonine residues [20,21]. TTK is important for mitosis as it influences the precise segregation of chromosomes and the duplication of centrosomes [22,23]. The expression level of TTK changes dynamically in the cell cycle, and it increases during the G1/S cell cycle phase and peaks in the G2/M phase [24]. Thus, the expression of TTK is closely related to the cell cycle. It is worth mentioning that the cell cycle was also demonstrated as the most significant signaling pathway of SCLC in this integrated investigation. TTK has been well demonstrated as an oncogene, and the dysregulation of TTK is linked to several cancers, including neuroendocrine lung cancer
MiR-140-5p, as a tumor suppressor, has been extensively studied recently because it is involved in the tumorigenesis of multiple kinds of tumors, including gastric cancer, breast cancer, colorectal cancer, etc. [38-40]. Furthermore, numerous studies have been performed regarding miR-140-5p in lung cancer; in one study, miR-140-5p was shown to repress the proliferation of NSCLC cells through the MMD/Erk signaling pathway [41]. Yang et al. revealed that miR-140-5p regulates the invasion and migration of NSCLC by targeting VEGFA [40]. The miR-140-5p target genes can serve as biomarkers of NSCLC, contributing to diagnosis and prognostic prediction [42,43]. Nevertheless, to the best of our knowledge, neither miR-455-3p nor miR-140-5p has yet been identified as associated with SCLC, and we speculate that the aberrant regulation of TTK by miR-455-3p and miR-140-5p may be an underlying regulator of the pathogenesis of SCLC.
Human cell division cycle associated 8 (CDCA8) was another hub gene among the overlapping DEGs. As a chromosomal passenger complex component [44], CDCA8 upregulation is related to the carcinogenesis of a variety of tumors [45,46]. CDCA8 has been proven to be a tumor promoter that is overexpressed in various kinds of tumors and is essential for cancer cell survival and malignancy [47]. High expression levels of CDCA8 are associated with the development and poor survival of malignancies such as breast cancer, osteosarcoma, and melanoma [48][49][50]. Furthermore, Bidkhori G et al. indicated that CDCA8 is linked with the cell cycle progression of lung adenocarcinoma [51]. Hayama S et al. demonstrated that CDCA8 is phosphorylated and coactivated by AURKB in NSCLC cells and that phosphorylated CDCA8 contributes to the survival and growth of NSCLC cells [52]. Thus, CDCA8 is considered a novel diagnostic and therapeutic target of great promise. Although there have been several studies on CDCA8, the role of CDCA8 in SCLC has not been explored. The present study revealed that although CDCA8 was overexpressed in SCLC, the expression of miR-133b, which regulates CDCA8 expression, was significantly decreased.
MiR-133b, a member of the myomiRs, was originally thought to be muscle-specific and to play a key regulatory role in muscle development and remodeling [53]. However, more recent work shows that miR-133b is downregulated in various cancers, indicating that it is closely linked to oncogenesis [54]. In the field of lung cancer research, Crawford M et al. first reported miR-133b underexpression in lung adenocarcinoma and found that it targeted the Bcl-2 family to inhibit tumors [55]. Furthermore, Liu et al. reported that the expression of miR-133b was decreased in NSCLC tissues and that miR-133b suppressed the development of NSCLC through targeting EGFR [56]. Lin et al. showed that miR-133b reduces cisplatin resistance in NSCLC by targeting GSTP1, and its overexpression suppresses the invasion and malignant growth of cisplatin-resistant NSCLC cells [57]. Nevertheless, the expression level and carcinogenic mechanism of miR-133b in SCLC remain unclear. We infer that the regulation of CDCA8 by miR-133b may be another underlying mechanism of the pathogenesis of SCLC.
However, our work has limitations. As a bioinformatics analysis, the results are based on publicly available data instead of laboratory experiments. Although this study verified the expression at the gene level and protein level, we did not conduct functional verification of the genes in vitro. In addition, due to the lack of publicly available data on the clinical information of SCLC patients, we did not further analyze the association between the identified genes and patient survival, tumor recurrence, tumor stage, sex, age, etc.

Conclusions
Our study integrated the expression profiles of mRNAs and miRNAs in SCLC and identified miRNA-mRNA pairs associated with SCLC. Importantly, TTK regulated by miR-455-3p and miR-140-5p, together with CDCA8 regulated by miR-133b, might be crucial to the molecular mechanisms of SCLC, providing new insights into potential targets for the therapy of this disease.

Availability of data and materials
The bioinformatics data analyzed in this study were downloaded from the GEO database. The accession numbers of the mRNA microarray datasets are GSE1037, GSE6044, GSE11969, GSE43346, and GSE108055. The accession numbers of the miRNA microarray datasets are GSE19945 and GSE74190. Other data and materials in the current study are available from the corresponding author on reasonable request.