RHO Guanine Nucleotide Exchange Factors Predict Prognosis of Non-Small Cell Lung Cancer: A Comprehensive Bioinformatics Analysis

Background: Conventionally, RHO GEFs are known as activators of RHO GTPases, which promote tumorigenesis. However, the role of RHO GEFs in non-small cell lung cancer (NSCLC) remains largely unknown. Methods: A comprehensive bioinformatics analysis of protein structure, transcriptional expression, survival, methylation, mutation and gene-set enrichment data was performed using multiple databases. Results: Screening 81 RHO GEFs for their expression profiles and correlations with survival identified four with strong significance for predicting the prognosis of NSCLC patients. These four RHO GEFs, namely ABR, PREX1, DOCK2 and DOCK4, are downregulated in NSCLC compared to normal tissue. Their downregulation, to which promoter methylation may contribute, is correlated with unfavorable prognosis. Moreover, the underexpression of the four key RHO GEFs upregulates MYC signaling and DNA repair pathways, leading to carcinogenesis and poor prognosis. Conclusions: The data reveal a previously unrecognized role of ABR, PREX1, DOCK2 and DOCK4 as tumor suppressors in NSCLC. These previously unnoticed functions of RHO GEFs in NSCLC should encourage researchers to investigate the distinct roles of RHO GEFs in cancers, in order to inform strategies in clinical practice.


Background
Lung cancer is one of the most common malignancies and represents a leading cause of cancer deaths (> 1 million annually) worldwide [1,2]. Despite therapeutic advances, the prognosis of lung cancer remains unfavorable, and more than half of the patients diagnosed with lung cancer die within one year [3]. Non-small cell lung cancer (NSCLC) constitutes approximately 85% of all lung cancers, of which lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most common subtypes [3,4]. A comprehensive understanding of genetic alterations and associated mechanisms in the development of NSCLC is required to predict patient prognosis more effectively and to identify druggable targets for cancer therapy. RHO guanosine triphosphatases (GTPases) are essential molecular switches involved in the regulation of numerous downstream pathways in various types of cancer [5,6]. Cycling between the guanosine triphosphate (GTP)-bound (active) state and the guanosine diphosphate (GDP)-bound (inactive) state activates or inactivates downstream effectors [7]. In the "on" (GTP-bound) state, RHO GTPases recognize target proteins and generate a response until they are switched to the "off" (GDP-bound) state [5]. Ras-related C3 botulinum toxin substrate 1 (RAC1), Ras homolog family member A (RHOA) and cell division control protein 42 homolog (CDC42), the most important and extensively studied RHO GTPases, have been shown to regulate actin cytoskeleton reorganization, migration and metastasis, and to promote the development of lung cancer [8][9][10][11].
The activity of RHO GTPases is principally regulated by three types of regulatory proteins: guanine nucleotide exchange factors (GEFs) for activation, GTPase-activating proteins (GAPs) for inactivation and GDP-dissociation inhibitors (GDIs) for stabilization of the GDP-bound form [12]. GEFs catalyze GDP release and subsequently facilitate GTP binding, converting inactive RHO GTPases to their active states [13]. As activators of RHO GTPases, RHO GEFs have attracted the attention of researchers in recent years. RHO GEFs can be divided into two distinct families: the diffuse B-cell lymphoma (DBL) family and the dedicator of cytokinesis (DOCK) family [14]. There are 70 DBL family GEFs and 11 DOCK family GEFs [14,15]. Altered expression or mutation of RHO GEFs has been identified in human cancers [16]. However, compared to the roles of RHO GTPase family members, the roles of RHO GEF family members in NSCLC remain largely unclear. Consequently, it is important to explore the altered expression of RHO GEFs and the mechanisms by which they contribute to NSCLC development.
Herein, we analyze the expression of RHO GEFs and its correlation with clinical parameters in NSCLC patients. Among the 81 members of the RHO GEF family, four, namely active breakpoint cluster region-related (ABR), phosphatidylinositol 3,4,5-trisphosphate-dependent Rac exchanger 1 (PREX1), dedicator of cytokinesis protein 2 (DOCK2) and dedicator of cytokinesis protein 4 (DOCK4), are selected for their markedly altered expression and significant predictive value for prognosis in NSCLC patients. Moreover, methylation and mutation profiles are analyzed to interpret the observed altered expression of the selected members. Potential mechanisms related to ABR, PREX1, DOCK2 and DOCK4 in NSCLC development are also investigated.

Materials And Methods

Phylogenetic analysis
The protein sequences of human DBL family RHO GEFs (70 members) and human DOCK family RHO GEFs (11 members) were obtained from UniProt (https://www.uniprot.org/). The phylogenetic tree was constructed by MEGA7.0 software using neighbor-joining method with default parameters and 1000 bootstrap replicates [17]. The final tree containing the information of domain organization was visualized by the online tool Evolview [18].

Oncomine database analysis
Oncomine (https://www.oncomine.org/) is an online data-mining platform for comparing transcriptome data from most types of cancer with the corresponding normal tissues [19,20]. The mRNA expression levels of genes encoding RHO GEFs in NSCLC tissues, represented by LUAD and LUSC, and in normal lung tissues were analyzed with Oncomine.
In this study, P-value < 0.05, fold change > 2.0 and top 5% gene rank were selected as the thresholds.
The resulting data were input into and visualized by Microsoft Office Excel 16.0 software (Microsoft Corporation, Redmond, CA).
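The three thresholds above can be read as a simple filter over per-dataset results. The sketch below is illustrative only: the record fields (`p_value`, `fold_change`, `rank_percent`) and the example values are hypothetical and do not reflect Oncomine's export format, and using the absolute fold change so that downregulated genes can also pass is an assumption.

```python
from dataclasses import dataclass

@dataclass
class DatasetResult:
    gene: str            # hypothetical gene label
    p_value: float       # P-value of the tumor vs. normal comparison
    fold_change: float   # signed; negative means lower expression in tumor
    rank_percent: float  # gene rank expressed as "top N percent"

P_MAX = 0.05      # P-value < 0.05
FOLD_MIN = 2.0    # fold change > 2.0 (absolute value: an assumption)
RANK_MAX = 5.0    # top 5% gene rank

def passes_thresholds(result):
    """Return True if a result meets all three thresholds."""
    return (
        result.p_value < P_MAX
        and abs(result.fold_change) > FOLD_MIN
        and result.rank_percent <= RANK_MAX
    )

# Hypothetical example records, not data from the study.
results = [
    DatasetResult("GENE_A", 0.0004, -2.4, 3.0),
    DatasetResult("GENE_B", 0.03, 1.5, 2.0),
    DatasetResult("GENE_C", 0.20, -3.1, 1.0),
]
significant = [r.gene for r in results if passes_thresholds(r)]
print(significant)
```

Keeping the fold change signed in the record preserves the direction of the change, which matters when only downregulated genes are of interest.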

Gene Expression Profiling Interactive Analysis (GEPIA) database analysis
GEPIA (http://gepia.cancer-pku.cn/) is an online tool for differential expression analysis between tumors and normal tissues based on The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) data [21]. To date, TCGA has produced RNA-Seq data for 9736 tumor samples across 33 cancer types, as well as for 726 adjacent normal tissues. The GTEx project contains RNA-Seq data for more than 8000 normal samples from unrelated donors. GEPIA integrates the information from cancer genomics big data for end users. The GEPIA database was used to validate the transcriptional profiles of RHO GEFs in NSCLC patients.

UALCAN database analysis
UALCAN (http://ualcan.path.uab.edu/) is a web server using TCGA RNA-seq and clinical data from 31 cancer types [22]. This database provides a platform for identifying candidate biomarkers specific for tumor sub-groups. The UALCAN database was utilized to analyze the mRNA expression of normal tissues and NSCLC specimens from different sub-groups based on nodal metastasis status and tumor stages.

Kaplan-Meier Plotter database analysis
Kaplan-Meier Plotter (https://kmplot.com/) is an online database capable of assessing the effects of 54,000 genes on survival in 21 types of cancer. The system includes gene chip and RNA-seq data from the Gene Expression Omnibus (GEO), the European Genome-Phenome Archive (EGA) and TCGA, and it provides data for lung cancer [23]. The patient specimens were divided into high expression and low expression groups using the JetSet best probe set and the auto-selected best cutoff [24]. Outlier arrays were excluded to control array quality [25]. Overall survival (OS) and first progression (FP) were analyzed in NSCLC patients. The log-rank P-value and hazard ratio (HR) with 95% confidence interval (CI) were calculated and displayed on the web server. The Kaplan-Meier Plotter database was used to evaluate the prognostic value of genes encoding RHO GEFs in NSCLC.
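For readers unfamiliar with the statistics reported by the web server, the following is a minimal sketch of a Kaplan-Meier estimate and a two-group log-rank test. It is a textbook formulation, not the Kaplan-Meier Plotter implementation; the function names and the example follow-up times are hypothetical, and the hazard ratio and cutoff selection are not reproduced.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if the event was observed and 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value)."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical follow-up times (months) for low- and high-expression groups.
low_t, low_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
high_t, high_e = [10, 11, 12, 13, 14], [1, 1, 1, 1, 1]
print(kaplan_meier(low_t, low_e))
print(logrank_test(low_t, low_e, high_t, high_e))
```

The quadratic loops are kept for readability; a dedicated survival-analysis library would be preferable for real cohorts.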

MEXPRESS database analysis
MEXPRESS (https://mexpress.be/) is an online data visualization tool for the visualization and integration of TCGA expression, DNA methylation and clinical data [26,27]. The MEXPRESS database was employed to investigate the promoter methylation status of selected genes in NSCLC specimens compared to normal tissues, and only CpG sites with statistically significant results were reported.

cBioPortal for Cancer Genomics (cBioPortal) database analysis
cBioPortal (http://cbioportal.org) is a web resource for the exploration, visualization and analysis of cancer genomics data [28]. The datasets selected in our study were LUAD (TCGA, Firehose Legacy, containing 580 samples) and LUSC (TCGA, Firehose Legacy, containing 503 samples). The platform for methylation sequencing information was Illumina Human Methylation 450 (HM450). The correlation between the expression of the selected genes and DNA methylation in LUAD and LUSC was determined by cBioPortal. Moreover, the mutations of the selected genes and their relationship with gene expression were investigated using cBioPortal.
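The expression-methylation correlation can be illustrated with a rank correlation computed on paired per-sample values. This is a sketch under stated assumptions: the text does not say which coefficient was used, so Spearman's coefficient is chosen here for illustration, and the methylation and expression values are hypothetical rather than taken from the TCGA datasets.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical paired values: promoter methylation and mRNA expression.
methylation = [0.1, 0.2, 0.4, 0.7, 0.9]
expression = [9.0, 8.5, 6.0, 5.5, 2.0]
rho = spearman(methylation, expression)
print(rho)
```

A coefficient close to -1 in such a comparison would correspond to the pattern described in the study, in which higher promoter methylation accompanies lower expression.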

Gene Set Enrichment Analysis (GSEA)
GSEA is used for determining whether a defined gene set is expressed with significant differences under two different biological conditions [29]. The TCGA data regarding LUAD (n = 479) and LUSC (n = 501) patients were downloaded from the Genomic Data Commons (GDC) Data Portal website (https://portal.gdc.cancer.gov/). Subsequently, the patients were classified into two groups (high vs. low expression) for each selected gene respectively, and the cutoffs were determined as medians.
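The median split and the core enrichment statistic can be sketched as follows. The running-sum enrichment score follows the standard weighted formulation of GSEA; the function names, the handling of values equal to the median (assigned to the low group) and the toy inputs are assumptions for illustration, and the permutation-based significance estimate that GSEA software also performs is not shown.

```python
import statistics

def split_by_median(expression):
    """Split patients into (high, low) groups at the median expression.

    expression maps a patient identifier to an expression value. Values equal
    to the median go to the low group (an assumption for this sketch).
    """
    cutoff = statistics.median(expression.values())
    high = [p for p, v in expression.items() if v > cutoff]
    low = [p for p, v in expression.items() if v <= cutoff]
    return high, low

def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a gene set over a ranked gene list.

    ranked_genes must already be sorted by the ranking metric in scores.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running = best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                 # step down on a miss
        if abs(running) > abs(best):
            best = running                          # maximum deviation from 0
    return best

# Hypothetical inputs, not data from the study.
high, low = split_by_median({"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0})
es_top = enrichment_score(["a", "b", "c", "d"], [4, 3, 2, 1], {"a", "b"})
es_bottom = enrichment_score(["a", "b", "c", "d"], [4, 3, 2, 1], {"c", "d"})
print(high, low, es_top, es_bottom)
```

A positive score indicates that the gene set is concentrated at the top of the ranked list, and a negative score that it is concentrated at the bottom.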
GSEA was conducted on the mRNA expression data of the selected genes using

In order to obtain an overview of the expression patterns of ABR in various tumor types and in the corresponding normal tissues, a body map is generated by GEPIA (Fig. 3A). Compared to normal lung tissue, the expression of ABR is lower in lung cancer. To further validate the downregulation of ABR in lung cancer, specific datasets from the Oncomine analysis are illustrated (Fig. 3A). In the Garber Lung dataset, the expression of ABR in LUSC is markedly lower than in normal lung tissue (P < 0.001, fold change −2.435), while LUAD datasets with statistical significance are lacking in the Oncomine database. Therefore, the GEPIA database is used to confirm our results. From the GEPIA database analysis  (Fig. 4B).
In addition, lower DOCK4 expression is associated with earlier first progression in LUAD patients, but not in LUSC patients (Fig. 4H). The downregulation of PREX1 (Fig. 4D) Table.1.

Discussions
RHO GEFs are known for their critical roles as molecular switches in activating RHO GTPases, and therefore function as regulators in various diseases, not only cancer [5,6]. ABR, as indicated by its name, was first identified as a breakpoint cluster region (BCR)-related protein that shares high homology with BCR in humans (68% amino acid identity) [34]. ABR maintains the normal reactivity of the innate immune system, and altered ABR function can lead to the development of leukemia [35,36]. PREX1 was first discovered in the cytosol of neutrophils [37]. PREX1 is important in regulating ROS production, migration and chemotaxis of neutrophils [38,39]. Elevated expression of PREX1 has been associated with the development of melanoma, prostate cancer and breast cancer [40][41][42].
DOCK2 expression was initially deemed to be restricted to hematopoietic cells [43]. Although it is predominantly expressed in lymphocytes and hematopoietic tissues, recent research has revealed a tumor-promoting role of DOCK2 in lymphoma and colorectal cancer [44][45][46]. In contrast to ABR, PREX1 and DOCK2, which were identified in non-cancer cells, DOCK4 was initially reported in osteosarcoma cells, in which it was deleted during tumor progression [47]. However, DOCK4 was also reported to promote breast cancer development and to be associated with bone metastasis [48,49]. To date, the roles of ABR, PREX1, DOCK2 and DOCK4 in lung cancer are largely unknown, and the underlying mechanisms remain to be explored.
Through the integration of data from multiple databases, our study demonstrates that ABR, PREX1,

Gene expression and repression within cancer cells can be controlled by the epigenetic mechanism of DNA methylation, which alters gene activity and phenotype without changing the DNA sequence [50,51]. DNA methylation involves the covalent addition of methyl groups to the C-5 position of cytosine rings, especially in CpG dinucleotides [52]. In mammals, approximately 70% of promoters are rich in unmethylated CpG [53]. Hypermethylation of CpG sites in promoters is typically associated with gene silencing at the transcriptional level [54]. In lung cancer, DNA hypermethylation of tumor suppressors represents a hallmark and an early event in tumorigenesis [55]. Our study reveals that NSCLC samples from the TCGA database have higher methylation levels in the promoters of the ABR, PREX1, DOCK2 and DOCK4 genes than normal samples. Moreover, in NSCLC specimens, elevated levels of promoter methylation are markedly associated with lower expression of ABR, PREX1, DOCK2 and DOCK4. Thus, DNA hypermethylation might contribute to the downregulation of ABR, PREX1, DOCK2 and DOCK4 in NSCLC, and the methylation profiles of the four key RHO GEFs may be novel biomarkers for lung cancer screening.
In addition to epigenetic alterations, gene mutations can also affect gene expression levels [56]. Most of the mutations of ABR, PREX1, DOCK2 and DOCK4 in NSCLC are missense mutations, which change a single amino acid into another, or truncating mutations, which are DNA changes that can shorten the protein. Our results show that the expression levels of ABR, PREX1, DOCK2 and DOCK4 in NSCLC are not correlated with mutation status. It should be noted that loss-of-function or gain-of-function mutations can lead to potential inhibitory or tumorigenic effects [57]. In our analysis, a few mutations occur in the RHOGEF (DH), PH, DOCK-C2 (DHR-1) and DHR-2 domains, suggesting that the critical functions of RHO GEFs can be altered by these mutations, although this is yet to be confirmed.
Mechanistically, the downregulation of ABR, PREX1, DOCK2 and DOCK4 upregulates MYC signaling and DNA repair pathways, as identified by GSEA using TCGA data. The MYC oncogene encodes a transcription factor that drives gene expression in cancer cells [58]. MYC signaling is implicated in the pathogenesis of most human cancers, and its deregulation is correlated with poor patient survival [59]. MYC activation is associated with many features of cancer, including protein synthesis, proliferation and altered cellular pathways [59,60]. MYC upregulation is detected in > 40% of NSCLC cases, and it is related to the loss of cell differentiation and tumor progression [61,62]. Therefore, ABR, PREX1, DOCK2 and DOCK4 downregulation may lead to the development of NSCLC by upregulating MYC protein expression and its downstream targets.
In clinical practice, chemotherapy and radiotherapy are gold standards for the treatment of patients with lung cancer, and they have largely prolonged patient survival [63]. The therapeutic effects of platinum-based chemotherapeutic drugs and ionizing radiation are achieved by DNA damage [64][65][66]. Nevertheless, enhanced DNA repair mechanisms counteract the therapeutic benefits and thereby lead to poor patient survival. Hence, the downregulation of ABR, PREX1, DOCK2 and DOCK4 may promote cancer development and lead to an unfavorable prognosis by activating MYC and DNA repair signaling pathways in NSCLC, although in vitro and in vivo studies are necessary to confirm this association. Moreover, agents targeting DNA damage repair mechanisms have shown promise in NSCLC clinical models [65]. These four key RHO GEFs are also promising as predictive biomarkers for the response to DNA-damaging regimens.
In spite of the conventional perspective that RHO GEFs are activators of RHO GTPases, our study unexpectedly but reasonably reveals that the RHO GEFs ABR, PREX1, DOCK2 and DOCK4 are tumor suppressors in NSCLC. In addition to the above findings, the following evidence might also support our results. ABR contains a RHO GAP domain in addition to its RHO GEF domain, functioning as a dual RHO GEF/GAP [67]. Additionally, the prominent expression of ABR, PREX1 and DOCK2 in cells of the immune system suggests that they are correlated with lymphocyte infiltration via cytokine secretion in NSCLC tissues. Tumor-infiltrating lymphocytes can attack and eliminate tumor cells, and therefore contribute to a better prognosis for NSCLC patients [68,69]. The association between DOCK2 expression and lymphocyte infiltration has been identified in colorectal cancer, and their correlation in NSCLC remains to be confirmed [45]. Moreover, DOCK4 was initially identified as a tumor suppressor in osteosarcoma cells, and later studies also indicate a tumor-suppressive role of DOCK4 in ovarian cancer [47,70]. In future studies, researchers should not simply regard all RHO GEFs as tumor promoters on account of their involvement in activating RHO GTPases. Instead, the functions of RHO GEFs other than activating RHO GTPases in cancers should be considered.

Conclusions
In conclusion, our study shows that the downregulation of ABR, PREX1, DOCK2 and DOCK4 promotes NSCLC development and is associated with unfavorable overall survival of patients. Therefore, they can serve as promising biomarkers to predict the prognosis of NSCLC