ARL4C Is a Key Driver of Gastric Cancer: An Integrative Analysis of ARL Family Members

Background: As small GTP-binding proteins, ARL family members (ARLs) have been shown to regulate the malignant phenotypes of several cancers. However, the exact role of ARLs in gastric cancer (GC) remains elusive. Methods: The expression status, interactive relations, potential pathways and genetic variations of ARLs are analyzed with bioinformatics tools. Machine learning models and enrichment analysis are performed on the R platform. The biological functions of ARL4C are demonstrated by in vitro and in vivo experiments. A nomogram is further constructed to validate the prognostic value of ARL4C for GC patients. Results: ARLs are significantly dysregulated in GC and involved in various cancer-related pathways. Subsequently, machine learning models identify ARL4C as one of the two most significant diagnostic and prognostic indicators among ARLs for GC. Furthermore, ARL4C silencing remarkably reverses the epithelial-mesenchymal transition (EMT) and inhibits the growth and metastasis of GC cells both in vitro and in vivo. Moreover, enrichment analysis indicates that TGF-β1 is highly correlated with ARL4C, while ARL4C-related genes are significantly enriched in TGF-β1 signaling. Correspondingly, we demonstrate that TGF-β1 treatment dramatically increases ARL4C expression, and ARL4C knockdown reverses TGF-β1-induced EMT, possibly by inhibiting the expression of Smads, downstream factors of TGF-β1. Meanwhile, the coexpression of ARL4C and TGF-β1 worsens the prognosis of GC patients in both Kaplan-Meier analysis and the nomogram model. Conclusion: Our work is of significance for comprehensively understanding the crucial role of ARLs in the carcinogenesis of GC and the specific mechanisms underlying the GC-promoting effects of TGF-β1. More importantly, we uncover the great promise of ARL4C-targeted therapy in improving the efficacy of TGF-β1 inhibitors for GC patients.

essential for the 3D invasive growth of prostate cancer both in vitro and in vivo (9). ARL4A can interact with Robo1 to promote cell migration by activating Cdc42 (10). A recent study reveals that ARL13B can enhance the migration and invasion of breast cancer cells by controlling the integrin-mediated signaling pathway (11). Moreover, ARL4C overexpression promotes the progression of glioblastoma (GBM) in vitro and in vivo and indicates a poor prognosis for patients with GBM (12). However, the clinical value and biological functions of ARLs in GC remain elusive.
In this study, we first comprehensively investigate the expression profiles, hallmark pathways, genetic alterations and clinical value of ARLs in GC. Using machine learning models, we find that ARL4C and ARL13B are the most important diagnostic and prognostic indicators among ARLs for GC. Further in vitro and in vivo experiments demonstrate that down-regulation of ARL4C dramatically inhibits tumorigenesis and metastasis of GC cells. More importantly, we discover that ARL4C is highly related to the TGF-β1 signaling pathway and mediates TGF-β1-induced EMT, which is further supported by our finding that the coexpression of ARL4C and TGF-β1 predicts worse survival for GC patients than either ARL4C or TGF-β1 alone.
Kaplan-Meier plotter database (http://kmplot.com/analysis/) was used to investigate the predictive effects of ARLs on the overall survival of GC patients (18). All the analyses were performed according to the guidelines of these databases.

Gene set variation analysis (GSVA)
GSVA, a functional enrichment method, was utilized to estimate the variation of pathway activity over a sample population in an unsupervised manner (19). Hallmark gene sets and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway sets of ARLs in GC were analyzed with the GSVA software and plotted with the "ggplot2" R package. P < 0.05 and FDR < 0.05 were set as the screening standard.

Logistic regression model construction and validation
TCGA and GTEx datasets were obtained from the UCSC database (https://genome.ucsc.edu) to analyze the diagnostic value of ARLs for GC patients. The samples were randomly separated into training (75%) and validation (25%) cohorts. Logistic regression was carried out to identify the diagnostic biomarkers with statistical significance in the training cohort using the "glm" function on the R platform. Moreover, we evaluated the ability of the predicted diagnostic markers to differentiate GC patients in the validation cohort.
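The split-then-fit procedure described above can be illustrated with a minimal sketch. The study itself used the "glm" function in R; the Python version below is only an assumed stand-in that fits a logistic model by plain gradient descent, and the function names, learning rate, seed and toy data are all hypothetical and not taken from the paper.

```python
import math
import random


def train_validation_split(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the intercept."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grads[0] += err
            for j, v in enumerate(x):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


# Hypothetical toy data: one expression value per sample, label 1 = tumor, 0 = normal.
samples = [([0.2], 0), ([0.5], 0), ([0.9], 0), ([1.1], 0),
           ([2.8], 1), ([3.1], 1), ([3.5], 1), ([4.0], 1)]
train, valid = train_validation_split(samples)
weights = fit_logistic([x for x, _ in train], [y for _, y in train])
valid_probabilities = [predict_proba(weights, x) for x, _ in valid]
```

Holding out the validation samples before fitting keeps the later accuracy estimate independent of the data used to choose the coefficients.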

Least absolute shrinkage and selection operator (LASSO) Cox regression analysis
The GSE15459 cohort (tumor, n=200) in the Gene Expression Omnibus (GEO) was utilized to perform the prognostic analyses of ARLs in GC patients (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE34942). LASSO Cox regression analysis was performed using the "glmnet" R package. The tuning parameter (λ) of the LASSO model was selected by 10-fold cross-validation using the minimum criterion, which gave λ.min = 0.0379.
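For orientation, the quantity that a LASSO Cox model optimizes can be written in its general textbook form. The notation below is an assumption introduced here (it does not appear in the paper): x_i is the ARL expression vector of patient i, δ_i the event indicator, R(t_i) the risk set at time t_i, and CV(λ) the cross-validated partial-likelihood deviance. The exact scaling of the likelihood term is a software convention.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}\;
    \left\{ -\frac{1}{n}\,\ell(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad
\ell(\beta)
  = \sum_{i:\,\delta_i = 1}
    \left[ x_i^{\top}\beta - \log \sum_{k \in R(t_i)} \exp\!\left( x_k^{\top}\beta \right) \right],
\qquad
\lambda_{\min} = \arg\min_{\lambda}\; \mathrm{CV}(\lambda).
```

Larger values of λ shrink more coefficients exactly to zero, so the genes that keep nonzero coefficients at λ.min are the ones retained as prognostic markers.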

Enrichment analysis
A Pearson's correlation coefficient exceeding 0.5 was taken to indicate a good correlation between ARL4C and its coexpressed genes. We employed the "clusterProfiler" R package for Gene Ontology (GO) and KEGG enrichment analysis of genes co-expressed with ARL4C.
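The coexpression filter can be sketched as follows. This is an illustration only: the gene names and expression values are hypothetical, and the helper functions are not part of the original R workflow.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def coexpressed_genes(target, expression, threshold=0.5):
    """Return {gene: r} for genes whose correlation with the target exceeds the threshold."""
    selected = {}
    for gene, values in expression.items():
        r = pearson_r(target, values)
        if r > threshold:
            selected[gene] = r
    return selected


# Hypothetical expression values across five samples.
arl4c = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with ARL4C
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as ARL4C rises
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],   # noisy positive trend
}
coexpressed = coexpressed_genes(arl4c, expression)
```

Only the genes that pass this filter would then be handed to the GO and KEGG enrichment step.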

Kaplan-Meier analysis
We conducted Kaplan-Meier analysis of the effects of TGF-β1 and ARL4C on the overall survival of GC patients by R platform.

Nomogram construction and validation
A nomogram was built based on a multivariate Cox analysis using the "rms" R package. The predictive performance of the nomogram was then validated by decision curve analysis (DCA) and calibration curves.

Clinical samples
The GC tissue microarray (ST-1503) was purchased from Xi'an Alenabio. In addition, 12 paired samples of primary GC and adjacent normal tissues were obtained from patients who had undergone GC surgery at Xijing Hospital of Digestive Diseases. All samples were clinically and pathologically verified.

Cell culture and transfection
The human gastric carcinoma cell lines (AGS, MKN45) were obtained from ATCC (Manassas, VA, USA) and maintained in DMEM medium supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin solution. All cell lines were cultured at 37 °C in a humidified atmosphere containing 5% CO2.
The human shARL4C (NM_001282431) lentivirus was designed and constructed by GeneChem (Shanghai, China). siRNAs against human ARL4C were designed and constructed by GenePharma (Shanghai, China). The protocol was reported in a previous study (20).

Immunohistochemistry (IHC) and immunofluorescence (IF)
IHC was performed following the protocol from a previous study (20), while IF staining was performed according to the published protocol (21).

In vitro and in vivo assays
We performed cell proliferation, colony formation assay, 3D invasion assay and xenografts assay following the protocols in a previous study (20).

Statistical analysis
Student's paired t-test was used to determine the statistical significance of differences between two groups. Each experiment was carried out at least three times. Statistical analyses were conducted using SPSS software (Version 21.0). Graphs were generated using GraphPad Prism software (Version 7.0). The counting data were represented by frequency or percentage, and the measurement data were expressed as the mean ± standard deviation (SD). P < 0.05 was considered statistically significant.
Since small GTPase proteins commonly work synergistically as functional hubs to regulate cell biological functions (21,22), we conduct network analysis of ARLs at the gene level using the GeneMANIA tool and find a large number of shared protein domains among ARLs (Figure 1c).
We further perform the correlation analysis of ARLs in GC using the TCGA database. As shown in Figure 1d and Table S1, significant correlations are discovered among these genes. To explore the potential oncogenic pathways in which ARLs are involved, we analyze the correlation between the expression levels of ARLs and the activity of hallmark-related pathways using GSVA. As shown in Figure 1e and 1f, various hallmark pathways of cancer are significantly associated with the expression of ARLs, including CHOLESTEROL HOMEOSTASIS (7/22), MYOGENESIS (5/22), UV RESPONSE UP (5/22) and P53_PATHWAY (5/22). Meanwhile, the expression levels of ARL10 (n = 22), ARL13A (n = 12), ARL5B (n = 10), ARL15 (n = 8), and ARL4C (n = 8) are correlated with a higher number of pathways (Table S2).

Genetic alteration analysis of ARLs in GC
To comprehensively understand the expression profiles of ARLs in GC, we analyze the genetic alteration of ARLs in GC. The chromosome status (GRCh38/hg38) shown in Figure 2a and Table S3 displays the genomic locations of the 22 ARLs, and we find that ARLs are unevenly distributed on different chromosomes. Furthermore, we conduct a detailed genetic analysis using cBioPortal for Cancer Genomics. From the changes in protein structure of ARLs (mutation sites≥3), we find that ARL13B has more mutation sites than the others (Figure 2b). Moreover, we discover varying degrees of genetic variation among the 22 ARLs (1.7% to 10.0%), and the mutation ratios of ARL4A, ARL13B and ARL16 are relatively high, up to 10.0% (Figure S2a). We further check the alteration frequency of ARLs (mutation ratio≥8%) in various GC types. As shown in Figure 2c, copy number amplification clearly contributes to the mRNA expression alteration of ARLs in different GC types. More interestingly, DNA methylation analysis demonstrates that there is a negative correlation between mRNA expression and DNA methylation for most ARLs (R≥0.3, P < 0.05) (Figure 2d). A recent study showed that deregulation of ARL4C is due to hypomethylation in its 3'-UTR in lung squamous cell carcinoma (22). Therefore, we further investigate the specific methylation sites of ARL4C using the MEXPRESS tool and find that the DNA methylation status of the cg24441922 and cg11509907 sites of the 3'-UTR is significantly negatively related to ARL4C mRNA expression (Figure S2b and Table S4). Taken together, these results suggest that DNA methylation is also involved in the epigenetic regulation of ARLs.

The diagnostic and prognostic values of ARLs for GC
We further assess the diagnostic and prognostic values of ARLs in GC patients based on the TCGA, GTEx and GEO datasets. Firstly, we construct the logistic regression model to test the usefulness of ARLs in GC diagnosis. All samples are randomly separated into training (75%) and validation (25%) cohorts. All ARLs in the training cohort are identified and featured with nonzero coefficients by the logistic regression model. Then, diagnostic markers with high significance are selected using the stepwise method ("both" method). As shown in Figure 3a, we identify nine ARLs as the potential diagnostic markers for GC. Moreover, we evaluate the ability of the predicted diagnostic markers to differentiate GC patients from normal controls in the validation cohort. The results suggest that our selected diagnostic markers have a high accuracy of prediction (Area Under Curve (AUC) = 0.929) (Figure S3a).
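As a reminder of what the AUC summarizes, it equals the probability that a randomly chosen tumor sample receives a higher predicted score than a randomly chosen normal sample. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical and unrelated to the reported value of 0.929.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation cohort: 1 = GC, 0 = normal, with predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.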
Furthermore, we analyze the effects of ARLs on the overall survival (OS) of GC patients using the Kaplan-Meier (K-M) plotter. We observe that 9 ARLs (i.e., ARL1, ARL4C, ARL5A, ARL5B, ARL9, ARL13B, ARL15, ARL17A and ARL17B) are significantly related to patient prognosis (Figure S3b). Thus, ARLs are of great significance for assessing the prognosis of GC patients. Then, to avoid overfitting of the predictive model, we further identify the key prognostic markers by LASSO Cox regression with the minimum criterion (Figure 3b and Figure S3c), where eight ARLs (ARL1, ARL4C, ARL5C, ARL6, ARL13B, ARL14, ARL15 and ARL16) are selected as reliably associated with OS. Univariate and multivariate Cox regression models are also used to study the prognostic values of all ARLs for GC (Figure 3c and 3d). As shown by the Venn diagram integrating the diagnostic and prognostic analysis models (Figure 3e), ARL4C and ARL13B are the most important markers for diagnosis and prognosis of GC patients among all ARLs.
ARL13B has been previously reported to play a critical role in promoting proliferation, migration and invasion of GC cells and is associated with poor prognosis of GC patients (23). However, the biological functions of ARL4C in gastric tumorigenesis remain unclear. Therefore, we evaluate the protein expression level of ARL4C by IHC in a cohort of 142 GC patients (Cohort ). Higher ARL4C expression is found in primary GC samples compared with normal gastric mucosa tissues (Figure 3f). Furthermore, we examine ARL4C expression in frozen tumor and adjacent mucosa tissues of 12 GC patients at the Xijing Hospital of Digestive Diseases (Cohort ) by Western blot analysis. The results indicate that the protein expression of ARL4C in the tumor tissues is significantly higher than that in the adjacent mucosa tissues (Figure 3g). Meanwhile, ARL4C overexpression remarkably worsens the prognosis of GC patients after adjusting for several confounding factors, including subtype, Lauren classification, stage, age at surgery and gender (Figure S3d).

ARL4C knockdown decreases the proliferation and metastasis of GC cells in vitro and in vivo as well as reverses EMT
Given that ARL4C is involved in regulating the biological behaviors of various tumors, we examine whether ARL4C acts as an oncogene in GC cells in vitro and in vivo. In contrast to other small G proteins, ARL4C activity is regulated by its expression level rather than the switch between GDP- and GTP-bound status induced by regulators. Thus, we explore the role of ARL4C in the tumorigenesis of GC by constructing ARL4C knockdown GC cells. AGS and MKN45 cells are transfected with shRNA and siRNA against ARL4C. Multiple clones stably transfected with lentivirus are selected and confirmed by PCR and Western blot analyses (Figure S4a and S4b). CCK-8 assays reveal that ARL4C downregulation significantly reduces cell growth compared with the control (Figure 4a), which is further confirmed by colony forming assays (Figure 4b). Furthermore, the in vivo analysis shows that silencing ARL4C in MKN45 cells causes obvious reductions in tumor weight and volume in nude mice (Figure 4c). The 3D invasion experiment, as shown in Figure 4d, indicates that ARL4C knockdown can decrease the invasion ability of GC cells in 3D culture. The in vivo metastatic assay also indicates that the downregulation of ARL4C decreases the incidence of lung metastasis and the number of metastatic lung nodules (Figure 4e). Overall, these results suggest that ARL4C may play a critical role in GC growth and metastasis both in vitro and in vivo.
Epithelial-mesenchymal transition (EMT) is involved in aggressive tumor progression. To confirm the role of ARL4C in regulating EMT of GC cells, we evaluate the expression changes of EMT markers after ARL4C silencing. Western blot and RT-PCR analyses show that downregulation of ARL4C leads to increased expression of E-cadherin and decreased expression of N-cadherin and Vimentin compared with the control group (Figure 5a and 5b). Furthermore, the IF assays show similar results (Figure 5c). Additionally, we investigate the correlation coefficients between ARL4C and EMT markers based on the TCGA data and find that ARL4C is positively related to Vimentin (Figure S5c).

ARL4C acts as a mediator of TGF-β1/Smad signaling in GC
To uncover the underlying mechanisms of ARL4C in GC, we explore the TCGA database to identify the genes related to ARL4C. As shown in Figure S5a and S5b, we identify a number of GC-related genes that are highly correlated with ARL4C, among which TGF-β1 is the most significant gene in GC (R=0.851, P < 0.01). GO and KEGG enrichment analyses indicate that the ARL4C-associated genes (R≥0.5, P < 0.05) are significantly involved in the cellular response to TGF-β stimulus and the TGF-β signaling pathway (Figure 6a and 6b).
As TGF-β1 is recognized as an important inducer of the malignant progression of cancer, we investigate whether ARL4C might participate in TGF-β1-induced progression of GC. We treat AGS and MKN45 cells with 10 ng/ml TGF-β1 for 24 and 48 h. Following TGF-β1 stimulation, ARL4C is significantly upregulated in AGS and MKN45 cells compared with the control. In particular, TGF-β1-induced ARL4C expression is time-dependent in AGS cells (Figure 6c). In addition, Western blot analysis and immunofluorescence analysis show that the downregulation of ARL4C decreases the expression levels of Smad3, phosphorylated-Smad2 (p-Smad2) and phosphorylated-Smad3 (p-Smad3) in AGS and MKN45 cells (Figure 6d-6e and Figure S4c). Meanwhile, TCGA correlation analysis shows that ARL4C is positively related to Smad2 and Smad3 with high correlation coefficients (Figure S5c). These data suggest that ARL4C may mediate the TGF-β1/Smad signaling pathway. In addition, TGF-β1-induced EMT is reversed when ARL4C is silenced in MKN45 cells (Figure 6f and 6g).

ARL4C enhances the TGF-β1-mediated poor prognosis of GC patients
To translate the above findings into clinical significance, we analyze clinical data of ARL4C and TGF-β1 expression in GC patients from the GSE15459 cohort. We divide the samples into 4 groups according to the expression status of ARL4C and TGF-β1: group 1 (ARL4C-low/TGF-β1-low), group 2 (ARL4C-high/TGF-β1-low), group 3 (ARL4C-low/TGF-β1-high) and group 4 (ARL4C-high/TGF-β1-high). Kaplan-Meier analysis shows that elevated expression of TGF-β1 or ARL4C is associated with shorter OS of GC patients. Furthermore, patients with coexpression of TGF-β1 and ARL4C have the lowest OS (Figure 7a).
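The grouping and the Kaplan-Meier estimate can be sketched as follows. The expression cutoffs that separate "high" from "low" are not stated in the text, so the cutoff arguments here are assumptions (a median split would be one common choice), and the follow-up data are hypothetical.

```python
def expression_group(arl4c, tgfb1, arl4c_cutoff, tgfb1_cutoff):
    """Label a sample by the high/low status of ARL4C and TGF-beta1 (gene symbol TGFB1)."""
    arl4c_status = "high" if arl4c > arl4c_cutoff else "low"
    tgfb1_status = "high" if tgfb1 > tgfb1_cutoff else "low"
    return f"ARL4C-{arl4c_status}/TGFB1-{tgfb1_status}"


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for one expression group.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
group = expression_group(arl4c=7.2, tgfb1=6.1, arl4c_cutoff=6.0, tgfb1_cutoff=5.5)
```

Running the estimator once per group gives the four curves that are compared in a plot such as Figure 7a.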
Next, we construct a predictive nomogram based on overall mortality (OM) via a multivariate Cox regression model. The nomogram incorporates six variables: age, the expression status of TGF-β1 and ARL4C, gender, stage, molecular subtype and Lauren subtype. As shown in Figure 7b, to evaluate an individual's probability of overall mortality, the values of the prognostic factors must first be determined. Each independent prognostic factor is assigned a score on the points scale, and the points are added up to obtain the total risk score. The OM probability at 3, 5 and 8 years can then be read by locating the total risk score on the X-axis and reading the corresponding probability on the left Y-axis.
The nomogram demonstrates that the stage of GC patients contributes significantly to the individual's probability of overall mortality and patients in stage have the highest mortality. Secondly, the expression status of TGF-β1 and ARL4C is a critical prognostic factor for GC patients. Group 4 (ARL4C-high/TGF-β1-high) has a higher probability of OM than group 1 (ARL4C-low/TGF-β1-low), group 2 (ARL4C-high/TGF-β1-low) and group 3 (ARL4C-low/TGF-β1-high) at 3, 5 and 8 years. We then adopt DCA to verify the prognostic accuracy of the nomogram in OS prediction. The results show that the best net benefit is similar to the prediction of the nomogram at 3, 5 and 8 years (Figure 7c and 7d).
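The net benefit plotted in a decision curve can be illustrated with a simplified sketch. It treats the outcome as binary at a fixed time point and ignores censoring, which is an assumption made here for brevity; the paper does not describe its DCA implementation, and the labels and predicted risks below are hypothetical.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = died by the time point) and model-predicted risks.
labels = [1, 1, 0, 0, 1, 0, 0, 0]
risks = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]
model_nb = net_benefit(labels, risks, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
```

A decision curve repeats this calculation over a range of thresholds and compares the model with the treat-all and treat-none (net benefit of zero) strategies.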
The calibration curves of the nomogram at 3, 5 and 8 years are very close to the best prediction curve, showing a great consistency between the predicted OS rates and the actual observations (Figure 7e).
Taken together, these results suggest that ARL4C is critical for TGF-β1-mediated poor clinical outcomes for GC patients.

Discussion
ARLs have been reported to play important roles in cancer progression. Nevertheless, the exact role of ARLs in GC and the underlying molecular mechanisms are not well understood. The present study aims to comprehensively understand the expression patterns, correlation, genetic alteration, diagnostic values, prognostic values and potential functions of ARLs in GC by integrated bioinformatics analysis and experiments.
The expression profiles of ARLs are first explored using the TCGA, GTEx and Oncomine databases, which demonstrate that ARLs are commonly dysregulated in GC. Accumulating evidence suggests that the cross-talk and collaboration between small GTPase proteins are causatively involved in several cellular processes and diseases (24)(25)(26)(27). In this study, co-expression and correlation analyses demonstrate that the expression levels of ARLs in GC are highly correlated. Meanwhile, a large number of protein domains are shared among ARLs. Furthermore, according to the Hallmark gene sets analysis, we discover that ARLs may modulate a number of cancer-related pathways, and more interestingly, several ARLs are involved in the same pathways. For instance, ARL5C, ARL10, ARL13B and ARL13A are enriched in P53-PATHWAY, which is critical to GC progression (28). Taken together, our bioinformatics analysis indicates that dysregulated ARLs might function synergistically to modulate various signaling pathways in GC.
Our further genetic analysis indicates that genetic alterations, including copy number alteration and DNA methylation status, are involved in the misregulation of ARLs in GC. Particularly, we find that DNA methylation status is inversely correlated with the mRNA expression of several ARLs in GC. It is generally accepted that DNA methylation is a major epigenetic process that plays a critical role in different stages of cancer evolution and development (29). Fujii's study showed that hypomethylation in the 3'-UTR induces the overexpression of ARL4C in lung cancer, which contributes to the malignant phenotypes of cancer cells (22). In line with this, we find that the methylation status at the cg24441922 site is significantly negatively related to ARL4C mRNA expression in GC. Overall, we conclude that DNA methylation status might be involved in the dysregulation and oncogenic functions of ARLs in GC.
Multiple machine learning models are constructed to evaluate the diagnostic and prognostic values of ARLs in GC. After comprehensively analyzing the logistic regression model, univariate Cox regression model, multivariate Cox regression model and LASSO Cox regression model, we reveal for the first time that ARL4C and ARL13B are the most critical indicators for diagnosis and prognosis in GC among all ARLs.
Consistent with our results, recent studies have uncovered that overexpression of ARL13B and ARL4C is correlated strongly with the poor prognosis of GC patients (23,30). ARL13B may worsen survival and stimulate GC cell proliferation and migration both in vitro and in vivo. Meanwhile, it might regulate Smo trafficking and activate the Hedgehog signaling pathway (23). On the other hand, in vitro assays suggest that ARL4C knockdown inhibits the migration capacity of GC cells under 2D culture and reduces protein expression of Slug, which is related to EMT (20). However, a previous study has shown that the oncogenic function of ARL4C exhibits dramatic differences between in vitro and in vivo assays in colon cancer as well as lung cancer (21). Therefore, we perform 3D and in vivo experiments to further assess the effects of ARL4C on the growth and metastasis of GC cells. Our results support a close association between ARL4C expression and GC malignant phenotypes. Downregulation of ARL4C inhibits cell proliferation and metastasis both in vitro and in vivo, while ARL4C knockdown has significant effects on EMT, as indicated by the increased expression of the epithelial marker (E-cadherin) and decreased expression of mesenchymal markers (N-cadherin and vimentin).
TGF-β1, as a pleiotropic cytokine, orchestrates complicated signals to modulate tumorigenesis and promote cancer progression (31). A growing number of preclinical and clinical studies have identified TGF-β signaling as a determinant in immunotherapy (32). TCGA data mining in our study indicates TGF-β1 as the most significant ARL4C-related gene in GC. Further functional enrichment analysis also demonstrates that the ARL4C-associated genes in GC are significantly linked to TGF-β-related signaling. Thus, we speculate that ARL4C could participate in the TGF-β1 pathway. Accordingly, the expression of ARL4C is clearly upregulated upon stimulation with TGF-β1 in GC cells, while knockdown of ARL4C weakens p-Smad2/3 expression levels and impairs TGF-β1-induced EMT. These results indicate that ARL4C may act as a mediator between TGF-β1 and Smads in GC. Remarkably, Kaplan-Meier analysis shows that ARL4C (+)/TGF-β1 (+) co-expression is associated with shorter OS of GC patients. To perform prognostic prediction more precisely, we further construct an OS nomogram based on the expression status of ARL4C and TGF-β1 as well as other clinical variables. Consistently, our nomogram indicates that the ARL4C (+)/TGF-β1 (+) group is an independent risk factor and shows the highest mortality at 3, 5 and 8 years. In sum, ARL4C may act as a mediator of TGF-β1/Smad signaling and enhance the TGF-β1-mediated poor prognosis in GC. Our results demonstrate the great promise of ARL4C-targeted treatment in improving the effectiveness of TGF-β1 inhibitors for GC patients.

Conclusion
To our knowledge, this is the first time that the expression patterns, genetic alterations, signaling pathways, diagnostic values and prognostic values of ARLs in GC have been fully explored. Our results identify ARL4C as one of the two most significant diagnostic and prognostic indicators in GC. ARL4C functions as an oncogene in gastric tumorigenesis by promoting cell proliferation, metastasis and EMT. Furthermore, ARL4C could function as a mediator of TGF-β1 signaling and enhance TGF-β1-associated poor prognosis in GC (Fig. 8). Our study provides an overall insight into the specific roles of ARLs that benefits the development of novel strategies for GC detection and treatment.

validating the predictive performance of the nomogram at 3, 5 or 8 years. (e) The calibration curves for the probability of survival show an optimal agreement of the prediction by the nomogram at 3, 5 or 8 years.

Figure 7
A schematic diagram shows that ARL4C functions as a mediator of TGF-β1/Smad signaling.

Supplementary Files
This is a list of supplementary files associated with this preprint.