A Signature of Ten Chromodomain Helicase DNA (CHD) Gene-Related LncRNAs Predicts Overall Survival in Gastric Cancer Patients

Background: Gastric cancer (GC) is one of the most common tumors in the world and is often diagnosed at an advanced stage. Gene mutations and changes in lncRNA expression often lead to the occurrence of GC. In this study, we explored the function of Chromodomain Helicase DNA (CHD) genes and established a multi-gene prognostic model based on their co-expressed lncRNAs. Methods: We used the "limma" package of R software to analyze differences in the expression of CHD genes. Kaplan-Meier curves were used to plot the relationship between risk score and patient survival time, and receiver operating characteristic (ROC) curves were used to assess the reliability of the model. The data were obtained from The Cancer Genome Atlas (TCGA) database and the CellMiner database. Results: We found that CHD gene mutations were significantly associated with tumor mutation load and that CHD genes were involved in biological functions such as DNA unwinding and transcription. Ten CHD gene-related lncRNAs were used to establish a prognostic model for GC patients. Univariate and multivariate Cox regression analyses showed that the risk score could serve as an independent prognostic factor. Conclusion: We developed a novel prognostic model using 10 CHD-related lncRNAs to predict individualized survival in patients with GC.


Introduction
The latest statistics show that gastric cancer (GC) is the fifth most common cancer worldwide (5.6%) and the fourth most common cause of cancer death (7.7%) (1). GC patients usually have a poor prognosis: by the time the disease is found, patients are usually at an advanced stage and have a high rate of chemotherapy resistance, so new therapeutic targets need to be identified. Although long non-coding RNAs (lncRNAs) lack the ability to directly encode proteins, they can regulate cell physiology and function through complex epigenetic mechanisms (2). In recent years, in-depth studies have shown that lncRNAs are widely involved in the growth, metastasis, and invasion of GC (3,4). LncRNAs also play a role in the regulation of gene expression and autophagy, and dysfunction of these processes can promote human tumorigenesis (5). Therefore, it is necessary to infer the prognosis of patients from lncRNAs and other biological information.
The Chromodomain Helicase DNA (CHD) binding protein family, encoded by the CHD1-CHD9 genes, is characterized by the chromodomain and the SNF2-related helicase/ATPase domain of its proteins. CHD proteins modify chromatin structure, altering the access of the transcriptional apparatus to its chromosomal DNA template and thereby changing gene expression. In recent years, the CHD gene family has received increasing attention in GC research, which provides a reference for our study. CHDs are important regulators of chromatin remodeling; frameshift mutations and loss of CHD gene expression are often detected in GC, and these changes usually promote tumor development (6). Some researchers examined the serum of patients with GC and found that CHD1 was highly methylated, with a higher methylation level indicating a worse prognosis (7). It has been reported that the CHD5 promoter was methylated in 7 different GC cell lines and that CHD5 was identified as a direct target of miR-454. Low expression of CHD5 promoted tumor growth and was associated with shorter survival (8-10). CHD8 is expressed at lower levels in GC than in normal tissue, which can promote tumor cell proliferation (11).
At present, dysregulated lncRNA levels have been reported in every type of cancer, and the role of lncRNAs in tumor development is increasingly understood; lncRNAs are expected to become the next class of diagnostic and therapeutic tools in oncology (12).

Materials and Methods

Differentially expressed CHDs, tumor mutation load and enrichment analysis
We first used the "limma" package of R software (version 3.6.3) to analyze the differential expression of CHD genes between normal and tumor samples and to analyze the tumor mutation load (13). We then conducted Gene Ontology (GO) enrichment analysis to explore the potential molecular mechanisms of CHDs. The "corrplot" package of R software was used to investigate the correlation between CHD genes.

Selection and expression of OS-related lncRNAs
To select the lncRNAs related to CHDs, we again used the "limma" package of R software to extract the lncRNA expression data. We then used the "igraph" package to draw a network diagram of the lncRNAs and CHD genes, and used a heat map and box plots to display the expression levels of these lncRNAs. The "survival" package and univariate Cox regression analysis were used to draw forest plots showing the lncRNAs related to overall survival (OS).

Establishment of the prognostic signature
To improve the accuracy of the prediction model built from CHD-related lncRNAs, we divided the TCGA GC patients into two groups at a ratio of 6.5:3.5 and used one group to establish the prognostic model. Lasso regression was used to select 10 lncRNAs for modeling, and the coefficient corresponding to each lncRNA was calculated. For each patient: Risk score = expression level of lncRNA1 × β1 + expression level of lncRNA2 × β2 + … + expression level of lncRNA10 × β10, where β is the coefficient derived from the multivariate Cox regression model (14).
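The risk score formula can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `risk_score`, the three placeholder lncRNA names, and all numeric values are hypothetical and serve only to show the weighted sum of expression levels and coefficients.

```python
from typing import Dict


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Return the weighted sum of lncRNA expression levels.

    `coefficients` maps each lncRNA in the signature to its fitted
    coefficient (beta); `expression` maps lncRNA names to one patient's
    expression levels.
    """
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(beta * expression[name] for name, beta in coefficients.items())


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lncRNA1": 0.5, "lncRNA2": -0.25, "lncRNA3": 1.0}
patient = {"lncRNA1": 2.0, "lncRNA2": 4.0, "lncRNA3": 0.5}

score = risk_score(patient, coefficients)  # 0.5*2.0 - 0.25*4.0 + 1.0*0.5 = 0.5
```

A positive coefficient raises the score as expression increases, while a negative coefficient lowers it, so the sign of each β indicates whether the lncRNA behaves as a risk or a protective factor in the model.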
Subsequently, patients were divided into a high-risk group and a low-risk group according to the median risk score, and the "survivalROC" package of R software was used to construct a time-dependent receiver operating characteristic (ROC) curve to evaluate the predictive value of the model. Kaplan-Meier survival curves were used to show the difference in survival between the high-risk and low-risk groups, and univariate and multivariate Cox regression analyses were performed to test whether the risk score could serve as an independent prognostic factor for survival.
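The median split and the Kaplan-Meier estimate described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' R workflow: the function names, the rule that a score equal to the median falls in the low-risk group, and the toy data are all hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores equal to the median are assigned to the low-risk group.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    events[i] is 1 if patient i died at times[i] and 0 if censored.
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data, for illustration only.
groups = split_by_median([0.2, 1.5, 0.9, 2.1])  # ['low', 'high', 'low', 'high']
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

In practice one curve is estimated per risk group and the two curves are compared, for example with a log-rank test.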

Verification of the prognostic signature
We divided the patients into two subgroups according to each of several characteristics, including age, sex, and stage. Using the same predictive model, a risk score was calculated for each patient, and Kaplan-Meier curves were then plotted for the high-risk and low-risk patients in each subgroup to compare their survival. Next, we applied the same calculation method to the GC patients who had not been used to establish the model, drew the survival curve, ROC curve, and risk curve, and used univariate and multivariate Cox regression analyses to further verify the reliability of the model.

Drug sensitivity analysis
We obtained RNA-seq and drug activity data for CHD genes from the CellMiner database and extracted the data with R packages obtained from the Bioconductor website. We then analyzed the correlation between the two data sets by the Pearson method to explore the relationship between CHD gene expression and sensitivity to various anti-tumor drugs.
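The Pearson correlation used in this step can be written out in a few lines of Python. The sketch below is illustrative only and is not the authors' R code: the function name `pearson_r` and the toy expression and drug activity values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Toy values across four cell lines, for illustration only.
gene_expression = [1.0, 2.0, 3.0, 4.0]
drug_activity = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(gene_expression, drug_activity)  # perfectly linear, so r = 1.0
```

A coefficient near 1 means that cell lines with higher expression of the gene tend to show higher drug activity, and a coefficient near -1 means the opposite.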

Results
Tumor mutation load, enrichment analysis and differential expression of CHDs.
We first downloaded RNA-seq data from the TCGA database, comprising 375 tumor samples and 32 normal samples. We then processed the data and analyzed the expression levels of CHD1-CHD9 in the samples using the "limma" package of R software. All CHD genes except CHD3 and CHD9 showed statistically significant differences in expression between tumor and normal samples (Fig. 1A). Mutation of tumor genes is key to the occurrence of tumors, so we also analyzed the tumor mutation load. When CHD genes were mutated, the tumor mutation load of samples in the mutant group increased markedly (Fig. 1B), which suggests that CHD genes may play a promoting role in the mutation of tumor genes.
In addition, to further explore the potential role of CHDs in GC patients, GO analysis was conducted in R. As shown in Fig. 2A, the GO enrichment analysis indicated that the CHD genes play an important role in DNA unwinding and related functions, including DNA duplex unwinding, DNA geometric change, DNA conformation change, DNA helicase activity, helicase activity, and catalytic activity acting on DNA. Co-expression analysis also showed strong co-expression among CHD genes, for example between CHD2 and CHD6 and between CHD8 and CHD4, with correlation coefficients of 0.52 and 0.53, respectively (Fig. 2B).

Identification of OS-related lncRNAs of CHDs and their expression.
We extracted the lncRNAs co-expressed with CHD genes using R software and obtained a total of 536 lncRNAs associated with CHD genes (Fig. 3A). According to the results of univariate Cox regression analysis, a total of 13 differentially expressed lncRNAs were significantly correlated with OS in GC patients (Fig. 3B). The expression of these lncRNAs differed significantly between tumors and normal tissues, as shown by box plots (Fig. 4A) and a heat map (Fig. 4B).

Establishment of the ten-lncRNA risk signature.
First, all samples were randomly divided into 2 groups, and the prognostic model of GC was established using the data of the 13 OS-related lncRNAs in one group. Lasso Cox regression analysis was used to select 10 lncRNAs for the prognostic model (Fig. 5A,B), and the coefficient of each lncRNA was obtained (Fig. 5C). The coefficient determined by Lasso Cox regression was then multiplied by the corresponding lncRNA expression level, and the weighted sum was the risk value of each patient. Patients were divided into high-risk and low-risk groups according to the median risk value. Figure 6 shows how the number of deaths among patients with GC changes as the risk value increases: the number of red points in the figure grows with the risk value, indicating an increase in the number of deaths (Fig. 6A,B). Univariate Cox regression analysis showed a significant correlation between risk score and survival in patients with GC (P < 0.001, HR = 1.130, 95% CI 1.057-1.209) (Fig. 6E). Multivariate Cox regression analysis confirmed that the risk score was an independent prognostic indicator (P = 0.003, HR = 1.113, 95% CI 1.038-1.194) (Fig. 6F). The Kaplan-Meier curve shows the difference in survival between patients with high and low risk scores, with a higher score indicating a shorter survival time (Fig. 6G). We then used the "survivalROC" package of R software to plot the ROC curve; the area under the curve (AUC) of the risk score for predicting 3-year OS was 0.805. In conclusion, the risk score performed relatively well in predicting OS in the TCGA dataset.

Validation of the ten-lncRNA prognostic model.
Subsequently, to further verify the reliability of the model, GC patients in the TCGA database were divided into subgroups according to different indicators: by age into patients aged 65 or younger and patients older than 65 (Fig. 7A), by gender into men and women (Fig. 7B), and by stage into early and late stages (Fig. 7C). We then used the same model formula to calculate the risk scores within each subgroup and found that, in every subgroup, GC patients with high risk scores still had a shorter survival time, which is consistent with our previous analysis.
We also applied the same model to the other group of TCGA GC patients, who had not been used to establish the model, to further verify its validity. As shown in Fig. 8, the number of deaths among GC patients increased with the risk value (Fig. 8A,B,C,G). Univariate and multivariate Cox regression analyses also showed that the risk score could independently predict the survival of GC patients (Fig. 8E,F), and the area under the ROC curve (AUC = 0.777) indicated that the model had good reliability (Fig. 8D).

Drug sensitivity analysis of CHD genes.
We jointly analyzed the transcriptional expression of CHD genes in the NCI-60 cancer cell lines and the activity of 263 antitumor drugs in the CellMiner database, using Pearson correlation analysis to explore the potential influence of the CHD gene family on drug response. The results are shown in Fig. 9.
Notably, there was a positive correlation between CHD1 and the sensitivity to Nelarabine (correlation

Discussion
In recent years, the incidence and mortality of GC have decreased with the improvement of treatment methods (15). However, the diagnosis and treatment of GC still face great challenges, especially with regard to patient survival, and survival time cannot be predicted accurately with the current staging system alone (16). The development of tumors depends on mutations in multiple genes, which means that models composed of multiple genes may be better predictors of patient prognosis than single indicators. Random errors often occur during DNA replication, and mismatched DNA is normally repaired in time by the body's repair system. When this dynamic balance is broken, gene mutations often occur. Such replication errors are also affected by external factors such as radiation, smoking, and diet. When mutated genes gradually accumulate, or when missense mutations occur in key parts of genes, cell growth is no longer regulated normally and cell division and proliferation change, which marks the beginning of cancer.
The CHD gene family is closely related to DNA duplex unwinding, DNA geometric change, DNA conformation change, DNA helicase activity, and other biological functions. Studies have found that the occurrence of GC is often accompanied by changes in the CHD gene family. Similar findings have been reported in other types of cancer. For example, high expression of CHD4 promotes colorectal cancer cell proliferation, invasion, and metastasis (17), and CHD4 has also been identified as an oncogene that regulates proliferation and migration in breast cancer (18, 19). Studies have also reported that CHD4 mutations promote endometrial cancer through the TGF-β signaling pathway (20). The loss of CHD1 increases the risk of postoperative metastasis of prostate cancer (21), and the expression of CHD5 and CHD9 may be independent prognostic biomarkers for colorectal cancer (22,23). Another study reported that patients with high expression of CHD9 had a worse prognosis than patients with low expression of CHD9 (24). These findings prompted us to use TCGA and other public databases for further analysis of CHD genes, so as to provide a new reference for future clinical treatment.
Tumor mutation load, also called tumor mutation burden (TMB), refers to the number of somatic mutations in the tumor genome after germline variants have been removed. Theoretically, a higher TMB produces more neoantigens, so targeted immunotherapy might be more effective (25). Defective DNA mismatch repair leads to an increased mutation load, and both the occurrence of GC and drug resistance are related to this (26, 27). Genomic instability is closely associated with defective repair of DNA damage, and CHD4, which is involved in chromatin relaxation, may affect DNA repair when mutated (28). CHD1 is also involved in chromatin repair, and the loss of CHD1 leads to chromatin dysregulation (29). In addition, CHD1 has been reported to promote DNA damage repair in prostate epithelial cells (30). We analyzed the relationship between CHD genes and tumor mutation load and found that mutations in CHDs were associated with an increase in TMB, which may be related to the involvement of CHD genes in DNA duplex unwinding and other biological functions.
With the development of genetic research, it has been found that the regulation of protein synthesis is not a function unique to coding genes. In addition to carrying genetic information, RNA performs a variety of regulatory functions. LncRNAs play complex and precise regulatory roles in body development, gene expression, transcriptional activation, transcriptional interference, nuclear transport, and other processes, and have attracted extensive attention. With further study, changes in the lncRNA transcriptome are expected to become a new indicator in tumor diagnosis and a new diagnostic and therapeutic tool (12). In this study, we examined CHD gene-related lncRNAs. First, OS-related lncRNAs were extracted, and a prognostic model based on these lncRNAs was established by the Lasso regression method to predict the survival of GC patients. Second, the ROC curve was used to assess the reliability of the model, and the risk curve and survival curve were used to show the survival status of GC patients with different risk scores. Univariate and multivariate Cox regression analyses were used to test whether the risk score could serve as an independent prognostic indicator of patient survival. Next, we divided the patients into different subgroups and used the remaining TCGA data to validate the model. The model performed well: as the risk score increased, the mortality of the patients also increased.
However, our study has several limitations. First, our patient data come from the TCGA database, and the sample size is relatively small and regionally limited; more data are therefore needed to verify the reliability of the model. Second, validation in more and larger research centers is needed to confirm the value of the CHD-related lncRNA model. Third, we have not yet studied the mechanisms of these lncRNAs, and more experiments are needed to clarify them.

Conclusion
In conclusion, this study established a reliable prognostic model to predict OS in GC patients.

Declarations

Acknowledgement
We thank Xin Xu and Youliang Wu for their guidance on manuscript formatting and journal submission.

Statement of Ethics
All analyses were based on public databases; thus, no ethical approval or patient consent was required.

Conflict of Interest Statement
The authors declare no conflict of interest.

Consent for publication
Not applicable

Funding Sources
This work was supported by a grant from the National Natural Science Foundation of China (81874063).

Authors' Contributions
Xiaodong Wang collected all the data and was responsible for writing the full text. Yaxian Li participated in writing the article and revising its format. Yida Lu was responsible for editing the figures and participated in writing the full text. Yongxiang Li provided the research ideas and all the funding. All authors read and approved the final manuscript.

Availability of data and materials
The data included in the current study are available in the TCGA database (https://cancergenome.nih.gov/) and the CellMiner database (https://discover.nci.nih.gov/cellminer/home.do).

Figure 6. The prognostic model was established using the TCGA database and its prognostic value was verified. A-B: risk curves show the survival status of high-risk and low-risk patients; C: heat map shows the expression of lncRNAs in the high-risk and low-risk groups; D: receiver operating characteristic (ROC) curve for patients with higher and lower risk scores (P < 0.001, AUC = 0.805); E-F: univariate and multivariate Cox regression analyses tested the reliability of the risk score; G: Kaplan-Meier curves for patients with higher and lower risk scores.

Figure 9. Drug response analysis. Correlation between TCGA tumor drug sensitivity and CHDs; scatter plots sorted by P value.