Integrated Bioinformatics Data Analysis and Experimental Studies Reveal Prognostic Significance of ALDH Family in Endometrial Cancer


Abstract
Background
Uterine corpus endometrial cancer (UCEC) is one of the three most common malignant tumors of the female reproductive tract. According to reports, the cure rate of early UCEC can reach 95%. The development of prognostic markers would therefore help UCEC patients have the disease detected and treated earlier. The ALDH family was first identified as a group of genes essential to ethanol metabolism in the body. Recent studies have shown that ALDH can also participate in the regulation of cancer.

Methods
We used the gene expression profile data of 33 cancers in the TCGA database to analyze the expression and survival associations of the ALDH family. GO, KEGG and PPI functional analyses were used to predict the regulatory role of the ALDH family in cancer. In addition, CCK-8, colony formation and nude mouse xenograft assays were used to test UCEC cell lines in vitro and in vivo and to further confirm the key role of ALDH2 expression in their proliferation. Finally, Lasso and Cox regression were used to establish an overall survival prognostic model based on ALDH2 expression.

Results
We explored the expression of the ALDH family in 33 cancers and found that ALDH2 was abnormally expressed in UCEC. In vivo and in vitro experiments were conducted to explore the effect of ALDH2 expression on the proliferation of UCEC cell lines. The change in ALDH2 expression was not due to gene mutations but was regulated by miR-135-3p. The impact of ALDH2 changes on the survival of UCEC patients was also examined. Finally, a nomogram for predicting survival was constructed, with a C-index of 0.798 and an AUC of 0.764.

Conclusion
This study suggests that ALDH2 may play a crucial role in UCEC progression and has potential as a prognostic biomarker of UCEC.

Background
Gynecological cancers include ovarian cancer, endometrial cancer, vaginal cancer, cervical cancer and vulvar cancer (1). Endometrial cancer (EC) is a common gynecological malignancy. According to statistics, there were more than 380,000 new cases worldwide in 2018 (2). In developed countries, endometrial cancer is the most common gynecological cancer. There are more than 60,000 confirmed cases and more than 10,000 deaths, and the death rate is increasing year by year (3). However, studies in recent decades have found that early screening and intervention can significantly reduce the incidence and mortality of endometrial cancer (4-6). These data show that early screening of patients at high risk of endometrial cancer is essential, and that the discovery of genes for early diagnosis will help prevent the occurrence and progression of the disease.
In traditional research, discovering new diagnostic markers and prognostic factors through sequencing is slow and costly. Bioinformatics offers new methods for the rapid discovery of such markers and factors. A growing number of studies have shown that bioinformatics can serve as a reliable tool for cancer research (7-9). In lung cancer, Zhang used TCGA to conduct a comprehensive analysis of prognosis, immune function and immune markers based on TNF family markers, and developed a risk survival model based on TCGA samples (7). Similarly, bioinformatics analysis has shown that not all TP53 mutations (mainly missense and nonsense mutations) can effectively predict the response of patients with lung adenocarcinoma (LUAD) to immune checkpoint inhibitors (ICIs) (8). Li used multiple bioinformatics approaches on public databases to study the expression characteristics, prognostic value, immune infiltration pattern and biological function of Siglec-15, and verified the findings in patients (9). In endometrial carcinoma, Wang developed a six-gene prognostic model to predict overall survival (10). These results show that exploring new cancer markers and developing new prognostic tools through public databases is a reliable research approach.
The human aldehyde dehydrogenase (ALDH) family has 19 members, including ALDH1, ALDH2 and ALDH3. ALDH2 has become a research focus because of the importance of its function. ALDH enzymes were first discovered to play a critical role in the oxidation of ethanol in humans (11). Alcohol dehydrogenase in the liver cytoplasm first metabolizes alcohol to acetaldehyde, and acetaldehyde dehydrogenase in the mitochondria then metabolizes acetaldehyde to acetic acid, which is finally broken down into water and carbon dioxide in the peripheral circulation. However, a growing number of recent studies have shown that the ALDH family plays an essential role in the occurrence and development of cancer. In gastric cancer, ALDH1 can be used as a tumor stem cell marker (12,13). Moreover, positive expression of ALDH1A1 is associated with shorter overall survival and progression-free survival (14).
In breast cancer, citral reduces tumor growth by inhibiting the breast cancer stem cell marker ALDH1A3 (15). In prostate cancer (PCa), however, stromal expression of ALDH1, in contrast to expression in tumor tissue, is associated with improved clinical outcome and is less frequent in PCa metastases (16). To date, there has been no report on the relationship between ALDH expression and clinical outcome in uterine corpus endometrial carcinoma.
In this study, we comprehensively examined the prognostic significance of ALDH2 in UCEC patients and developed a new 6-factor prognostic risk scoring model based on ALDH2 expression. We also made a preliminary investigation of the effect of ALDH2 expression on the proliferation of UCEC cells and of the pathway that regulates ALDH2. These findings may provide theoretical support for the future development of personalized treatments for UCEC patients.

Data collection
Expression data and clinical annotation data for patients with 33 types of cancer were downloaded from the TCGA data portal (tcga-data.nci.nih.gov/tcga). Perl (Version 5.3.2) was used to sort and collate the data for each patient across the 33 cancer types. Expression profile data of normal human endometrial tissue were obtained from the GTEx database (http://www.gtexportal.org). All databases were accessed in September 2020.

Analysis of overall survival
R with the survival and survminer packages was used to compare survival between groups with the Kaplan-Meier method and Cox regression. A P value of less than 0.05 was considered statistically significant.
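
As a minimal illustration of the Kaplan-Meier estimate behind these comparisons, the following Python sketch computes a survival curve from follow-up times and event indicators and assigns patients to expression groups. It is not the R code used in this study: the function names are hypothetical, and splitting at the median expression value is an assumption made for the example.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]
```

In practice the two group-specific curves would then be compared with a log-rank test, which the survival package provides and which is omitted here for brevity.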

Analysis of differentially expressed genes
R with the limma package was used to identify differentially expressed genes between groups, with |log2 fold change| > 0.5 and adjusted P value < 0.05 as the thresholds.
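
The thresholding step can be sketched in Python as follows. This is an illustration rather than the limma workflow itself: it assumes that the P values are adjusted with the Benjamini-Hochberg procedure and that the fold-change cutoff applies to both up- and down-regulated genes. The function names and inputs are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, lfc_cut=0.5, padj_cut=0.05):
    """Split genes into up- and down-regulated lists using both thresholds."""
    padj = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q >= padj_cut:
            continue
        if lfc > lfc_cut:
            up.append(gene)
        elif lfc < -lfc_cut:
            down.append(gene)
    return up, down
```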

Analysis of gene and pathway enrichment
To determine the biological processes, cellular components, molecular functions and biological pathways enriched among the differential genes, R with the clusterProfiler and org.Hs.eg.db packages was used for Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis (17-19). Gene Set Enrichment Analysis (GSEA) was also used to analyze related functions between groups. The screening criterion was P < 0.05.
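
Over-representation analysis of the kind performed for GO and KEGG terms rests on a hypergeometric test. The sketch below shows that calculation for a single term in Python; it is a simplified illustration, not a reimplementation of clusterProfiler, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_universe, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric P value for a single term.

    n_universe -- number of background genes
    n_in_term  -- background genes annotated to the term
    n_selected -- number of differential genes tested
    n_overlap  -- differential genes annotated to the term
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Because many terms are tested at once, the resulting P values would normally be corrected for multiple testing before the P < 0.05 criterion is applied.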

Construction of protein-protein interaction (PPI) network
The STRING database was used to construct a protein interaction network of the differential genes (20). The screening criterion was a combined score ≥ 0.9. Cytoscape with the cytoHubba and MCODE apps was used to visualize the network and to screen for hub genes and core sub-networks.
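
A simple way to picture the hub-gene screening is to filter interactions by score and rank genes by their number of remaining interactions (degree). The Python sketch below does only that. It is an illustration under the assumption that degree is the ranking criterion (cytoHubba offers several others), and the edge list format is hypothetical.

```python
from collections import Counter


def hub_genes(edges, min_score=0.9, top_n=6):
    """Rank genes by degree after dropping low-confidence interactions.

    edges -- iterable of (gene_a, gene_b, combined_score) tuples
    Returns a list of (gene, degree) pairs, highest degree first.
    """
    degree = Counter()
    for gene_a, gene_b, score in edges:
        if score >= min_score:
            degree[gene_a] += 1
            degree[gene_b] += 1
    return degree.most_common(top_n)
```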

Model construction and verification
To construct the prognostic risk scoring system for UCEC, univariate Cox regression analysis was used to identify prognostic genes among the DEGs; genes with P < 0.05 were considered significant. Lasso-penalized Cox regression analysis was then used to further select OS-related prognostic genes in UCEC. Finally, UCEC patients in the TCGA database were randomly divided into training and validation sets of equal size, and stepwise Cox regression was used to establish the risk scoring model. The C-index and AUC of the model were calculated on the validation set, and a nomogram was used to test the accuracy of the risk score model.
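
The C-index used to evaluate the model measures how often the patient with the higher risk score is also the one who dies earlier. A minimal Python sketch of Harrell's concordance index is given below, assuming right-censored data; it is meant to clarify the metric and is not the code used in this study. The function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. The pair is concordant when that patient also
    has the higher risk score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.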

Analysis of intratumoral immune cell composition
CIBERSORT deconvolution algorithm (https://cibersort.stanford.edu/) was used to estimate the abundance of 22 immune cell types in UCEC and to evaluate the intratumoral immune cell composition.

Cell culture
Three endometrial cancer cell lines and the 293T cell line were purchased from ATCC (ATCC, USA). The highly differentiated endometrial cancer cell line ISHIKAWA and the moderately differentiated endometrial cancer cell line SPEC-2 were cultured in MEM medium (Gibco, USA) containing 10% fetal bovine serum (BI, Australia). The poorly differentiated endometrial cancer cell line KLE was cultured in DMEM/F12 medium (Gibco, USA) containing 10% fetal bovine serum. The 293T cell line and the endometrial epithelial cell line hEEC were cultured in DMEM medium (Gibco, USA) containing 10% fetal bovine serum. All cells were cultured at 37 °C in an incubator containing 5% CO2.

CCK-8 assay
Cell suspensions were seeded in 96-well plates and cultured for 12 h, 24 h, 48 h or 72 h, after which cell proliferation was measured with the Cell Counting Kit-8 (CCK-8) (Dojindo, Japan). Absorbance at 490 nm was measured with a plate reader (Bio-Rad, USA).

Colony formation assay
Cells were seeded in 6-well plates and cultured until visible colonies formed. The cells were then fixed with 4% formaldehyde, stained with Giemsa solution (Meilunbio, China) and photographed for analysis.

Xenograft tumor in nude mice
Nude mice were randomly divided into a control group and an ALDH2 overexpression group. Each mouse was subcutaneously injected with 0.2 ml of UCEC cell suspension (1 × 10^7 cells/ml) on the ventral side of the right hind limb. The mice were fed a normal diet under SPF conditions, and subcutaneous tumor formation was observed. On day 21, the mice were sacrificed by cervical dislocation, and the subcutaneous tumors were completely removed and weighed on an electronic balance.
The Animal Ethics and Use Committee of Wuhan University of Science and Technology approved the tumor-forming experiment in nude mice.

Dual-luciferase reporter assay
The wild-type 3'-UTR of ALDH2 and a mutant form were separately subcloned into the pmirGLO vector (Addgene, USA) to generate wt-ALDH2-luc and mut-ALDH2-luc, respectively. The miRNA mimic, internal control and Luc plasmid were co-transfected into cells. Luciferase activity was measured 48 h after transfection with the Dual-Luciferase Reporter Assay System (Promega, USA).

RNA Pull-Down
The RNA pull-down assay was carried out with the Magnetic RNA-Protein Pull-Down Kit (Thermo, USA). The RNA-bound beads were added to the cell nuclear lysate, and the eluted proteins were detected by western blot analysis.

RNA Immunoprecipitation (RIP)
RNA-protein-antibody complexes were captured using Protein A/G (Thermo, USA). RNA was eluted by adding TRIzol directly to magnetic beads and isolated as per the manufacturer's instructions. cDNA was synthesized using HiScript® II 1st Strand cDNA Synthesis Kit (Vazyme, China) and analyzed by qRT-PCR.

RNA isolation and qRT-PCR
The RNeasy Plus Universal Mini Kit (QIAGEN, USA) was used to isolate total RNA from cell lines, and the HiScript® II 1st Strand cDNA Synthesis Kit (Vazyme, China) was used to reverse transcribe it into cDNA. The miRNeasy Micro Kit (QIAGEN, USA) was used to isolate miRNA from cell lines, and the miRNA 1st Strand cDNA Synthesis Kit (Vazyme, China) was used for reverse transcription of miRNA. qRT-PCR was performed on the Bio-Rad CFX-96 system (Bio-Rad, USA) with the SYBR Green method (Yisen, USA) to determine relative expression levels. β-actin and U6 were used as endogenous controls. The primer sequences of ALDH2 were as follows: ALDH2-F: 5'-ATGGCAAGCCCTATGTCATCT-3', ALDH2-R: 5'-CCGTGGTACTTATCAGCCCA-3'. Primer synthesis and plasmid sequencing were performed by Ribobio (Ribobio, China), which also designed the miRNA reverse transcription sequences, primers and probes.
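
Relative expression from qRT-PCR is commonly derived from threshold cycle (Ct) values by normalizing the target gene to an endogenous control. The Python sketch below uses the 2^-ΔΔCt calculation as an example. That this particular method was used here is an assumption, as is a PCR efficiency close to 100%; the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt calculation (assumed method).

    ct_target_* -- Ct of the target gene (for example ALDH2)
    ct_ref_*    -- Ct of the endogenous control (for example beta-actin or U6)
    *_sample    -- the cell line or treatment of interest
    *_control   -- the reference condition
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)
```

With this convention, a value below 1 indicates lower expression in the sample than in the reference condition.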

Western blot
Western blot assays were performed according to the standard protocol. The ALDH2 antibody was purchased from CST (CST, USA) and diluted according to the ratio recommended in the manufacturer's instructions.

Statistical Analyses
Group comparisons and normality tests were performed with R software (version 3.6.3). Comparisons used one-way analysis of variance (ANOVA), two-tailed Student's t-tests or non-parametric tests, as appropriate.
Kaplan-Meier analyses with log-rank tests and the Cox proportional hazards model were used for survival analysis. All data are presented as the mean ± SD. Statistical significance was set at p < 0.05.

The expression of ALDH family in all cancers of TCGA.
Expression data for all ALDH family genes in all cancers were obtained from the TCGA database. As shown in Figure 1, the expression of ALDH family genes differs markedly across cancers (Figure 1.A), and the expression levels of ALDH family genes are highly correlated with one another (Figure 1.B). In detail, ALDH9A1, ALDH18A1, ALDH3A2, ALDH1A1, ALDH1B1 and ALDH2 are expressed at higher levels than other ALDH family genes in all cancers. The expression of ALDH2 and ALDH8A1 is highly positively correlated. We further analyzed the six ALDH family genes most highly expressed in cancer. Comparing normal and tumor tissues showed that the expression of ALDH3B1, ALDH18A1 and others increased in a variety of cancers, whereas ALDH9A1 and ALDH2 showed the opposite trend.

The prognostic significance of ALDH family.
To further confirm the effect on UCEC patients of the ALDH family genes that are highly expressed in cancer, we retrieved the survival information of each patient in the TCGA database and combined it with the expression analysis of ALDH family genes. Changes in other ALDH family genes did not affect the overall survival of UCEC patients; only changes in the expression of ALDH2 significantly affected overall survival (Figure 2.A). The results showed that overall survival was lower in the low expression groups of ALDH2 and ALDH18A1. The DSS, DFI and PFI of the ALDH2 low expression group followed the same pattern as overall survival, all showing worse survival (Figure 2.B).
In the ALDH18A1 low expression group, DSS, DFI and PFI did not differ from those of the high expression group (Figure 2).

Analysis of GO, KEGG, GSEA function enrichment.
We downloaded the expression data of all UCEC samples in the TCGA database. The samples were divided into two groups according to the median expression of ALDH2, and the differential genes between the groups were analyzed. A total of 477 differential genes were identified, of which 322 were up-regulated and 155 were down-regulated (Figure 3.A, B). Enrichment analysis was used to understand the possible regulatory functions of these differential genes. The six pathways with the highest correlation found in GO enrichment are ( (Figure 3.F). These enrichment results indicate that the alteration of ALDH2 expression may be related to cellular immunity. Therefore, the immune infiltration score of each UCEC patient was calculated with R and the cibersort package. The results showed that patients in the ALDH2 low expression group had lower scores for CD8+ T cells and plasma cells (Figure 3.G).

PPI network construction.
The PPI network of the DEGs was constructed with the STRING network tool. After hiding all isolated nodes, the network contained 392 nodes and 295 edges (Figure 4.A). The six genes with the largest number of interactions were ORM2, C3, CTSH, HLA-DPA1, HLA-DPB1 and HLA-DQA1 (Figure 4.B). The results were then imported into Cytoscape, and the MCODE and cytoHubba apps were used to identify the core sub-network (Figure 4.C) and core genes (Figure 4.D) in the PPI network.

The risk score model of UCEC construction.
Our data show that the ALDH2 high expression group had better OS, DSS and PFI (Figure 2). Therefore, a prognostic model for overall survival based on ALDH2 expression was constructed. The UCEC expression profiles in the TCGA database were randomly divided into training and test groups of equal size. Univariate Cox regression showed that a total of 62 genes were significantly related to the overall survival of UCEC patients. Lasso regression was then used to screen out 15 genes to prevent the model from overfitting (Figure 5.A, B). Finally, a 6-factor overall survival prognostic risk score model was established by stepwise Cox regression. The coefficients of the overall survival risk score model were:

Effect of ALDH2 overexpression on tumor progression in vitro and in vivo.
Western blot and qRT-PCR were used to determine whether ALDH2 expression is reduced in endometrial cancer cell lines. Compared with normal human endometrial epithelial cells (hEEC), ALDH2 expression was reduced in ISHIKAWA, SPEC-2 and KLE cells (Figure 6.A, B). We then constructed a lentiviral plasmid, pLKO.1-ALDH2, to overexpress ALDH2. The virus was collected after transfection of 293T cells, concentrated with PEG8000 and used to infect the ISHIKAWA and KLE cell lines. Overexpression efficiency was verified by western blot and qRT-PCR (Figure 6.C, D). Next, the relationship between ALDH2 expression and the proliferation of endometrial cancer cells was assessed by CCK-8 and colony formation assays (Figure 6.E, F). The results showed that restoring ALDH2 expression reduced the proliferation of endometrial cancer cell lines. In addition, a tumor xenograft model was used to determine whether ALDH2 overexpression has the same effect in vivo. The tumor mass and volume of mice in the ALDH2 overexpression group were significantly smaller than those of the control group (Figure 6.G).
Changes in ALDH2 play a crucial role in the survival and immune infiltration of UCEC patients.
There is therefore an urgent need to find a way to regulate ALDH2. Analysis of gene mutation frequency showed that, in most patients, the loss of ALDH2 function was not caused by mutation (Figure 7.A, B).
Numerous studies have found that miRNAs can silence gene expression by targeting the 3'UTR of a gene. We speculated that ALDH2 may also be regulated by this mechanism. We therefore screened for miRNAs differentially expressed between UCEC patients and the normal group. The volcano plot showed a total of 122 up-regulated miRNAs (Figure 7.C). In parallel, the bioinformatics prediction website TargetScan predicted that 312 miRNAs bind to the 3'UTR of ALDH2. The intersection of the two sets contained 10 miRNAs, and the correlation between their expression and ALDH2 expression is shown in Figure 7D, E. The three miRNAs with the strongest negative correlation were hsa-miR-301b-5p, hsa-miR-3187-3p and hsa-miR-135b-3p.
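
The candidate selection described in this paragraph amounts to intersecting two miRNA sets and ranking the shared miRNAs by their correlation with ALDH2 expression. The Python sketch below illustrates this with a Pearson correlation. The choice of Pearson rather than another coefficient is an assumption, and the function names and data layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def rank_candidates(upregulated, predicted, mirna_expr, target_expr):
    """Rank miRNAs found in both sets, most negative correlation first.

    upregulated -- miRNAs up-regulated in tumor versus normal samples
    predicted   -- miRNAs predicted to bind the target 3'UTR
    mirna_expr  -- dict mapping miRNA name to per-sample expression
    target_expr -- per-sample expression of the target gene
    """
    shared = set(upregulated) & set(predicted)
    scored = [(name, pearson(mirna_expr[name], target_expr)) for name in shared]
    return sorted(scored, key=lambda item: item[1])
```

Correlation only nominates candidates; direct targeting still has to be tested experimentally, for example with a reporter assay.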
To verify which miRNA regulates ALDH2 expression, we constructed a luciferase plasmid containing the ALDH2 3'UTR. The dual-luciferase reporter assay showed that relative luciferase activity decreased only in the hsa-miR-135b-3p transfection group, with no significant change after transfection of hsa-miR-301b-5p or hsa-miR-3187-3p (Figure 7F). In the mutation group, the mutation eliminated the inhibitory effect of hsa-miR-135b-3p on ALDH2 (Figure 7G). RIP and RNA pull-down assays showed that ALDH2 was enriched with hsa-miR-135b-3p but not in the control group (Figure 7H, I). qRT-PCR showed that the expression of hsa-miR-135b-3p increased significantly after transfection of the hsa-miR-135b-3p mimic (Figure 7J), while western blot and qRT-PCR showed that ALDH2 expression decreased (Figure 7K, L). In addition, TCGA survival data showed that the hsa-miR-135b-3p high expression group had a poorer prognosis (Figure 7M). Together, these results indicate that hsa-miR-135b-3p can down-regulate the expression of ALDH2.

Discussion
Endometrial cancer is one of the three most common malignant tumors of the female reproductive tract and the sixth most common cancer in women worldwide (2). With increasing average life expectancy and changes in living habits, the incidence of EC has continued to rise over the past two decades, and patients are becoming younger (21). Although the cause of endometrial cancer is still not clear, it may be related to genetics, obesity or drug use (22). There is clear evidence, however, that the cure rate of early UCEC can reach 95% (23). The development of prognostic markers would therefore help UCEC patients have the disease detected and treated earlier. Traditional experimental methods are time-consuming and have high research and development costs. The emergence of bioinformatics has made it possible to study new prognostic markers. A number of studies have shown that bioinformatics is a reliable research tool. In gastric cancer, GDF-15 can be used as a biomarker (24). In liver cancer, YTHDF1 expression is elevated in patients (25). In UCEC, Li found that Mammaglobin B is a prognostic marker (26). Similarly, we downloaded the expression data of 33 cancers from TCGA.
Extraction of the ALDH family genes showed that their expression changed significantly in most cancers. The ALDH family was first found to be mainly involved in alcohol metabolism (27). However, recent studies have found that ALDH family genes can participate in the regulation of cancer.
In detail, ALDH1 has been identified as a tumor stem cell marker involved in the development of cancer (28).
As for ALDH2, Li found that restoring its expression in lung adenocarcinoma can inhibit the migration of lung cancer cells (29). This is consistent with our findings. We found that ALDH2 expression decreases in most cancers, not only in LUAD but also in UCEC. Analysis of patient characteristics showed that changes in ALDH2 were not significantly correlated with age group or race.
Combined with the clinical information of UCEC patients in the TCGA database, we found that the ALDH2 low expression group had worse OS, DSS, DFI and PFI. Functional enrichment analysis indicated that changes in ALDH2 participate in cancer mainly through the regulation of immune function. As noted above, ALDH1 has been identified as a tumor stem cell marker (28). Christophe found that ALDH1 cell subsets have stronger cancer stem cell characteristics (30). Mohamed found that the expression of ALDH1 is highly correlated with the expression of the colorectal carcinoma stem cell markers Notch1 and CD44 (31). We calculated the immune infiltration scores of UCEC patients with CIBERSORT and found that changes in ALDH2 are highly correlated with CD8+ T cells. However, there are no reports on ALDH2 changes in relation to cancer stem cells or immunity. Our findings provide new ideas for further research on the role of ALDH2 in cancer. In addition, we made an initial exploration of the relationship between ALDH2 expression and UCEC progression in vivo and in vitro. ALDH2 expression was reduced in the three UCEC cell lines ISHIKAWA, SPEC-2 and KLE, and restoring ALDH2 expression in cell lines reduced the proliferation of tumor cells. In vivo experiments also verified our findings.
Gene mutation is one of the main causes of gene inactivation. Using the cBioPortal database, we found that the changes in ALDH2 were not due to genetic mutations. In recent years, gene regulation by non-coding RNA (ncRNA) has been widely verified. ncRNA is a type of RNA that does not encode protein, and it was long considered a "junk" transcription product.
However, research in recent years has changed the understanding of ncRNA, and a growing number of studies have found that ncRNAs are a class of functional regulatory molecules that can regulate a series of cellular processes, including chromosome remodeling, transcription, post-transcriptional modification and signal transduction, and that play a crucial role in developmental and disease processes (32). ncRNA mainly includes miRNA, lncRNA and circRNA, which coordinate their parts and play an essential role in the and miR-16 is reduced or even missing. This is the earliest direct evidence that the abnormal expression of miRNA is related to tumorigenesis (34). Subsequently, a growing number of tumor miRNA chips have confirmed that miRNAs can indeed regulate the expression of proto-oncogenes or tumor suppressor genes, which play an important role in the occurrence and metastasis of a variety of malignant tumors (35-37). By taking the intersection of the UCEC data, we found that the expression of miR-135-3p was increased in UCEC and negatively correlated with the expression of ALDH2. Dual-luciferase reporter assays, RIP and RNA pull-down then showed that miR-135-3p can target ALDH2. Consistent with this, the expression of miR-135 is up-regulated in NSCLC cells, and silencing miR-135 can inhibit cell viability, migration and invasion (38). In lung cancer, overexpression of ALDH2 inhibits the malignant characteristics of lung adenocarcinoma cells, such as proliferation, stemness and migration, while knocking down ALDH2 increases these characteristics (29). These results indirectly support this regulatory relationship.
We found earlier that the ALDH2 low expression group had worse survival. We therefore developed a risk score model based on ALDH2 expression to predict the survival of UCEC patients. To construct the model, we screened 478 differentially expressed mRNAs from 538 originals in the UCEC dataset of the TCGA database according to the expression of ALDH2. Cox regression and Lasso Cox regression were then used to screen genes and construct a prognostic risk scoring model. The model can divide UCEC patients into high-risk and low-risk groups to predict their prognosis. In addition, both the time-dependent ROC curve and the C-index indicate that the risk scoring model has high predictive sensitivity and performance.
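
A risk score of this kind is typically a weighted sum of gene expression values, with the weights taken from the fitted Cox model. The Python sketch below shows that general form and a simple split into risk groups. The gene names and coefficients are placeholders rather than the fitted model, and the median cutoff is an assumption.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values for the genes in the model.

    expression   -- dict mapping gene name to expression for one patient
    coefficients -- dict mapping gene name to its Cox regression coefficient
    """
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(scores):
    """Label each score 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Placeholder model: names and weights are illustrative only.
example_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
example_patient = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 9.0}
example_score = risk_score(example_patient, example_coefficients)
```

A negative coefficient means that higher expression of the gene lowers the score, which is the direction expected for a gene whose high expression is associated with better survival.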
The TCGA database provides comprehensive cancer-related data, including gene expression, mutation information and clinical survival information. In this study, using the information provided by the TCGA database, we found that most ALDH family genes were abnormally expressed in 33 cancers. In UCEC, the expression of ALDH2 was decreased, and the low expression group had a worse prognosis. We also made a preliminary investigation of how changes in ALDH2 are involved in the regulation of cancer and preliminarily verified that miR-135-3p targets the 3'UTR of ALDH2. Finally, a risk score model was constructed that performs well in terms of specificity and accuracy. This work provides new ideas and directions for further exploring the mechanisms and treatment strategies of UCEC. However, the current research still has some limitations.

Conclusion
This study analyzed the expression of ALDH2 and its family genes in 33 cancers and explored in depth the expression of ALDH2 and its role in survival in UCEC. ALDH2 expression was reduced in UCEC patients, and survival was worse in the low expression group. Bioinformatics and experimental methods were used to verify the function of ALDH2 in UCEC. The results showed that restoring ALDH2 expression can inhibit tumor progression and that ALDH2 expression is regulated by miR-135-3p. In addition, we propose a survival model based on ALDH2 expression, with a C-index of 0.798 and an AUC of 0.764.

Consent for publication
All authors have read this manuscript and approved it for submission.

Availability of data and material
The data generated during this study are included in this article and its supplementary information files, and are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Author contributions
XHL designed research; XHL, YX, and ZTD performed research; ZTD analyzed data; and XHL wrote the paper. ZTD participated in the revision of the manuscript. All authors read and approved the final manuscript.

Figure 7 (panels F to M). F. Luciferase activity was measured with a dual-luciferase reporter assay. G. Dual-luciferase assay of the mutation group. H. RNA pull-down analysis with ALDH2 antibody. I. RIP assay further verifying a direct association between hsa-miR-135-3p and ALDH2. J. Overexpression efficiency of the hsa-miR-135-3p mimic. K. WB analysis of ALDH2 expression with overexpression of hsa-miR-135-3p. L. qRT-PCR analysis of the relative expression of ALDH2 with overexpression of hsa-miR-135-3p. M. Kaplan-Meier survival curve for patients with UCEC (divided into high and low expression groups according to the median value of ALDH2 expression). *p < 0.05, **p < 0.01, ***p < 0.001.