ceRNA Network Analysis Shows That lncRNA LGALS8-AS1 Upregulates DCTPP1 To Promote Breast Cancer Progression by Sponging miR-125b-5p

Background: Breast cancer is the most common cancer in women worldwide, and many researchers have worked to elucidate its pathogenesis. dCTP pyrophosphatase 1 (DCTPP1) has been reported to be overexpressed in several cancers. However, the mechanism by which DCTPP1 is regulated by non-coding RNA in breast cancer, and its relationship with immune infiltration, have not been elucidated. Results: In this study, data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) showed that the expression of DCTPP1 was higher in breast cancer tissues than in normal tissues. Bioinformatics analysis showed that DCTPP1 expression was negatively correlated with hsa-miR-125b-5p expression in BRCA. The expression of lncRNA LGALS8-AS1 was positively correlated with the expression of DCTPP1 and negatively correlated with the expression of hsa-miR-125b-5p. We therefore speculate that lncRNA LGALS8-AS1 promotes tumor progression by sponging hsa-miR-125b-5p, thereby maintaining the overexpression of DCTPP1 in breast cancer. Survival analysis of the three genes showed that overexpression of DCTPP1 and LGALS8-AS1 is related to poor patient prognosis. By analyzing the relationship between DCTPP1 and immune infiltration, we found that high copy number of DCTPP1 is related to the infiltration of CD8+ T cells, and that high expression of DCTPP1 is related to the infiltration of CD4+ T cells in basal-like breast cancer. DCTPP1 expression is positively correlated with the expression of the immune checkpoint B7-H3. Conclusion: lncRNA LGALS8-AS1 can upregulate DCTPP1 by sponging hsa-miR-125b-5p. DCTPP1 can be used as a new prognostic marker for anti-B7-H3 antibody treatment of breast cancer.


Background
Breast cancer is the most common cause of cancer death in women worldwide. According to literature reports, in Europe and North America the cumulative incidence of breast cancer in women is about 2.7% at the age of 55, about 5.0% at the age of 65, and about 7.7% at the age of 75 [1]. Breast cancer is divided into 3 main subtypes: hormone receptor positive/ERBB2 negative (70% of patients), ERBB2 positive (15%-20%) and triple negative (15%) [2]. The incidence of breast cancer is related to multiple factors, including increased estrogen levels, genetic mutations, dietary changes and changes in environmental factors. With advances in cancer research, several genes and pathways have been confirmed to be related to the pathogenesis of breast cancer, including mutations in the BRCA1, BRCA2 and TP53 genes and the PI3K/Akt/mTOR pathway [3]. However, because of the complexity of eukaryotic gene regulation, many genes and pathways related to breast cancer have not yet been discovered. It is therefore particularly important to find potential biomarkers with higher specificity and sensitivity for breast cancer.
Long non-coding RNA (lncRNA) is a type of RNA molecule whose transcript exceeds 200 nt in length but does not encode protein. It regulates gene expression at multiple levels (epigenetic, transcriptional, post-transcriptional, etc.) in the form of RNA. Overexpression, defects or mutations of lncRNA genes are related to many human diseases. LGALS8-AS1 has rarely been studied in cancer. A previous bioinformatics analysis reported that high expression of LGALS8-AS1 was related to distant metastasis of breast cancer [4]. Wang et al. [5] reported that LGALS8-AS1 is related to the formation of atherosclerosis. However, all of these analyses still need to be confirmed by experimental data.
MicroRNAs (miRNAs) are a class of non-coding single-stranded RNA molecules, approximately 22 nucleotides long, encoded by endogenous genes. They are involved in post-transcriptional regulation of gene expression in animals and plants. It has been demonstrated that a lncRNA may bind to a miRNA as a competing endogenous RNA (ceRNA) and thereby affect the occurrence and development of tumors. miR-125b-5p has been studied extensively. Li et al. [6] confirmed through RT-qPCR and western blot experiments that miR-125b-5p is under-expressed in breast cancer cells, and speculated that it may act as a tumor suppressor. However, the mechanism by which miR-125b-5p interacts with lncRNAs and genes has not been fully elucidated.
DCTPP1 (dCTP pyrophosphatase 1) is a protein-coding gene that plays a central role in the balance of dCTP and the metabolism of deoxycytidine analogs, thereby contributing to the preservation of genome integrity [7]. Previous studies have suggested that DCTPP1 is highly expressed in breast cancer tissues and is related to the occurrence of breast cancer. In gastric cancer, high expression of DCTPP1 can reduce the sensitivity of patients to chemotherapy drugs [8]. In prostate cancer, DCTPP1 is also related to poor patient prognosis [9]. Considerable evidence therefore indicates that DCTPP1 is related to the occurrence and prognosis of cancer. However, the mechanism by which DCTPP1 is regulated by upstream molecules, and its relationship with immune infiltration, have not been fully elucidated.
The tumor microenvironment refers to the internal environment in which tumor cells arise and live. It includes not only the tumor cells themselves but also the surrounding fibroblasts, immune and inflammatory cells, glial cells and other cells, as well as the intercellular substance, capillaries and biomolecules that infiltrate the nearby area. Whiteside [10] proposed that the failure of host immune surveillance of tumors is one of the main reasons for the development of tumor immunotherapy. Therefore, restoring immune surveillance and protecting immune cells from tumor-induced suppression are reasonable goals for current anti-tumor therapy.
Immune checkpoints are a series of molecules expressed on immune cells (usually T cells) that regulate the degree of immune activation. They play an important role in preventing autoimmunity. However, tumor cells can progress and metastasize by activating immune checkpoint pathways that inhibit anti-tumor immune responses. B7-H3 (CD276) belongs to the B7 superfamily of immunomodulatory ligands and plays an important role in regulating the adaptive immune response as a T cell co-inhibitory/co-stimulatory factor. However, the mechanism by which B7-H3 is involved in the immune evasion of tumor cells in breast cancer has not yet been fully elucidated [11].
In this study, we used bioinformatics methods to screen for differentially expressed genes between breast cancer tissues and adjacent normal tissues, and obtained the differential gene DCTPP1. Survival analysis and upstream miRNA prediction were then carried out, and the co-expressed miRNA miR-125b-5p was screened out. Survival analysis and co-expression analysis were performed in the same way to screen out the lncRNA LGALS8-AS1. Finally, immune infiltration analysis and immune checkpoint analysis of DCTPP1 were performed to provide a reference for clinical targeted treatment of breast cancer.

Differentially expressed genes
Using R to analyze differential expression in the TCGA data, we found that the expression of DCTPP1 in tumor tissues was more than twice that in normal tissues, and we obtained a box plot of this difference from the GEPIA website. We obtained the same differential expression result for DCTPP1 from the GSE42568 dataset (Figure 1).

Pan-cancer differential analysis
Through pan-cancer differential expression analysis, we found that DCTPP1 is highly expressed in 16 types of tumor tissue, including breast cancer and bladder urothelial carcinoma (Figure 2).

Screening of miRNAs that bind to the target gene
We performed differential expression analysis and gene co-expression analysis on all miRNAs downloaded from the starbase database that may bind to DCTPP1 (Table 1). The results showed that hsa-miR-125b-5p was expressed at a low level in tumor tissues (logFC=-2.071, P<0.05) and was negatively correlated with the expression of DCTPP1 (R=-0.18, P<0.05) (Figure 3).

Screening of lncRNAs that bind to miRNAs
We performed differential expression analysis and gene co-expression analysis on all lncRNAs downloaded from the starbase database that may bind to hsa-miR-125b-5p. The results showed that LGALS8-AS1 is highly expressed in tumor tissues (P<0.05), negatively correlated with the expression of hsa-miR-125b-5p (R=-0.19, P<0.05), and positively correlated with the expression of DCTPP1 (R=0.19, P<0.05) (Table 2). We thus obtained the expression relationships among LGALS8-AS1, hsa-miR-125b-5p and DCTPP1, and mapped the regulatory network (Figure 4).
Table 2. Correlation analysis results for co-expression of the lncRNA with the miRNA or mRNA

Survival analysis
Survival analysis of LGALS8-AS1, hsa-miR-125b-5p and DCTPP1 suggested that the higher the expression of DCTPP1 and LGALS8-AS1, the shorter the survival time of the patient (P<0.05), whereas the expression of hsa-miR-125b-5p had no significant relationship with survival time (P=0.217) (Figure 5).

Correlation between target gene and immune cells
By analyzing the correlation between DCTPP1 copy number variation and immune cells, we found that high-level copy number variation of DCTPP1 in tumor tissues was accompanied by increased B cell infiltration (P=0.041). The correlation analysis between gene expression and immune cells showed that DCTPP1 expression was positively correlated with B cell infiltration. In basal-like breast cancer tissues, the expression of DCTPP1 was positively correlated with the infiltration of CD4+ T cells and dendritic cells (R=0.243, P<0.05) (Figure 6).

Correlation analysis between genes and immune checkpoints
Correlation analysis between DCTPP1 and immune checkpoint genes showed that the expression of DCTPP1 is positively correlated with the expression of CD276 (R=0.194, P<0.05), but has no obvious correlation with the expression of CD274, PDCD1 or CTLA-4 (Figure 7).

Discussion
In recent years, basic research has confirmed that DCTPP1 is highly expressed in breast cancer tissues. Against this background, we analyzed the TCGA data through bioinformatics methods and found that DCTPP1 is highly expressed in as many as 16 cancers and that its high expression is related to poor patient prognosis. We therefore believe that clarifying the mechanism of DCTPP1 in the development of breast cancer is both reliable and necessary [28].
The cells that make up the tumor microenvironment include vascular endothelial cells, immune cells (granulocytes, lymphocytes and macrophages) and fibroblasts. As the understanding of tumors has deepened, it has become clear that the occurrence of tumors is closely related to the microenvironment in which they are located, and that the regulation of tumors by immune cells cannot be ignored. Because B cells are phenotypically and functionally heterogeneous in the tumor microenvironment, they can produce cytokines that regulate T cells to suppress the anti-tumor immune response [29].
B7-H3 is a new member of the B7 family. It is highly expressed in some tumors and is related to poor patient prognosis, and its involvement in tumor immune evasion is accepted by most researchers. Given the clinical success of inhibitors of other immune checkpoints (CTLA-4, PD-1, PD-L1), it is necessary to develop therapeutic monoclonal antibodies targeting B7-H3. However, the specific immune evasion mechanism of B7-H3 has not yet been elucidated. In this study, we used big data to link DCTPP1 with immune cells, and from its correlation with the immune checkpoint B7-H3 we speculate that DCTPP1 can be used as a new prognostic marker for anti-B7-H3 antibody treatment of breast cancer, which would be of great significance in guiding the clinical application of anti-B7-H3 antibodies.
However, this study also has shortcomings. There are many types of breast cancer, and treatment outcomes differ between them. This study only analyzed breast cancer as a whole. The correlation of DCTPP1 with immune cells and with the immune checkpoint B7-H3 within each specific type remains to be explored.

Conclusion
In this study, we identified the DCTPP1 gene, which is more highly expressed in breast cancer tissues than in normal tissues, through the TCGA database and verified it with the GEO database. Pan-cancer differential expression analysis showed that DCTPP1 is overexpressed in 16 tumor tissues. To further explore the mechanism by which DCTPP1 is regulated, we constructed the lncRNA LGALS8-AS1/hsa-miR-125b-5p/DCTPP1 regulatory network. In terms of clinical significance, survival analysis of the molecules in the network suggested that high expression of LGALS8-AS1 and DCTPP1 is related to poor patient prognosis. Immune infiltration analysis and immune checkpoint correlation analysis of DCTPP1 showed that high copy number and high expression of DCTPP1 in tumor tissues are positively correlated with the degree of B cell infiltration, and that high expression of DCTPP1 is associated with CD4+ T cell infiltration in basal-like breast cancer. Because the immune checkpoint B7-H3 has been used in recent years as a target for monoclonal antibody treatment of breast cancer, we further explored the relationship between DCTPP1 and B7-H3 and found that their expression in cancer tissues is positively correlated. On the one hand, this suggests a possible reason for the high expression of B7-H3 in breast cancer tissue; on the other hand, it provides a reference for the possible mechanism of anti-B7-H3 antibody treatment of breast cancer.

Data and Methods

Gene expression data download
In this study, we downloaded expression profile data and clinical data for 33 tumor types, including breast cancer (a total of 1217 samples: 1104 tumor samples and 113 adjacent normal tissue samples), from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) [12]. We also downloaded the GSE42568 [13] dataset from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) [14], which includes 121 samples (104 breast cancer tissue samples and 17 normal tissue samples).

TCGA analysis
We used R [15] with the "limma" package [16] and set the thresholds |log2FC| > 2 and adjusted P value < 0.05 to screen for differentially expressed mRNAs between tumor and normal samples. One of the differentially expressed genes was selected as the research object. Because TCGA contains few normal samples, we used the GEPIA website [17], which adds normal samples from the GTEx database [18], for the differential analysis of the target gene.
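The threshold step above can be illustrated with a minimal sketch. This is not the authors' R script: it is written in Python, it assumes the adjusted P value is a Benjamini-Hochberg correction (the text does not name the method), and the function names `benjamini_hochberg` and `screen_genes`, as well as the toy gene names and values, are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is capped
    # by the adjusted values of all larger p-values (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_genes(stats: Dict[str, Tuple[float, float]],
                 lfc_cutoff: float = 2.0,
                 alpha: float = 0.05) -> List[str]:
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p < alpha.

    stats maps gene name -> (log2 fold change, raw p-value).
    """
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    return [g for g, q in zip(genes, adj)
            if abs(stats[g][0]) > lfc_cutoff and q < alpha]


# Toy input with made-up gene names and values (not results from the study).
toy_stats = {
    "GENE_A": (2.5, 0.001),
    "GENE_B": (0.8, 0.0005),   # significant but fold change too small
    "GENE_C": (-3.1, 0.2),     # large fold change but not significant
    "GENE_D": (-2.2, 0.004),
}
print(screen_genes(toy_stats))
```

Both conditions must hold at once, so a gene with a large fold change but a weak P value, or the reverse, is dropped.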

GEO array analysis
Using the "impute" package [19] with the thresholds |logFC| > 1 and adjusted P < 0.05, we took the GSE42568 series matrix file as input and obtained differentially expressed genes through a Bayesian test. We then extracted the expression of the target gene in normal and tumor tissues and used GraphPad Prism 9 [20] to draw a box plot for visualization.

Pan-cancer differential analysis
Using R and the limma package, we read in the gene expression file of each tumor type in a loop. We screened for tumor types with more than 5 samples in the normal group and applied wilcox.test to compare the expression of the target gene between tumor and normal tissues across the 33 cancers. Finally, we drew a box plot.
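The pan-cancer loop can be sketched as follows. This is an illustrative Python version, not the authors' R code: it implements the Wilcoxon rank-sum test with a plain normal approximation (no tie or continuity correction, so P values can differ slightly from R's wilcox.test), and the names `rank_sum_test` and `pan_cancer_screen` and the toy cohort labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(tumor: List[float], normal: List[float]) -> Tuple[float, float]:
    """Return (U statistic for the tumor group, two-sided approximate p-value)."""
    n1, n2 = len(tumor), len(normal)
    ranks = average_ranks(list(tumor) + list(normal))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


def pan_cancer_screen(cohorts: Dict[str, Tuple[List[float], List[float]]],
                      min_normal: int = 5) -> Dict[str, float]:
    """Test each cohort that has more than min_normal normal samples."""
    results = {}
    for name, (tumor, normal) in cohorts.items():
        if len(normal) > min_normal:  # skip cohorts with too few normals
            results[name] = rank_sum_test(tumor, normal)[1]
    return results


# Toy expression values with made-up cohort labels (not data from the study).
toy_cohorts = {
    "TYPE_A": ([5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 4.5, 4.8]),
    "TYPE_B": ([1, 2, 3], [1, 2]),  # only 2 normal samples, so it is skipped
}
print(pan_cancer_screen(toy_cohorts))
```

Filtering on the size of the normal group before testing keeps cohorts with almost no controls from producing unstable comparisons.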

miRNA data download
We downloaded the Isoform Expression Quantification data from TCGA, predicted the miRNAs that bind to the target mRNA using starbase [21], and downloaded all of the prediction results.

lncRNA data download
We predicted the lncRNAs that bind to the miRNA using starbase and downloaded all of the prediction results. All the downloaded files were used for differential expression and survival analysis.

Screening of lncRNA co-expressed differently with miRNA
Using the limma, ggpubr, ggExtra and reshape2 R packages, we set corFilter > 0.1 and pvalueFilter < 0.001 and took as input the miRNA and mRNA expression files from TCGA and the lncRNA list file from starbase. R scripts were used to draw the scatter plots and box plots of the related lncRNAs.
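The co-expression filter and the sign pattern expected of a ceRNA axis (lncRNA negatively correlated with the miRNA, miRNA negatively correlated with the mRNA, lncRNA positively correlated with the mRNA) can be sketched as below. This is a Python illustration under stated assumptions, not the authors' script: it assumes a Pearson coefficient, approximates the P value with the Fisher z-transform, and the names `pearson_r`, `correlation_p_value` and `is_cerna_triplet` and the toy vectors are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two vectors of equal length >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy / sqrt(sxx * syy)


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value via the Fisher z approximation (an assumption here)."""
    if abs(r) >= 1.0:
        return 0.0
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def passes(r: float, n: int, sign: int, cor_filter: float, p_filter: float) -> bool:
    """True if r has the expected sign, |r| > cor_filter and p < p_filter."""
    return r * sign > cor_filter and correlation_p_value(r, n) < p_filter


def is_cerna_triplet(lnc: List[float], mir: List[float], mrna: List[float],
                     cor_filter: float = 0.1, p_filter: float = 0.001) -> bool:
    """Check the lncRNA(-)miRNA, miRNA(-)mRNA, lncRNA(+)mRNA pattern."""
    n = len(lnc)
    return (passes(pearson_r(lnc, mir), n, -1, cor_filter, p_filter)
            and passes(pearson_r(mir, mrna), n, -1, cor_filter, p_filter)
            and passes(pearson_r(lnc, mrna), n, +1, cor_filter, p_filter))


# Toy expression vectors (made up for illustration, not data from the study).
lnc = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
mir = [11.7, 11.3, 9.7, 9.3, 7.7, 7.3, 5.7, 5.3, 3.7, 3.3, 1.7, 1.3]
mrna = [2.2, 3.8, 6.2, 7.8, 10.2, 11.8, 14.2, 15.8, 18.2, 19.8, 22.2, 23.8]
print(is_cerna_triplet(lnc, mir, mrna))
```

Requiring all three correlations to pass reflects the reasoning used in this study: a sponge relationship is only proposed when the lncRNA and mRNA move together while both move against the miRNA.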

Survival analysis
In this study, we used the Kaplan-Meier method to analyze the survival associated with the mRNA, miRNA and lncRNA, and drew the survival curves. Using the limma, survival [25] and survminer [26] packages, we read in the survival data and expression data, chose the best cut-off to divide the samples into high- and low-expression groups, and compared survival between the two groups. The survival curves of the miRNA and lncRNA were drawn in R, and the survival curve of the mRNA was drawn through the GEPIA website.

The correlation between single gene and immune cells
The target gene was obtained through the preceding differential expression analysis. To explore the correlation between gene copy number and immune cells, and between gene expression and immune cells, we used the TIMER database [27]. For copy number, we selected 'sCNA', entered the target gene and the target immune infiltrating cells, and selected the high-copy category to obtain the correlation between gene copy number and immune cells. For gene expression, we again used the TIMER database, selected 'Gene', and entered the target gene and the target immune infiltrating cells. When the correlation coefficient is greater than 0.15, we consider the two to be positively correlated.
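The 0.15 decision rule for expression versus immune infiltration can be made concrete with a short sketch. It assumes a Spearman rank correlation, which the text does not state, and it does not reproduce any adjustment TIMER may apply; the names `spearman_rho` and `is_positive_correlation` and the toy values are hypothetical.

```python
from math import sqrt
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two vectors of equal length >= 3")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy / sqrt(sxx * syy)


def is_positive_correlation(rho: float, cutoff: float = 0.15) -> bool:
    """Apply the rule from the text: positive only if rho exceeds the cutoff."""
    return rho > cutoff


# Toy values: gene expression versus an immune infiltration score (made up).
expression = [1, 2, 3, 4, 5]
infiltration = [2, 4, 5, 4.5, 10]
rho = spearman_rho(expression, infiltration)
print(rho, is_positive_correlation(rho))
```

Under this rule, the coefficients reported in the results (for example R=0.243 for CD4+ T cells in basal-like breast cancer) count as positive correlations, while a coefficient of exactly 0.15 or lower does not.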

Correlation analysis between single gene and immune checkpoint
We selected CD274, CD276, CTLA-4 and PDCD1 as the immune checkpoint genes for analysis. On the TIMER website, we selected 'Gene', entered the target gene and each immune checkpoint gene, obtained the result plot, verified it through the GEPIA website, and saved the result. When the correlation coefficient is greater than 0.15, we consider the two to be positively correlated.

Declarations

1.Consent for publication
Not applicable.

2.Competing interests
The authors declare that they have no competing interests.

3.Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

4.Contributions
All authors contributed to the conception of the study. Jiayin Zhang contributed to the data analysis; Yuan Qu provided guidance on data analysis methods; Jiayin Zhang, Tingting Hou and Wanbao Ge drafted the manuscript together; Shanyong Zhang carried out the subject design and approved the final submitted version of the manuscript. All authors read and approved the final manuscript.

Availability of data and materials
The GSE42568 dataset analyzed during the current study is available in the GEO repository [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42568]. The authors can provide all the R scripts and data used during the current study.