Potential of the blood biomarker cell-free UPK2 RNA to detect recurrence during monitoring after surgical resection of adenocarcinoma of the lung


Background: Lung cancer is one of the most common malignant tumors in the world. Non-small-cell lung cancer (NSCLC) accounts for the majority of all diagnosed lung cancers, and lung adenocarcinoma (LUAD) is the most common subtype of NSCLC. This study attempts to identify and validate biomarkers that can be used to monitor recurrence after LUAD surgery. Methods: We downloaded lung adenocarcinoma data from the TCGA database, selected the postoperative recurrence samples, and performed WGCNA to find key co-expression gene modules. Enrichment analysis of the key gene modules was performed using the DAVID database. We then performed a survival analysis in the Oncomine database of UPK2, the most interesting biomarker obtained from the TCGA analysis, and evaluated its impact on prognosis. Finally, we collected blood samples from 132 patients with early-stage lung adenocarcinoma and measured the expression level of free mRNA in the plasma. Results: UPK2, KLHDC3, GALR2 and TYRP1 occupied central positions in the co-expression network and were also significantly correlated with patient survival. The expression level of free UPK2 in plasma relative to GAPDH was 0.1623 in non-relapsed patients and 0.2763 in relapsed patients. ROC analysis was used to evaluate the effectiveness of free UPK2 mRNA in the blood for monitoring postoperative recurrence, giving an AUC of 0.767 (95% confidence interval, 0.675-0.858). In addition, patients with high expression of free UPK2 mRNA had significantly poorer survival than those with low expression. Conclusions: The expression level of free UPK2 mRNA in plasma has potential as an indicator of postoperative recurrence in patients with early LUAD, which may help guide subsequent clinical treatment.


Introduction
Non-small-cell lung cancer (NSCLC) accounts for approximately 85% of all diagnosed lung cancers (1).
Lung adenocarcinoma (LUAD) is the most common subtype of NSCLC, and its morbidity and mortality have risen in recent years (2,3). Progress in science and technology has nevertheless greatly expanded the treatment options for patients. Han et al. (4) found that pemetrexed plus carboplatin combined with gefitinib extended survival for patients with LUAD harboring sensitive EGFR mutations. Zhuang et al. (5) showed that combining nadroparin with radiotherapy induces stronger synergistic antitumor effects in LUAD A549 cells. Meanwhile, new studies show that current treatment strategies can be further improved (6)(7)(8).
Despite advances in surgery, molecular subtyping and targeted therapy, the prognosis of LUAD remains relatively poor (9). Patients with LUAD often relapse and develop metastases after surgery, chemotherapy and radiation therapy (10). Because of these malignant features, the 5-year survival rate is only 50%-70% even in early-stage LUAD (11). Moreover, advanced LUAD is often resistant to conventional chemotherapies or targeted therapeutic drugs (12). Almost without exception, high-potency anticancer drugs do not remain effective in the long term, because cancer cells mutate rapidly and become resistant to them. Although surgical methods are constantly improved and chemotherapy drugs are updated, the problem of completely removing residual cancer cells remains unsolved, so the risk of recurrence is still high. Recurrence poses a major challenge to further treatment, and deterioration after recurrence is rapid. In addition, the mechanisms underlying recurrence have not yet been clarified. Among 289 patients with stage and lung adenocarcinoma, 85 developed a distant recurrence within five years (13). The study of cancer recurrence may therefore have important clinical value.
With the development of precision medicine, more and more molecules, the so-called biomarkers, have been found to correlate strongly with certain biological events. Norman et al. found that Ink4a/Arf expression is a biomarker of aging and that increasing expression of Ink4a/Arf is related to advancing age in all rodent tissues (14). In addition, CSF Aβ1-42 was identified and verified as a biomarker for Alzheimer's disease in an autopsy cohort of CSF samples, with a detection sensitivity of 96.4% (15). Biomarkers are not limited to proteins; RNA molecules can also act as biomarkers. Naoharu Iwai et al. found that miR-208 can be used as a useful indicator of myocardial injury (16). The discovery and verification of tumor molecular markers is one of the most popular focuses in this field, and significant progress has been made, especially in the discovery and identification of molecular markers associated with the clinical effects of tumor therapy. In 2011, human epididymis protein 4 (HE4) was approved by the Food and Drug Administration to monitor recurrence or progressive disease in epithelial ovarian cancer in conjunction with CA125 (17).
Chen et al. reported that abnormalities of AIB1 were tightly associated with the clinical/prognostic significance of urothelial carcinoma, and that high AIB1 expression was associated with increased hazard ratios for 5-year CSS (80.6% vs. 55.8%, p = 0.008) and OS (78.1% vs. 54.8%, p = 0.006) (18). The HOXB13/IL17BR (H/I) biomarker predicted recurrence risk in ER-positive, lymph node-negative breast cancer patients (19). A considerable number of further reports exist in this field, and more and more biomarkers have been verified and applied in the clinic, to the benefit of the corresponding patients.
In this study, bioinformatics approaches were used to analyze a large set of sequencing data, and several candidate biomarkers were identified and then verified using clinical samples. The biomarkers obtained in this study may play a key role in cancer recurrence.

Data analysis
The datasets used in this study were downloaded from the TCGA database. The softConnectivity function in the WGCNA package was used to analyze the effects of different power values on the scale independence and average connectivity of the co-expression network and its co-expression modules. The "randomly selected gene" parameter was set to 5000, and the other parameters were set to default values.
Next, we summarized the expression values using the collapseRows function in the WGCNA package and performed cluster analysis using flashClust; the interaction/correlation of each module was visualized as a heat map. In addition, the gene modules of interest were subjected to GO enrichment analysis in the DAVID database (https://david.ncifcrf.gov/summary.jsp). Survival analysis of the corresponding genes was performed using Kaplan-Meier analysis and the Oncomine database.
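The effect of the power value on connectivity can be illustrated with a small sketch. It is written in Python rather than with the R WGCNA package used in this study, and it assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to the chosen power. The gene names, expression values and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_connectivity(expr, power):
    """Per-gene connectivity: sum of |cor|^power over all other genes.

    Assumes an unsigned network; expr maps gene name -> expression vector.
    """
    genes = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** power for h in genes if h != g)
        for g in genes
    }

# Hypothetical toy data: four genes measured in five samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.2, 7.9, 10.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 3.0, 2.0, 5.0, 2.5],
}

for power in (1, 4, 8):
    k = soft_connectivity(toy_expr, power)
    print(power, {g: round(v, 3) for g, v in k.items()})
```

Because every absolute correlation is at most 1, raising it to a higher power suppresses weak correlations more strongly than strong ones, so connectivity falls as the power increases and only tightly co-expressed gene pairs keep a sizeable weight.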

Patient recruitment
We collected blood samples from patients with early-stage lung adenocarcinoma who underwent surgery at the Department of Oncology, Shanghai Changhai Hospital, from February 2006 to March 2010. All patients received systematic treatment after surgery, including but not limited to the optimal local treatments, such as radiochemotherapy and targeted therapy. Recurrence of the disease was determined from imaging findings (CXR or CT). This study was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice and was reviewed by the Shanghai Changhai Hospital Ethics Committee. All patients signed informed consent for the use of their blood for scientific purposes.

Sample collection and processing
Blood samples from 132 patients were first collected 90 days after surgery. The second, third and fourth collections were performed at 180 days, one year and two years after surgery, respectively. If disease recurrence was found during a follow-up examination, blood sample collection was stopped. Blood was collected in 5 ml heparin anticoagulation tubes and immediately used to extract free mRNA (Geneseed, China). The free mRNA was reverse transcribed using a reverse transcription kit (TAKARA). A quarter of the RT product was mixed with a pre-formulated 2×SYBR Green PCR mix containing UPK2 qRT-PCR primers and brought to a final reaction volume of 20 µl with water (Roche). Amplification was performed, with melting curve collection, as follows: 94°C for 5 minutes; 40 cycles of 94°C for 2 minutes, 60°C for 1 minute and 72°C for 2 minutes; and a final cycle at 72°C for 2 minutes. We also detected GAPDH mRNA as an internal reference and calculated the relative expression of UPK2 by the 2^(-ΔCT) method.
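The relative-expression calculation can be written out as a short sketch, assuming ΔCT is defined as the CT of UPK2 minus the CT of the GAPDH reference in the same sample. The CT values and the function name below are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^(-dCT) method.

    dCT = CT(target) - CT(reference); a lower target CT means more
    template, so a smaller dCT gives a higher relative expression.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)

# Hypothetical CT values: (sample, CT of UPK2, CT of GAPDH).
example_cts = [
    ("sample_1", 27.0, 24.0),
    ("sample_2", 26.0, 24.0),
]

for name, ct_upk2, ct_gapdh in example_cts:
    print(name, relative_expression(ct_upk2, ct_gapdh))
```

A one-cycle difference in ΔCT corresponds to a two-fold difference in relative expression, which is why the second hypothetical sample shows twice the value of the first.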

Statistical analysis
This analysis included only the samples from the 132 patients with lung adenocarcinoma admitted to our hospital.
The mean plasma UPK2 level was calculated for the recurrence group. ROC analysis of lung adenocarcinoma recurrence was performed using the pROC package in R, and the area under the curve was calculated. Survival curves were plotted using the ggsurvplot function of the survminer package in R.
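As a minimal sketch of what the area under the ROC curve measures, the AUC can be computed as the fraction of (relapsed, non-relapsed) patient pairs in which the relapsed patient has the higher marker value, with ties counted as one half. This is a Python illustration and not the pROC implementation; the expression values, the cutoff and the function names are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case scores above a negative one.

    Equivalent to the normalized Mann-Whitney U statistic; ties count 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_pos = sum(1 for s in scores_pos if s >= cutoff)
    true_neg = sum(1 for s in scores_neg if s < cutoff)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)

# Hypothetical relative UPK2 expression values, not data from this study.
relapsed = [0.31, 0.25, 0.28, 0.19, 0.35]
non_relapsed = [0.15, 0.17, 0.22, 0.12, 0.20, 0.26]

print("AUC:", round(roc_auc(relapsed, non_relapsed), 3))
print("sens/spec at 0.23:", sensitivity_specificity(relapsed, non_relapsed, 0.23))
```

An AUC of 0.5 corresponds to a marker that does not separate the two groups at all, and 1.0 to perfect separation; each point on the ROC curve is the sensitivity and specificity obtained at one cutoff.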

Results
Exploration of specific co-expression modules associated with lung adenocarcinoma recurrence
To establish a co-expressed gene network associated with postoperative recurrence of lung adenocarcinoma, we used WGCNA to analyze gene expression profile data from patients with recurrent lung adenocarcinoma in the TCGA database. We screened the transcriptome data of 24 relapsed patients, including 14 males and 10 females. Based on the correlation between each pair of genes, the gene expression data of these patients were classified into 39 gene modules using unsupervised average linkage hierarchical clustering and labelled in a heat map with different colors (Figure 1A). Gene modules of different colors contained mutually exclusive sets of co-expressed genes. Genes that could not be classified into a particular module were assigned to the gray module. Because WGCNA can analyze the correlation between gene modules and phenotypes, we used it to analyze the correlation between the gene modules of patients with postoperative recurrent early lung adenocarcinoma and a series of phenotypes, such as age, gender, survival, recurrence, recurrence type and pathological stage. Without any phenotypic or genetic preference in module partitioning, we found that the purple module had a significant relationship with survival and recurrence, with correlation coefficients greater than 0.7 (Figure 1B). We therefore believed that these genes and their co-expression patterns may be associated with the recurrence of lung adenocarcinoma.

Biological insights from module purple
WGCNA classifies the co-expressed genes of all patient samples into specific modules, each related to a series of traits regulated by the same mechanism. In the previous section, we obtained the purple module as the one most relevant to postoperative recurrence. To verify the relationship between the co-expressed genes in the purple module and lung adenocarcinoma, we constructed a heat map of their expression in 24 recurrent tumor tissues and 53 paracancerous tissues. The results showed a significant difference in the expression pattern of the purple module between paracancerous tissues and recurrent tumor tissues (Figure 2A). However, the expression pattern of the purple module genes was less consistent in recurrent tumors than in paracancerous tissues: the recurrent tumors exhibited three expression patterns (light red, light blue, and deep red), suggesting different mechanisms of relapse. Of the 117 genes in the purple module, 68 differed significantly between recurrent tumor tissues and paracancerous tissues (|logFC| > 0.6, FDR < 0.05) (Figure 2B). When these 68 genes were used to draw the expression heat maps of the two tissues (recurrent tumor tissues and paracancerous tissues), the expression patterns were clearly more uniform than in the heat maps constructed with all 117 genes. To further analyze the biological function of the purple module, its 117 genes were analyzed by GO in DAVID, and the most significant GO term was "Cytosol" (p value = 0.0356) (Figure 2C).
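The gene filter described above can be sketched as follows, assuming that the intended criterion is an absolute log fold change above 0.6 together with FDR below 0.05, and that the FDR is obtained by the Benjamini-Hochberg procedure (the adjustment method is not stated in this study). The gene names, statistics and function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_genes(stats, lfc_cutoff=0.6, fdr_cutoff=0.05):
    """Keep genes with |logFC| > lfc_cutoff and adjusted p-value < fdr_cutoff.

    stats maps gene name -> (logFC, raw p-value).
    """
    names = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in names])
    return [
        g for g, q in zip(names, fdr)
        if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff
    ]

# Hypothetical statistics: gene -> (logFC tumor vs. paracancerous, raw p-value).
toy_stats = {
    "gene1": (1.20, 0.001),
    "gene2": (-0.90, 0.004),
    "gene3": (0.30, 0.002),
    "gene4": (0.80, 0.200),
}

print(select_genes(toy_stats))
```

In this toy example a gene is dropped either because its fold change is too small (gene3) or because its adjusted p-value is too large (gene4); both conditions must hold for a gene to be kept.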

Further clarification of key genes associated with adenocarcinoma recurrence in the purple module
Gene significance (GS) is highly correlated with gene connectivity, which means that nodes with higher connectivity in the co-expression network also play an important role in biological function. We therefore constructed a co-expression network of genes for lung adenocarcinoma recurrence and obtained a total of 2840 edges and 879 nodes (power = 8) (Figure 3A). Four of the most highly connected genes in the co-expression network, UPK2, KLHDC3, GALR2, and TYRP1, belonged to the purple module, which was highly correlated with survival and recurrence (Table 1). Among these genes, the expression levels of UPK2, KLHDC3 and GALR2 were higher in tumor tissues than in paracancerous tissues, whereas the expression level of TYRP1 was lower in tumor tissues than in paracancerous tissues. We then further analyzed the function of these four genes in lung adenocarcinoma using more clinical data in the Oncomine database. The results showed that the survival outcomes of patients with low expression of UPK2, KLHDC3 and GALR2 were significantly better than those of patients with high expression (Figure 3B, C and D), with P values of 4.9e-05, 0.009, and 1.7e-05, respectively, while patients with high expression of TYRP1 had a significantly better prognosis than those with low expression, with a P value of 2e-07 (Figure 3E).

Demographic information and clinical characteristics of patients with surgically treated early-stage lung adenocarcinoma receiving UPK2 plasma free mRNA testing
Recent studies have shown that plasma free mRNA has the potential to act as a tumor marker. Table 2 shows the demographic information and clinical characteristics of the 105 patients who met the study criteria out of the 132 patients with early-stage lung adenocarcinoma admitted to our hospital. Of these ADC patients, 58 were male (55%) and 47 were female (45%), and the average age was 58 years (range, 39-83 years), indicating that the patients admitted did not have an age or gender bias. The pathological stage of most patients was stage I or stage II (83%), and that of the remaining patients was stage IIIa (17%). After surgery, 43 patients received adjuvant therapy (41%), including 39 patients receiving radiation therapy and 8 patients receiving adjuvant chemotherapy.

Diagnostic performance of UPK2
Starting from the first follow-up examination, on the 90th day after surgery, we collected each patient's blood, tested the free UPK2 level, and then performed an imaging examination. If the imaging examination indicated that the patient had relapsed, he or she was classified into the relapsed group, and the relative expression level of UPK2 mRNA detected at that time was recorded. If the patient had no recurrence during the follow-up period, the mean value of the repeated tests was recorded as the corresponding UPK2 expression level. We found no significant differences in UPK2 between lung adenocarcinoma patients of different ages and genders (Figure 4A and B). Interestingly, UPK2 remained at a lower expression level in non-relapsed patients. The expression level of UPK2 relative to GAPDH was 0.2763 in relapsed patients, while the average UPK2 expression level in non-relapsed patients was 0.1623, which was significantly lower (P < 0.001; Figure 4C). More interestingly, within the same patient, the level of UPK2 expression at relapse was higher than when there was no recurrence (Figure 4D). In addition, we plotted the ROC curve and calculated the AUC to determine whether the expression level of UPK2 in plasma could be used to distinguish between relapsed and non-relapsed patients (Figure 4E). When plasma UPK2 expression level was used alone as a diagnostic biomarker, the AUC was 0.767, with a 95% confidence interval of 0.675-0.858. Moreover, the ADC patients included in the study were divided into a UPK2 high expression group and a low expression group, and their survival curves were plotted. The results indicated that patients with high plasma UPK2 mRNA expression had poorer survival, while those with low plasma UPK2 mRNA expression had a better prognosis (Figure 4F).
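The survival comparison between the high and low expression groups rests on Kaplan-Meier curves. The sketch below shows the product-limit estimate for one group in Python; it is an illustration and not the R code used in this study, and the follow-up times, event flags and function name are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up time of each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (months) for one expression group.
months = [5, 8, 12, 12, 20]
event_observed = [1, 0, 1, 1, 0]

for t, s in kaplan_meier(months, event_observed):
    print(t, round(s, 4))
```

Censored patients contribute to the number at risk until they leave follow-up but never lower the curve themselves, which is what allows patients with different follow-up lengths to be analyzed together. Comparing two groups would additionally require a test such as the log-rank test, which is not shown here.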

Discussion
Lung cancer has become a global health problem because of its high morbidity and mortality (20). Lung adenocarcinoma (ADC) is a histological type of non-small cell lung cancer that is becoming a major component of lung cancer (20)(21)(22). Despite significant advances in cancer treatment in recent years, the 5-year survival rate of ADC is still unsatisfactory (23,24). With the advent of precision medicine, molecular biomarkers and molecular drug targets have become hotspots in cancer research, and targeted treatment has given tumor patients varying degrees of benefit. For example, lung cancer patients with EGFR mutations can benefit from treatment with TKIs, such as gefitinib and erlotinib (25). Other potential biomarkers are mainly oncogene-driven alterations, including ALK translocation and ROS1 gene rearrangement (26,27). There is therefore an urgent need to identify and validate clinically relevant and effective prognostic markers for lung ADC to complement existing molecular biomarkers and further guide treatment decisions.
Uroplakin 2 (UPK2) is a highly specific marker of bladder transitional cell carcinoma. As early as 1999, UPK2 mRNA was detected in blood samples from two patients with metastatic bladder cancer who had not received chemotherapy and from 1 of 8 patients with metastatic bladder cancer who had received chemotherapy, but not in 50 patients with non-metastatic bladder cancer or in the normal control group, indicating that detection of UPK2 in peripheral blood was associated with metastatic spread of bladder cancer cells (28). Given its specificity and sensitivity, UPK2 testing may be a potential means of detecting bladder cancer metastasis, staging the disease, and monitoring chemotherapy response. Lotan et al. (29) tested 11 immunohistochemical markers at the primary sites of various micropapillary carcinomas and found that uroplakins (UPs) could be used as the best marker for identifying urothelial IMC. Li et al. (30) found that UPK2 was expressed in 63% of plasma cell samples, significantly more than UPK3 (6%), and further indicated that UPK2 is a valuable marker that should be included among immunohistochemical markers to facilitate the differential diagnosis of tumors with plasmacytoid features. Further studies by Matuszewski et al. (31) found that the concentration of UPK2 in the urine decreases as bladder cancer develops, which further confirms the diagnostic value of UPK2 concentration in plasma and urine for bladder cancer. Based on clinical diagnostic needs, Tian et al. (32) evaluated the expression of UPK2 by bladder tissue microarray and found that UPK2 is highly specific (100%). It can therefore be used as a marker to identify urothelial lineage tumors and to help distinguish between bladder and prostate cancers, or used in combination with GATA3 as a potential marker for metastatic breast cancer.
Hoang et al. found that the positive rate of combined testing with UPK2, GATA3 and p40 antibodies was 94.2% (97/103) in invasive urothelial carcinomas, indicating that the combination of these three antibodies has high sensitivity for the differential diagnosis of invasive urothelial carcinoma. However, the combined UPK2, GATA3 and p40 test is negative in lung adenocarcinoma, colon adenocarcinoma and renal cell carcinoma (33). To date, there has been no report on the expression and role of UPK2 in patients with ADC recurrence.
In this study, we first verified the expression level of UPK2 and found that the expression of this gene was significantly increased in patients with ADC recurrence and that the prognosis of patients with high expression of UPK2 was poor. These prognostic trends are consistent with previous studies in different types of cancer, suggesting that the differential expression of UPK2 in patients with ADC recurrence may have clinical implications. Enrichment analysis showed that the function of this gene was mainly related to post-translational modification of proteins. Thereafter, we collected blood samples from 105 patients with ADC, 35 of whom had ADC recurrence and 70 of whom had no recurrence. RT-qPCR showed that the expression of UPK2 mRNA in the blood of relapsed patients was significantly higher than that of patients without recurrence, indicating that the difference in UPK2 expression was closely related to the recurrence of ADC and that UPK2 may serve as a biomarker for recurrence in ADC patients. In future studies, we will examine the expression of UPK2 in ADC patients of different genders, stages, and lymph node metastasis status, and further clarify the clinicopathological correlation of UPK2 expression in order to provide a promising new strategy for targeted therapy of ADC.

Conclusions
After bioinformatics analysis of the data, we obtained genes related to patient prognosis and then tested the expression level of free mRNA in patients' blood to verify our prediction. We found that the survival of patients with high expression of free UPK2 mRNA was significantly poorer than that of patients with low expression of free UPK2 mRNA. Therefore, the expression level of free UPK2 mRNA in plasma has potential as an indicator of postoperative recurrence in patients with early LUAD.

Highlights
1. Free UPK2 mRNA in plasma may be an effective biomarker for monitoring recurrence after early LUAD surgery.
2. Detection of free UPK2 mRNA in plasma is convenient to perform.
3. The prognosis of LUAD patients with low UPK2 expression in cancer tissues is better than that of patients with high UPK2 expression. Increased free UPK2 mRNA in plasma may indicate recurrence of LUAD.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
No funding was received.

Authors' contributions
JZ performed all experiments, prepared the figures and wrote the manuscript. QL and BL performed clinical sample collection and preparation. HL and CW performed bioinformatics analysis. CL and HJ provided laboratory support and discussion, reviewed the manuscript and provided financial support.
All authors have read and approved the manuscript.