Identication of Pathogenesis and Prognoses of Relapse in Myeloma by Bioinformatic Method

Background: Multiple myeloma (MM), the second most hematological malignancy, the molecular mechanism and pathogenesis of the relapse of MM is poorly understood. This study aimed to identify novel prognostic model for MM and explore potential mechanism of relapse. Methods: Gene expression data,clinical data(GSE24080) and HTseq-Counts les were downloaded from Gene Expression Omnibus (GEO) and TCGA database. Co-expression modules of genes were built by Weighted Correlation Network Analysis (WGCNA).KEGG and GO enrichment analysis were performed in each module. TATFs (tumor-associated transcription factors) were retrieved from the Cistrome. Twenty-two immune cell compositions was calculated by CIBERSORT algorithm.Univariate and multivariate Cox congression were performed and a predictive model by prognostic genes was constructed,the predictive power of the model was evaluated by Kaplan–Meier curve and time-dependent receiver operating characteristic (ROC) curves. Results: A total of 940 DEGs were identied,and in WGCNA analysis, yellow, brown and sky-blue modules were most associated with clinic traits.The yellow module related with the cell cycle and the brown and sky-blue modules correlated with cytokine and its receptors, where the M2 macrophage fraction is positively correlated with CCL18, CCL2, CCL8, CXCL12 and CCl23 were positively correlated with plasma cells by Cibersort analysis.Prognostic genes were identied and two genes (TPX2,PRAM1) were nally identied to construct a risk model for predicting EFS.


Introduction
Multiple myeloma (MM, plasma cell myeloma) is a malignant hematologic disease characterized by the clonal proliferation of malignant plasma cells. The treatment of MM has changed dramatically in recent years, with the use of proteasome inhibitors, immunomodulatory agents, and autologous hematopoietic stem cell transplantation in MM patients combined with standard chemotherapy, the survival of MM patients has been signi cantly prolonged, but all patients will eventually relapse and often demonstrate multiple drug resistance [1].Besides, new technologies and methods such as CNV Radar to de ne a riskscore of MM for informing clinical and therapeutic decisions [2].Weighted gene co-expression network (WGCNA) are also a systems biology approach for describing the correlation patterns among genes across samples and can be used to nd clusters (modules) of highly correlated genes,summarize these clusters using the module eigengene or an intramodular hub gene, correlate intermodule and external sample traits (using eigengene network methodology), and calculate module membership metrics [3].The correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. It has been widely utilized in many malignant diseases and non-malignant diseases .In the present study, we aimed to use WGCNA approach to identify coexpression networks with MM recurrence and identify novel prognostic models,and to elucidate the molecular mechanism and pathogenesis in the relapse of myeloma to understand the function and interrelationships among abnormally expressed genes.

Data preparation
The gene expression and clinical information data of GSE24080 [4] were downloaded from the gene expression omnibus (GEO) website (http://www.ncbi.nlm.nlh.gov/geo/). A total of 559 cases (including training and validation sets) were performed based on the GPL96 platform ([HG-U133A]Affyme3trix human genome U133A array). There were 553 cases were examined for event-free survival (disease relapse or progression) and additional clinical information was selected to identify prognostic genes and to construct a prognostic model in GSE24080.HTseq-Counts data was obtained from The Cancer Genome Atlas (TCGA) website (https://portal.gdc.cancer.gov).Total 47 patients(94 samples) at two different time point(primary and recurrence) were used to identify DEGs.Raw counts were transformed into TPM for WGCNA analysis and correlation between genes.

Screening of differentially expressed genes (DEGs)
The differentially expressed genes(DEGs) between primary and recurrent myeloid-derived cancers were screened from HTseq-Counts of 47 patients at 2 different critical times(primary and relapsed).

GO term and KEGG pathway enrichment analysis
To better explore the biological function and signal pathway of DEGs in module genes, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using the R package clustering pro ler[6]and used a p.value <0.05 as the cutoff value for clustering.

Co-expression network analysis
The co-expression network was constructed by the R package WGCNA [7]to explore the co-expression of genes (using all protein-coding genes) and clustered into different modules.The PickSoft threshold function of WGCNA was used to calculate the soft threshold power β value,mean connection k and R 2 .The soft power was set to 7 and the scale-free networks were validated by a power-law distribution.Then, we constructed co-expression modules and used gene signi cance (GS) and module membership(MM) metrics to identify potential modules with both high trait signi cance and module membership which are signi cantly related to clinic traits(relapse). Module membership was the correlation between gene expression pro les and module eigengene.GS was the absolute value of correlation between gene expression and module traits. Identi cation of prognostic key genes and construction of the prognostic model Key genes were identi ed from both hub genes and the DEGs.All patients were randomized into two groups (289 cases in the training set and 264 cases in the test set) and univariate and multivariate Cox regression analyses were used to investigate the correlation between EFS and the expression level of key genes with package survival[8]. P-value <0.001 was considered as a signi cant difference for univariate analysis. Key genes with P< 0.001 based on the univariate analysis were further included in multivariate Cox regression analysis. After three validation sets.Finally, a prognostic riskscore model of MM patients was established based on linear combination of regression coe cient multiplied by expression level from multivariate Cox regression models. The predictive accuracy of our model was evaluated by classifying patients into low-and high-risk groups based on the optimal cut-off value Kaplan-Meier (K-M) survival curves, time-dependent ROC curve analyses and area under the curve (AUC). Also the prognostic models were validated in the validation set and whole set by K-M curves, ROC curve analyses, and AUC [9].

Identi cation of independent prognostic parameters of MM
To identify the independence of our model,we enrolled additional clinical information to verify whether our model was independent by multivariate Cox regression analyses of 484 patients with clinic data(missing values were discarded). In addition to risks core,PROT(treatment protocol (TT2 or TT3)), ISOTYPE(IgA,IgD,IgG),B2M(Beta-2 microglobulin, mg/L),LDH(Lactate dehydrogenase,U/L),MRI(number of magnetic resonance imaging-de ned focal lesions (skull, spine, pelvis)),BMPC(bone marrow biopsy plasma cells (%)),HGB(Haemoglobin, g/dl) and RACE(Race (white or other))were included in the analyses.It shows that the risk score and isotype LDH/B2M of our model have prognostic value.They are independent of EFS in myeloma patients.
Tumor-associated transcription factors (TATFs) expression relationship with co-expressed genes A total of 318 TATFs were retrieved from the Cistrome (http://cistrome.org/) [10], and TATFs that were both DEGs and module genes were considered as key regulatory transcription factors.The correlation coe cient value of TATFs' expression levels with the remaining genes in each modules were then calculated.For visualization,we used p.value≤0.05,correlation coe cient≥0.5 and p.value≤0.0001 in the yellow module, correlation coe cient≥0.8 in the brown module,and no TATFs in the sky blue to construct the TATFs-gene regulatory network.And the software cytoscape was use to visualize the regulatory network.
Cell-Type Identi cation by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) By the CIBERSORT algorithm (http://cibersort.stanford.edu/),twenty-two immune cell compositions of bone marrow specimens were characterized based on the mRNA expression pro ling of the GSE24080 data set .The perm was set at 1000. Specimens with p < 0.05 were used for analyses. These immune cells including B cells naive, B cells memory, plasma cells, CD8 T cells, T cells CD4, naive T cells, CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), T cells gamma delta, NK cells resting, activated NK cells, monocytes, macrophages M0, macrophages M1, macrophages M2, dendritic cells resting, dendritic ells activated, mast cells resting, mast cells activated, eosinophils, and neutrophils.Correlation analyses were performed for different immune cells in each sample with interested chemokine. Differences in 22 immune cells scores in multiple myeloma bone marrow specimens were assessed by the Mann-Whitney U-test. Correlations of 17 kinds of cytokine expression with immune cell fractions in bone marrow samples were examined using Pearson correlation analysis.

Identi cation of DEGs in MM and the enrichment of DEGs
The DEGs screened from TCGA include 20 down-regulated genes and 920 up-regulated genes (Fig. 1A).To clarify the differentially expressed genes,we formed a heat map using the top 400 genes to (Fig. 1B).Then KEGG and GO analyses were further performed to nd out the general glimpse functions of differentially expressed genes.By KEGG analysis,DEGs were mainly and signi cantly enriched in cytokine receptor interaction, hematopoietic cell lineage, cell adhesion molecules, chemokine signaling pathway and NF-κB osteoclast differentiation,which were important pathways in the pathogenesis of myeloma.The next GO enrichment showed signi cant immunoreceptor activity, CXCR chemokine receptor binding, myeloid leukocyte migration and chemotaxis,and neutrophil enrichment.

Co-expression Analysis and Iidenti cation of Scale-free Network
Weighted correlation network analysis (WGCNA) was conducted to investigate the association between primary and recurrence of myeloma.To ensure that the scale-free network 7 was set to the optimal soft threshold power to ensure proper R 2 obedience to the power distribution and su cient average connectivity for networks construction. The 400 genes were sampled and clustered to construct a visual topological matrix network (Fig. 2F).
After picking a soft threshold power,We clustered the gene expression into the dynamic shear trees,and trees with similarities above 0. 75 were considered highly similar,so they were merged into one tree (Fig. 3A). Next,we included group(primary and recurrent) features in the analysis and calculated to distinguish modules highly associated with myeloma recurrence,then described the correlation coe cients and p-values between GS and MM in the three modules,which were found to be highly correlated, and nally selected genes in the brown sky blue and yellow modules for further study.

Enrichment analysis In each module
In the brown module,CCR chemokine receptor binding and chemokine activity were enriched,and a total of 19 different types of cytokine or its receptors were up-regulated in recurrent myeloma.(supplementary table3)297 Genes were mainly and signi cantly enriched in Nicotinamide adenine dinucleotide and Cytokine-cytokine receptor interaction in the sky blue module.In the yellow module yellow module there are 882 genes with GO terms mostly related to "Cell cycle".
Immune cells in ltration cytokine relationship and TATFsgene regulatory network 3 TATFs (CEBPA,ELF5 and CENPA ) were upregulated. CEBPA and ELF5 were tumor associated transcription factors in the brown module of the TATFs gene regulatory network (Fig. 5A). Expression of CEBPA and ELF5 was highly positively correlated with chemokine gene expression levels(such as CC13,CCL8,CCL23, CCL16, CCL 24) (supplementary table 2),suggesting that CEBPA may play a central role in the skewed chemokine/chemokine receptor axis. CENPA containing aberrantly regulated genes in the yellow module (Fig. 5B). The wider the borderline,the stronger the correlation coe cient between genes.Each gene played a role in the network and,in addition, we functionally classi ed the genes according to their up-regulation that may lead to tumor cell proliferation,anti-apoptosis and maintenance (Table 1). Genes expression of 17 differentially up-regulated cytokine and receptors from both brown and sky blue modules were selected to calculate the cor relationship with immune cell in ltration,showing that CCL18,CCL2,CCL8,CXCL12 were related to M2 Macrophages in ltration and CCL23 with plasma cells (Fig. 6). Identi cation of prognostic genes from both DEGs and hub genes Hub genes were screened by a ltering condition of GS(gene signi cance) higher than 0.2 and MM (module-membership)higher than 0.8. Next,previously identi ed DEGs were compared with hub genes (blue 93,brown 297 and yellow 154, Fig. 7)and the overlap part was identi ed as pivotal prognosis genes for further study.A total of 396 prognostic genes were selected for prognosis analysis.

Independence test of tow-gene based signature and other clinic parameters
A two-gene prognostic model(risk score) revealed the independent prognostic capacity of MM patients in relation to other clinical parameters. It showed that our risk scoring model(HR=1.70514,P<0.001) and isotype LDH/B2M of our model have prognostic value.They are independent determinants of EFS in myeloma patients.
Identi cation prognosis of EFS with key genes and construction of two-gene prognostic signature Finally, univariate and multivariate COX proportional regression analyses were performed using the potential key genes identi ed both from hub genes in each module and the above (Fig. 7) mentioned DEGs associated with EFS in MM patients.A total of 396 key genes were identi ed,and after screening in the training set, validation set and whole set, two genes(TPX2,PRAM1) with constant and independent prognosis value were eventually selected to construct a prognostic model (Fig. 9A.Risk score = (1.48739 expression level of TPX2)+ (0.62980 expression level of PRAM1)to assess the prognostic value of these key genes in MM patients.Next, we calculated the optimal cut-off value of two-gene expression as risk score and divided patients in the training set (n = 289) into high-risk group and low-risk group. The Kaplan-Meier(K-M) curves showed that the high-risk group had a worse prognosis than the low-risk group (P < 0.0001) ,and the gene risks core heatmap was illustrated above (Fig. 9A-C). As the increase of risk score value, the expression level of TPX2 increased but PRAM1 decreased, indicating that TPX2 was a risk factor for the recurrence of MM, while PRAM1 may play a protective role.The AUCs of timedependent ROC curves were calculated to assess the prognostic capacity of the two-gene signature.

Discussion
Bone marrow micro environment contains cellular components,extracellular matrix and soluble components that form a heterogeneous networks and play an important role in the occurrence and development of myeloma.And WGCNA is an suitable tool to identify the causes and mechanism of myeloma recurrence by identifying the weighted co-expression relationships among genes and then clustering them into highly synergistic modules.Genes with strong relationships between clinic trait and modules were selected for KEGG and GO enrichment analysis, followed by TATFs-gene regulatory network.We found that three modules play different roles in the relapse of myeloma. As the study progressed, we found 19 kinds of cytokine/cytokine receptors upregulated in relapsed myeloma patients in brown module(Supplementary Table4) ,with CC13 ,CCL8, CCL23, CCL16, CCL24, IL18 thought to be regulated by TATFs(CEBPA and ELF5 ). CCL18 CCL2 CCL8 CXCL12 were related to in ltration of M2 Macrophages.
CCL18 secreted by M2 macrophages promotes tumor cell migration and invasion in gallbladder cancer [32]. CCL2 promotes macrophages-associated chemoresistance in multiple myeloma [33]. CCL8 promotes postpartum breast cancer by recruiting M2 macrophages [34]. A study showed that inhibition of M2 polarization can overcome myeloma resistance to lenalidomide by reducing CXCL12 expression [35]. However, the relationship between CCL18 and CCL8, M2 macrophages and myeloma recurrence has not been studied yet.
A large number of cytokine such as CXCL-12, IL-6, IL-8 and IL-17 have been elucidated and they construct a pro-in ammatory and permissive environment for the survival of myeloma cells [36]. Some mechanism of myeloma are nely elucidated,but the role of cytokine are still remaining unclear in the relapse of myeloma,and we provide dysregulated cytokine and their relationship with CEBPA and ELF5,suggesting that receptors for chemokine are an important point,as pharmacological inhibitors of individual receptors may have limited e cacy due to the redundancy,as well as a tendency toward compensatory increased expression of other chemokine family members [37].So it's critical to discover that key gene regulating cytokines and blocking them may simultaneously cause the turnover of a large number of chemokine,thus overcoming drug resistance and allowing cells to promote apoptosis and prevent myeloma cell metastasis. We therefore hypothesize that CEBPA and ELF5 may be the potential target for reversing cytokine.
In the CENPA-based regulatory network,we provide another possible pathogenesis of myeloma recurrence. In the yellow module,high expression of CENPA in relapsed myeloma was positively related to E2F2,suggesting that E2F2 may be associated with poor prognosis, after all, its expression in breast cancer correlated with lower recurrence-free survival [38]. BIRC5/survivin: A multitasking protein with dual roles in promoting cell proliferation and preventing apoptosis. Overexpression of AURKA lead to drug resistance in gastric cancer by up-regulating the gene expression of the anti-apoptotic protein survivin [39].
Furthermore, over-expression of survivin in the presence of anti-myeloma drugs was found to increase the viability of MM cells [40]. SKA1:knockdown of SKA1 shows a signi cant reduction in proliferation of bladder cancer cells [41]. DIAPH3:Highly expressed DIAPH3 promoted proliferation, non-anchored growth and invasion of pancreatic cancer cells [42]. Genes above are upregulated in yellow module,revealing that CENPA and other genes may play a key role in cell cycle,which lead to myeloma recurrence and promote tumor proliferation and drug resistance. The high level of genes expression in the yellow module has been rarely studied in myeloma,which provide a new direction for our research in myeloma patients.
In sky blue module:BST1 is found in relapsed multiple myeloma patients at a expression foldchange level of 4. BST1 is a analogue of CD38, which has similar functions such as hydrolysis NAD+ to adenosin (ADO). BST1 produce adenosine diphosphate ribose(ADPR) which is subsequently converted to AMP and eventually to ADO ,causing immunosuppressive functions, and is also growth factor for osteoblasts and osteoclasts. Interestingly BST1/CD157 was undetectable in myeloma cell line, suggesting that upregulated BST1 is a key cause of relapse in patients [43].The association between BST1 and drug resistance in myeloma cells have not been reported, and the high level of BST1 expression in our study suggests that it may contribute to the survival and immune escape of myeloma cells,leading to the recurrence of the disease.
Two of our screens from the hub genes were used to determine prognosis.TPX2 and PRAM1 showed independent predictive power in predicting EFS in myeloma, and both sets of genes have been rarely reported in myeloma. TPX2 is a spindle assembly factor required for mitotic spindles,and normal assembly,and its inhibition leat to a signi cant increase in mitotic index and metaphase myeloma cells.Therefore, it may be feasible to obtain inhibitors from it to suppress myeloma proliferation cells [44]. Furthermore,high levels of TPX2 indicated lower survival rate and promoted cancer cell migration and invasion in Non-Small Cell Lung Cancer [45].PRAM1 had never been reported in research of myeloma,but the high expression of PRAM1 is a sign for favorable prognosis in the CN-AML [46].This is consistent with our study that TPX2 is associated with high recurrence rates and PRAM1 gene expression is a protective factor in patients with myeloma.
The present study has several strengths and limitations. Bioinformatic tools have been used in our study to provide a general but comprehensive understanding of the mechanism of myeloma recurrence at the transcriptome level, and we provide some potential area of study worthy of in-depth study to overcome drug resistance and metastasis. A series of pivotal genes and response group pathways were identi ed to elucidate the molecular mechanisms of myeloma recurrence and to identify potential therapeutic targets.However,this study dose not profound enough and more experiments are needed to validate the function of moleculars in the regulatory networks of multiple myeloma patients.

Conclusion
In summary, we have used bio-informatics tools to elucidate possible signaling pathways and coexpressed genes regulatory network that may contribute to relapse in multiple myeloma patients. And a two-gene(TPX2,PRAM1) based prognostic model was identi ed and constructed, which can be used to predict overall event-free survival (EFS) in MM patients.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not Applicable Availability of data and material The datasets analyzed during this study are publicly available in GEO database at https://www.ncbi.nlm.nih.gov/geo/ and TCGA database at https://portal.gdc.cancer.gov/

Competing interests
The author declare that they have no con ict of interest.

Funding
There is no relevant funding for this study.
Authors' contributions HZ and CH conceived and designed the study, as well as desiged the gures and tables. HZ and YL contributed to the statistical analysis, JD,XC,HX,KL,drafted and correcting the manuscript. All authors read and approved the nal manuscript.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.