Comprehensive Analysis of Eph-Ephrin Genes as Novel DLBC Biomarkers for Molecular Subtyping and Prediction of Prognosis and Drug Response

Background: Receptor tyrosine kinases (RTKs) are key signaling molecules that sustain proliferative signaling, and aberrant RTK activity occurs in many cancers, including Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC). Methods: Cluster analysis based on RNA-seq and RPPA data was performed to establish a novel signature for molecular subtyping of DLBC. Principal component analysis (PCA) was used to evaluate the relationships among selected genes. Results: EPHB4, EPHB6, EFNA3, EFNA4 and EFNB1 are five specific genes in DLBC, and their expression levels are significantly associated with poor prognosis. Integrative analysis of the DLBC Expr-Ab signature revealed that the five DLBC-related Eph-Ephrin genes may be involved in epigenetic regulation during DLBC progression. Four novel clusters based on Expr-Ab were generated, and we link, for the first time, the five Eph-Ephrin genes to well-defined oncogenes as follows: EPHB4-BCL6; EFNA3/EFNA4-MYC; EPHB6-EZH2; EFNB1-epigenetic modulators. Drug response data for 13 traditional and targeted drugs were integrated with the Expr-Ab signature. Conclusions: The Expr-Ab signature established in this study indicates the power of combining RNA-seq and RPPA data to develop and evaluate precision regimens. We also highlight Eph-Ephrin proteins, as cell-surface proteins, as potentially powerful biomarkers. Our findings underline the value of Eph-Ephrin genes as biomarkers for predicting prognosis and guiding precision regimens for patients with lymphoma.

all aspects of cancer treatment, one of which is precision individualized regimens. Genetic testing for targeted therapy, known as Companion Diagnostics (CDs), based on Sanger sequencing, next-generation sequencing (NGS) and quantitative PCR, is one area of clinical application. However, genetic testing only provides information on individual genetic variation and cannot predict drug response in patients. Hence, further information with clinical impact needs to be mined. Many international projects aimed at precision cancer medicine have been proposed and completed in the past ten years. These include the NGS-based TCGA [6] (The Cancer Genome Atlas), the cell line-based CCLE [7] (Cancer Cell Line Encyclopedia), the cell viability-based GDSC [8] (Genomics of Drug Sensitivity in Cancer), the RPPA (Reverse Phase Protein Array)-based TCPA [9] (The Cancer Proteome Atlas) and others.
Lymphoma and leukemia are aggressive blood cancers that affect all age groups. One subtype of lymphoma, diffuse large B-cell lymphoma (DLBCL), is the most common subtype of Non-Hodgkin Lymphoma (NHL) and has a poor prognosis in nearly 40% of patients [10]. A high proportion of patients with poor prognosis will develop central nervous system (CNS) involvement. Because of the Blood-Brain Barrier (BBB), chemotherapeutic drugs cannot reach lymphoma cells at effective concentrations, which is one major reason why these patients do not benefit from chemotherapy. Although genetic abnormalities caused by deletion, mutation and amplification in the B cell receptor and NF-κB signaling pathways have been discovered by whole-exome sequencing and other technologies, biomarkers for predicting prognosis and chemoresistance are still extremely lacking [11].
In this study, we identified the major Eph-Ephrin genes in Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC). DLBC is defined by TCGA and includes DLBCL, Burkitt lymphoma and others.
Next, we integrated RNA-seq data of Eph-Ephrin and DLBCL signature genes with RPPA data of DLBCL signature proteins (termed 'Expr-Ab') to discover Eph-Ephrin patterns in DLBC and predict their possible functions from clustering characteristics. Finally, we explored the predictive power of Expr-Ab for drug response based on available GDSC data, from which many Drug-Expr-Ab interactions can be further mined. We expect that integrative strategies such as Drug-Expr-Ab can be fully exploited across various cancer omics datasets to meet diagnostic and therapeutic needs.
All RNA-seq and RPPA data were normalized to remove differences in scale between datasets. Clustered heatmaps were produced and correlation coefficients were calculated in RStudio. Principal component analysis (PCA) was performed using the FactoMineR package in RStudio.
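The normalization method is not specified above. As a minimal sketch, assuming per-feature z-scoring (each gene or antibody scaled to mean 0 and SD 1 across samples), the normalization and the correlation coefficients behind the clustered heatmaps could be computed as follows. The study itself used R; this Python version, the helper names and the expression values are hypothetical illustrations.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Standardize one feature (gene or antibody) across samples: mean 0, SD 1."""
    mu = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson r computed as the mean product of population z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def correlation_matrix(features):
    """features: name -> values over the same ordered samples."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


# Hypothetical log2 expression values for three genes in four cell lines
expr = {
    "EFNA3": [5.1, 6.0, 7.2, 8.0],
    "EFNA4": [4.9, 6.1, 7.0, 8.3],
    "EPHB6": [8.0, 7.1, 6.2, 5.0],
}
cm = correlation_matrix(expr)
print(round(cm[("EFNA3", "EFNA4")], 2), round(cm[("EFNA3", "EPHB6")], 2))
```

Because Pearson r does not depend on scale, z-scoring mainly matters for putting RNA-seq and RPPA values on a common color scale in a heatmap.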

Results
EFNA3, EFNA4, EFNB1, EPHB4 and EPHB6 are identified as the major expressed Eph-Ephrin genes in DLBC
To evaluate the expression pattern of Eph-Ephrin genes, we first analyzed RNA-seq data of all 22 Eph-Ephrin members in 130 leukemia and lymphoma cell lines from the CCLE database. We found that EPHB4, EFNA3, EFNA4 and EFNB1 are highly expressed in most cell lines, whereas EPHB1, EPHA1 and others are highly expressed only in specific cell lines (Supplementary Fig. 1A). Moreover, EPHB4, EPHB6 and EPHA1 are highly expressed in ALL/AML cell lines, while EPHB1 is highly expressed in a small group of cell lines of uncertain classification. We then performed the same analysis in 39 DLBC cell lines, and EFNA3, EFNA4, EFNB1, EPHB4 and EPHB6 were identified as the major expressed genes in DLBC (Fig. 1A). Further, we selected 10 widely used cell lines for expression pattern and PCA analysis, of which NAMALWA, SU-DHL-4, SU-DHL-6, SU-DHL-10, RAJI and DAUDI are classified as DLBC and THP-1, U-937, JURKAT and HL60 as non-DLBC. We found that EFNB1/EPHB6 and EFNA3/EFNA4/EPHB4 form two significantly differentially expressed groups.
The EFNA3/EFNA4/EPHB4 group tends to be highly expressed in DLBC cell lines, while the EFNB1/EPHB6/EPHA1 group is highly expressed in non-DLBC cell lines (Fig. 1B). PCA further shows that EFNA3/EFNA4/EPHB4 and EFNB1/EPHB6/EPHA1 form two independent gene groups based on their expression patterns.
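The separation of genes into two groups by PCA can be illustrated with a small sketch. FactoMineR's PCA standardizes variables by default, so this version, which is an assumption and not the authors' code, diagonalizes the gene-by-gene correlation matrix using power iteration with deflation and reports gene loadings (gene-component correlations). Genes with loadings of opposite sign on PC1 fall into opposing groups. The gene names and values in the usage example are hypothetical.

```python
import random
from math import sqrt


def standardize(values):
    """Center and scale one gene to unit variance across cell lines."""
    n = len(values)
    mu = sum(values) / n
    sd = sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def correlation(rows):
    """Gene-by-gene correlation matrix from standardized rows."""
    n = len(rows[0])
    return [[sum(a * b for a, b in zip(x, y)) / n for y in rows] for x in rows]


def top_eigen(m, iters=1000, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    k = len(m)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random start avoids orthogonality to the target
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(k)) for i in range(k))
    return lam, v


def pca_loadings(genes, n_components=2):
    """Eigenvalues and gene loadings for the first principal components."""
    names = list(genes)
    m = correlation([standardize(genes[g]) for g in names])
    eigvals, loadings = [], {g: [] for g in names}
    for _ in range(n_components):
        lam, v = top_eigen(m)
        eigvals.append(lam)
        for g, vi in zip(names, v):
            loadings[g].append(vi * sqrt(max(lam, 0.0)))
        # Deflate so the next pass finds the next component
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return eigvals, loadings


# Hypothetical expression of four genes across six cell lines
genes = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "C": [6, 5, 4, 3, 2, 1],
    "D": [5.9, 5.1, 3.8, 3.1, 2.2, 0.9],
}
eig, load = pca_loadings(genes)
print([round(e, 2) for e in eig], {g: round(l[0], 2) for g, l in load.items()})
```

The sign of a principal component is arbitrary, so only the relative signs of the loadings are meaningful.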
Next, to determine the prognostic predictability of Eph-Ephrin members, we analyzed microarray expression data from the Lenz Staudt Lymphoma dataset (GSE10846), which contains detailed clinical data for 420 DLBC patients. Hazard ratios and p values of the 22 Eph-Ephrin members show that most are significantly associated with patient prognosis (Supplementary Fig. 1B). We then focused on EFNA3, EFNA4, EFNB1, EPHB4 and EPHB6 because of their high expression in DLBC. Combining the five Eph-Ephrin genes (termed 'combined-5') in overall survival analysis reduced the p value by more than 10-fold, indicating that combined-5 performs better in risk grouping than any single one of the five genes (Fig. 1D-E). The expression data show that low expression of EFNA3/EFNA4 is enriched in the high-risk group with poor prognosis, whereas EFNB1/EPHB4/EPHB6 are expressed at higher levels in the high-risk group than in the low-risk group (Fig. 1F). These results indicate that the mRNA expression of EFNA3/EFNA4/EFNB1/EPHB4/EPHB6 has potential as a prognostic biomarker for DLBC patients.
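Risk grouping in this study used the SurvExpress optimized algorithm. The sketch below is a simpler stand-in, not SurvExpress. It computes a weighted combined score, splits patients at the median and compares survival with a two-group log-rank test. The weights are hypothetical. Their signs only mirror the direction described above: low EFNA3/EFNA4 and high EFNB1/EPHB4/EPHB6 in the high-risk group.

```python
from math import erfc, sqrt
from statistics import median


def risk_groups(expr, weights):
    """Assign 1 (high risk) / 0 (low risk) by median split of a weighted score.

    expr: gene -> per-patient z-scores; weights: gene -> hypothetical coefficient.
    """
    n = len(next(iter(expr.values())))
    scores = [sum(w * expr[g][i] for g, w in weights.items()) for i in range(n)]
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def logrank(times, events, groups):
    """Two-group log-rank test; returns (chi-square with 1 df, p value)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var if var > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Hypothetical z-scores for two of the five genes in four patients
expr = {"EPHB4": [2.0, 1.0, -1.0, -2.0], "EFNA3": [-2.0, -1.0, 1.0, 2.0]}
weights = {"EPHB4": 1.0, "EFNA3": -1.0}
grp = risk_groups(expr, weights)

# Hypothetical follow-up: high-risk patients die first
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
chi2, p = logrank(times, events, [1, 1, 1, 1, 0, 0, 0, 0])
print(grp, round(chi2, 2), round(p, 4))
```

In practice, the weights would be fitted, for example as Cox regression coefficients, rather than set by hand.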
Integrative analysis of Eph-Ephrin genes and abnormal DLBC pathways by mining multi-omics data
To predict the functions of Eph-Ephrin genes in lymphomas, we integrated two types of expression data involving the major abnormal genes/pathways of DLBC: mRNA expression data obtained by RNA-seq and protein expression data obtained by RPPA. By integrating these with expression data of Eph-Ephrin genes, we expected to predict which DLBC pathways the Eph-Ephrin genes are involved in. First, we analyzed RNA-seq data of a DLBC signature gene set together with the five blood-specific Eph-Ephrin genes, 22 genes in total (termed the Eph-Ephrin gene network). The expression pattern of the Eph-Ephrin gene network generates 4 clusters: cluster 1 is TP53/CDKN1A, cluster 2 contains the Eph-Ephrin genes in DLBC, cluster 3 is BCL6/CARD11/CD79A/CD79B, and cluster 4 is BCL2/CD80/TNFAIP3 (Fig. 2A). To confirm whether the results from RNA-seq data are consistent with protein data from RPPA antibodies (termed 'Ab'), we further screened and defined a DLBC signature Ab panel consisting of 10 Ab filtered from 47 Ab. First, we selected 47 Ab involved in the major DLBC pathways, namely the ErbB signaling pathway, the B cell receptor signaling pathway and the PI3K-Akt signaling pathway (Supplementary Table 1). Second, RPPA data of the 47 Ab in blood cancer were analyzed by clustering, correlation coefficients and PCA (Supplementary Fig. 3A-B). Lastly, 10 independent Ab were selected as the Ab panel of the DLBC signature, and the analysis was performed in both blood cancer (Supplementary Fig. 3C-E) and DLBC (Fig. 3A-C). Furthermore, an independent RPPA dataset of 33 patients from the TCPA project was analyzed using the same 10 Ab, yielding similar correlation coefficient and PCA results (Fig. 3D-F). Hence, the 10-Ab DLBC signature based on protein/RPPA data is well established.
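The text does not state the exact criterion used to reduce 47 Ab to 10 "independent" Ab. One common approach, shown here as a hypothetical sketch, is a greedy correlation filter. Antibodies are visited from most to least variable, and an antibody is kept only if its absolute Pearson correlation with every antibody already kept is below a threshold. The 0.7 threshold and the RPPA values are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def select_independent(ab_data, max_abs_r=0.7):
    """Greedy filter: keep an antibody only if |r| < max_abs_r with all kept ones."""
    order = sorted(ab_data, key=lambda a: variance(ab_data[a]), reverse=True)
    kept = []
    for ab in order:
        if all(abs(pearson(ab_data[ab], ab_data[k])) < max_abs_r for k in kept):
            kept.append(ab)
    return kept


# Hypothetical RPPA values for three antibodies in four samples
ab = {
    "EGFR": [1.0, 2.0, 3.0, 4.0],
    "HER2": [1.1, 2.1, 2.9, 4.2],  # nearly collinear with EGFR
    "PKCa": [3.0, 1.0, 4.0, 2.0],
}
print(select_independent(ab))
```

Visiting antibodies in order of variance is one reasonable tie-breaker. Ranking by biological priority would be an equally valid choice.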
Next, we further integrated RNA-seq and RPPA data (termed 'Expr-Ab') to analyze the Eph-Ephrin pattern, aiming to predict Eph-Ephrin function more accurately and reliably. Similar to the Eph-Ephrin gene network signature, 4 novel clusters were generated based on Expr-Ab. Ab data for PKCα and its phosphorylation at S657 cluster with the BCL6/CARD11/CD79A/CD79B group. BCL2/CD80/TNFAIP3 from the Eph-Ephrin gene network signature are divided into two groups in the Expr-Ab signature: one is CD80/TNFAIP3 together with Ab data for B-Raf phosphorylated at S445 and PI3K-p85, and the other is a novel group consisting of Ab data for EGFR/HER2/Bcl-2 and the RNA-seq data of BCL2. The last group is made up of the Eph-Ephrin genes, DNA damage response (DDR) genes (ATM/TP53/CDKN1A), epigenetic modulator genes and others (Fig. 4A). Four signatures were characterized on the basis of the expression pattern. Signature 1 is characterized by low expression of cluster 1 and high expression of cluster 4; signature 2 by high expression of cluster 3; signature 3 by medium expression of cluster 1; and signature 4 by medium expression of clusters 2/3. Next, we analyzed the expression pattern of Eph-Ephrin genes in the DLBC Expr-Ab signature and found that Eph-Ephrin genes are closely correlated with epigenetic modulators, which are frequently mutated in DLBC according to whole-exome sequencing studies (Fig. 4B-C).
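A minimal sketch of how an Expr-Ab feature matrix could be assembled and clustered is shown below. RNA-seq and RPPA features are z-scored separately, given platform prefixes and pooled. The features are then grouped by average-linkage agglomerative clustering on the distance 1 - Pearson r. The study's heatmaps were produced in R, so the linkage, the distance measure and the helper names used here are assumptions, and the values are hypothetical.

```python
from math import sqrt


def zscore(values):
    n = len(values)
    mu = sum(values) / n
    sd = sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def pearson(x, y):
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def build_expr_ab(rna, ab):
    """Z-score each platform's features and pool them; samples must be in the same order."""
    merged = {}
    for prefix, table in (("RNA", rna), ("Ab", ab)):
        for name, values in table.items():
            merged[f"{prefix}:{name}"] = zscore(values)
    return merged


def cluster_features(features, k):
    """Average-linkage agglomerative clustering of features on 1 - Pearson r, cut at k clusters."""
    names = list(features)
    dist = {(a, b): 1.0 - pearson(features[a], features[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


# Hypothetical values for six cell lines
rna = {
    "BCL6": [1, 2, 3, 4, 5, 6],
    "MYC": [1, 3, 1, 3, 1, 3],
    "EFNA3": [1.1, 2.9, 1.2, 3.0, 0.9, 3.1],
}
ab = {"PKCa": [1.2, 1.9, 3.1, 4.0, 5.2, 5.8]}
clusters = cluster_features(build_expr_ab(rna, ab), k=2)
print(clusters)
```

Pearson r is scale-invariant, so z-scoring does not change the clustering. It only makes the two platforms comparable when they are displayed together in a heatmap.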

Drug Response Prediction by Expr-Ab signature involving Eph-Ephrin.
To evaluate the clinical prospects of the Expr-Ab signature, drug response data from the GDSC project were integrated into the Expr-Ab signature analysis. Available Expr-Ab data and complete drug response data for 13 traditional/targeted drugs in 12 lymphoma cell lines were included in the analysis. The results show that drug response patterns can, to some extent, be clustered, and 5 specific Expr-Ab clusters were generated, termed 'PI3K pathway', 'BCL6', 'MYC', 'Epigenetic Modulators' and 'BCL2-HER2' (Fig. 5A). The most remarkable Expr-Ab signature is high expression of the 'BCL6', 'MYC' and 'Epigenetic Modulators' clusters in cell lines A3KAW and A4FUK, which are sensitive to most drugs. The most pronounced multidrug-resistant phenotype is enriched in cell lines GRANTA519 and KARPAS422, which show high expression of the 'BCL2-HER2' cluster.
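The association between Expr-Ab clusters and drug response is presented above as a clustered heatmap. As a hypothetical way to quantify it, the sketch below summarizes each cluster as the mean z-score of its member features in each cell line. It then computes the Spearman rank correlation between that score and drug IC50 values (in μM) across cell lines. The cluster members, scores and IC50 values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster_score(expr_ab, members):
    """Mean z-score of a cluster's member features in each cell line."""
    n = len(expr_ab[members[0]])
    return [sum(expr_ab[m][i] for m in members) / len(members) for i in range(n)]


# Hypothetical 'BCL2-HER2' cluster members in five cell lines
expr_ab = {
    "RNA:BCL2": [1.5, 0.8, 0.1, -0.6, -1.2],
    "Ab:HER2": [1.2, 1.0, -0.2, -0.4, -1.4],
}
ic50_um = [12.0, 8.5, 3.1, 1.2, 0.4]  # hypothetical IC50 of one drug, μM
score = cluster_score(expr_ab, ["RNA:BCL2", "Ab:HER2"])
print(round(spearman(score, ic50_um), 3))
```

Because Spearman correlation depends only on ranks, it gives the same result for raw and log-transformed IC50 values. With only 12 cell lines, any such coefficient should be read as exploratory.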
Among the Eph-Ephrin genes, EPHB4 clusters with BCL6, a feature gene of one DLBC subtype; EFNA3/EFNA4 cluster with MYC, a feature gene of another DLBC subtype; EPHB6 clusters with EZH2, a recently uncovered feature gene and druggable target; and EFNB1 clusters with epigenetic modulators including KMT2C/KMT2D/EP300/CREBBP, which suggests an intrinsic connection between EFNB1 and epigenetic modulators. Targeting EFNB1 may therefore offer a route to therapeutics for cancers with epigenetic abnormalities.

Discussion
Before 2010, microarrays were the main technology for high-throughput gene expression analysis [12]. The relative expression level of genes is measured by hybridizing probes on the chip with nucleic acids from the sample. However, microarray technology cannot measure absolute expression levels. Therefore, the importance of genes with low or no expression may be overestimated, while the importance of highly expressed genes may be overlooked. After 2010, with the rapid development of next-generation sequencing (NGS) technology, sequencing costs fell further, and a large number of international systematic functional genomics projects were implemented [13]. CCLE has carried out multi-omics analysis of more than 1000 laboratory-established cell lines from dozens of cancer types, providing an important resource for the study of gene function.
The Eph-Ephrin family comprises 22 receptor and ligand members. Eph-Ephrin genes may have acquired cell lineage specificity during evolution, but Eph-Ephrin signaling is likely indispensable to all cell lineages; therefore, each cell lineage should express several Eph-Ephrin genes. Although many members of the Eph-Ephrin signaling pathway have been reported to be abnormally expressed in many cancers, a systematic analysis of Eph-Ephrin genes in cancer and their relationship with prognosis is still lacking. Here, we comprehensively analyzed the expression pattern of Eph-Ephrin genes in DLBC on the basis of the CCLE database. Some, but not all, Eph-Ephrin genes are presumably indispensable for DLBC, and gene expression is the basis of gene function. Hence, to find the key Eph-Ephrin genes in blood cancer or DLBC, we analyzed RNA-seq data from CCLE. We propose that EPHB4/EPHB6 are the main receptors and EFNA3/EFNA4/EFNB1 the main ligands in most blood cancer or DLBC cell lines. Overall survival analysis indicates that the five Eph-Ephrin genes have good prognostic predictability in DLBC. More comprehensive and in-depth analyses should be conducted in other types of blood cancer.
Integration of the Eph-Ephrin genes with 17 DLBC signature genes generates 4 clusters, as shown in Fig. 2, and all five Eph-Ephrin genes fall into one cluster. Disruption of the B cell signaling pathway, such as low or no expression of CD79a and CD79b, and abnormally high expression of various transcriptional regulators, epigenetic modulators and Eph-Ephrin genes are the main features of signature 1 cells. Similar results appear in the Expr-Ab analysis (Fig. 4). The negative correlation between the expression of Eph-Ephrin genes and CD79a/CD79b in signature 1 cells suggests a potential interaction between them, given their shared subcellular localization.
Compared with signature 3, signature 2 in the Eph-Ephrin gene network is characterized by high expression of cluster 4 (BCL2/CD80/TNFAIP3).
In the Expr-Ab analysis, integrating RNA and protein expression data combines the advantages of RNA-seq and RPPA to obtain a more accurate classification of DLBC according to gene expression profile. Signature 1 in the Expr-Ab resembles signature 1 in the Eph-Ephrin gene network. Signature 2 in the Expr-Ab is characterized by high expression of cluster 3 (BCL2/EGFR/HER2), in which BCL2 is an anti-apoptotic gene and EGFR/HER2 are pro-proliferative genes. Signatures 3 and 4 in the Expr-Ab are distinguished by the expression pattern of cluster 1 (CARD11/CD79A/CD79B/BCL6/PKC-α).
BCL6/PKC-α are highly expressed in signature 3 and lowly expressed in signature 4. Hence, the Expr-Ab pattern is useful for classifying DLBC into signatures 1-4. For example, samples with high PKC-α (cluster 1) can be classified as signature 3, samples with high BCL2 (cluster 3) as signature 2, samples with high MYC (cluster 4) as signature 1, and samples with low PKC-α/BCL2/MYC as signature 4.
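The classification rules above can be written as a small decision function. The text does not define what counts as "high" or how to handle a sample with more than one high marker. This sketch therefore assumes a hypothetical z-score threshold of 0.5 and assigns the signature of the marker with the largest z-score. The marker keys are illustrative names.

```python
def assign_signature(z, high=0.5):
    """Assign an Expr-Ab signature (1-4) from marker z-scores for one sample.

    z: marker -> z-score, with keys 'PKCa', 'BCL2', 'MYC'.
    Rules from the text: high PKC-α -> 3, high BCL2 -> 2, high MYC -> 1, all low -> 4.
    The 'high' threshold and the largest-z tie-break are assumptions.
    """
    rules = {"PKCa": 3, "BCL2": 2, "MYC": 1}
    high_markers = {m: z[m] for m in rules if z[m] > high}
    if not high_markers:
        return 4
    top = max(high_markers, key=high_markers.get)
    return rules[top]


# Hypothetical samples
print(assign_signature({"PKCa": 1.2, "BCL2": 0.1, "MYC": -0.3}))
print(assign_signature({"PKCa": -1.0, "BCL2": -1.0, "MYC": -1.0}))
```

A threshold learned from the cluster boundaries in the heatmap would be preferable to a fixed value.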
After combining the drug response data from GDSC with the integrated Expr-Ab expression data, we found that cell lines with similar drug responses cluster together based on Expr-Ab. For example, A3KAW and A4FUK, whose expression pattern resembles signature 1 of the Expr-Ab, show similar sensitivity to most of the 13 drugs. GRANTA519 and KARPAS422, whose expression pattern resembles signature 2 of the Expr-Ab, show similar resistance to most of the 13 drugs. These data suggest that Expr-Ab expression data can reflect and predict drug response. More detailed multi-omics expression data should be integrated and analyzed to select a better gene set for predicting drug response.
In addition, the link between Eph-Ephrin genes and transcriptional regulators and epigenetic modulators is surprising and interesting. Eph receptors possess tyrosine kinase activity, so Eph-Ephrin signaling may directly or indirectly regulate the expression and activation status of downstream targets through phosphorylation. Downstream targets of EPHB4, EPHB6, EFNA3, EFNA4 and EFNB1 in DLBC should be identified in the future.

Conclusions
In this study, we comprehensively analyzed the expression pattern of the 22 Eph-Ephrin genes in leukemia and lymphoma using CCLE RNA-seq data and selected five Eph-Ephrin genes highly expressed in DLBC: the receptors EPHB4 and EPHB6 and the ligands EFNA3, EFNA4 and EFNB1. Microarray expression data of the Lenz Staudt Lymphoma dataset GSE10846, including detailed clinical data of 420 DLBC patients, were analyzed to evaluate the performance of Eph-Ephrin genes in prognosis prediction. Overall survival data indicate that the expression levels of the five Eph-Ephrin genes are significantly associated with patient risk grouping.
Drug response prediction involving 13 traditional and targeted drugs using our Expr-Ab signature indicates the power of combining RNA-seq and RPPA data to develop and evaluate precision regimens in the clinic. At the same time, we highlight Eph-Ephrin genes as potentially powerful biomarkers that may represent and replace other types of variation, as follows: EPHB4-BCL6; EFNA3/EFNA4-MYC; EPHB6-EZH2; EFNB1-epigenetic modulators. As cell-surface proteins, Eph-Ephrin proteins have better potential than non-cell-surface proteins to be developed into molecular diagnostic targets. Our findings underline the value of Eph-Ephrin genes as biomarkers for predicting prognosis and guiding precision regimens for patients with lymphoma.

Declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Availability of data and materials
All raw data are available from public databases. R scripts used in this study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

calculated from the high-risk and low-risk groups of GSE10846 determined by the SurvExpress optimized algorithm on the basis of individual or combined Eph-Ephrin expression. Combined "Gene-5" comprises EFNA3, EFNA4, EFNB1, EPHB4 and EPHB6. (E) Kaplan-Meier curves of high-risk and low-risk populations on the basis of combined "Gene-5" expression level. Risk grouping was conducted with an optimization algorithm. (*, p < 0.05; **, p < 0.01). (F) Expression levels of EFNA3, EFNA4, EFNB1, EPHB4 and EPHB6 in the low-risk and high-risk groups.

Analysis of the predictability between the Expr-Ab signature and drug response. (A) Expr-Ab signature and paired drug response data in 12 blood cancer cell lines. IC50 values are in μM.