DRPPM-PATH-SURVEIOR: Plug-and-Play Survival Analysis of Pathway-level Signatures and Immune Components

Abstract Pathway-level survival analysis offers the opportunity to examine molecular pathways and immune signatures that influence patient outcomes. However, available survival analysis algorithms are limited in pathway-level function and lack a streamlined analytical process. Here we present a comprehensive pathway-level survival analysis suite, DRPPM-PATH-SURVEIOR, which includes a Shiny user interface with extensive features for systematic exploration of pathways and covariates in a Cox proportional-hazard model. Moreover, our framework offers an integrative strategy for performing Hazard Ratio ranked Gene Set Enrichment Analysis (GSEA) and pathway clustering. As an example, we applied our tool in a combined cohort of melanoma patients treated with checkpoint inhibition (ICI) and identified several immune populations and biomarkers predictive of ICI efficacy. We also analyzed gene expression data of pediatric acute myeloid leukemia (AML) and performed an inverse association of drug targets with the patientâ€™s clinical endpoint. Our analysis derived several drug targets in high-risk KMT2A-fusion-positive patients, which were then validated in AML cell lines in the Genomics of Drug Sensitivity database. Altogether, the tool offers a comprehensive suite for pathway-level survival analysis and a user interface for exploring drug targets, molecular features, and immune populations at different resolutions.


Background
Organizing biological knowledge into pathways facilitates the integrative analysis of gene expression data derived from RNA sequencing and proteomics pro ling. Common pathway-level analysis tools, such as ENRICHR [1], GSEA [2], are able to perform pathway enrichment analysis based on gene set databases (e.g., KEGG [3], REACTOME [4], MSIGDB [5], LINCS1000 [6], and the Cell Marker database [7]). While these pathway analysis tools tend to focus on differentially expressed genes between two groups of samples, an alternative approach is to infer a pathway activity score in a single sample by transforming the expression of a set of genes into a single value using gene summary statistics, such as maxmean statistics [8], PLAGE [9], GSVA [10], and ssGSEA [2,11]. Based on this approach, scores derived from custom gene sets or from network analyses [12][13][14] can then be used to dichotomize the patient population for survival analysis. For example, PERK-associated gene activity was found to be associated with a higher risk in melanoma patients [15], RAS dependency index was developed in pancreatic adenocarcinoma [16], LCK network activity was associated with T-cell acute lymphoblastic leukemia patient survival [17], and an epithelial-mesenchymal transition score was found to be associated with poorer disease-free survival in ovarian and colorectal patients [18]. Moreover, single scores derived from immune markers can be used as an estimate of immune components (e.g., Xcell [19], TIMER 2.0 [20], and geometric mean estimation of tumor in ltrative leukocytes [21]). These immune scores can then be applied in cancer patient classi cation [22], as a biomarker of check-point immunotherapy response [23], or as a prognosis marker that's predictive of clinical outcome [24].
These integrative summary scores represent a useful approach in highlighting signaling pathways and immune populations that correlate with the clinical outcome. But existing survival analysis tools either lack a user-friendly interface or has limited functionality for systematic screening of large pathway databases. They are often restricted in available patient cohort or limited to a small subset of pathways [25][26][27] and are often incapable of accepting external user input data [25][26][27][28] or clinically relevant covariates [26,29,30]. Thus, to facilitate the public mining of retrospective clinical studies, we introduce DRPPM-PATH-SURVEIOR, a comprehensive plug-and-play suite for pathway-level survival analysis of signature databases. Our tool is presented with the following unique features: 1. A one-stop tool for expression-based survival analyses.
2. The ability to include multiple covariates inside the Cox-proportion hazard pathway model. 3. The ability to summarize prioritized gene signatures into relevant clusters and pathway modules. 4. The ability to perform hazard ratio ranked gene set enrichment analysis Altogether, survival analysis is a critical branch of statistics for analyzing the time-to-event, and our tool facilitates a comprehensive survival analysis of pathway-level scores (an additional comparison of features is available in the Supplementary Result Section).

Implementation
Overview of the entire pipeline. DRPPM-PATH-SURVEIOR is implemented in the R environment, and packages will be automatically installed during runtime. There are four major components of the DRPPM-PATH-SURVEIOR ( Fig. 1 "On-the-y" Mode: Shiny Interface DRPPM-PATH-SURVEIOR is a comprehensive framework for analyzing and visualizing "on-the-y" associations of immune signatures and pathways scores with a clinical endpoint. The application facilitates the partitioning of patients based on pathway scores, estimated immune cells, and gene expression level, followed by univariate Cox-regression survival analysis and multivariate Cox-regression analysis. DRPPM-PATH-SURVEIOR uses several R packages, including survival (v3.2-11), survminer (v0.4.9), GSVA (v1.40.1) [10], and immunedeconv [31](v2.1.0). Pathway score is calculated with the gsva() function based on ssGSEA, GSVA, plage, or zscore. Immune deconvolution is performed with the immunedeconv R package, which includes several deconvolution packages, such as CIBERSORT [32], ESTIMATE [33], and MCP counter [34]. For multivariate analysis, a covariate can be selected from the user-provided meta-information le. The multivariate survival analysis can be performed through additive and multiplicative interaction of two or more variables.
To evaluate the association between pathway and survival S over time t, S(t), is de ned through hazard function, h(t), as where h(t) is a hazard rate at time t and h 0 (t) is the baseline hazard rate, β 1 is the coe cient related to hazard with β 1 > 0 as high risk and β 1 < 0 as low risk for X 1 , a dichotomized based on the gene or pathway score.
To evaluate the pathway association with survival after adjusting for patient meta information X 2 is de ned as To evaluate the associated interaction between the pathway and patient meta information is de ned by To facilitate the identi cation of top high-risk pathways and genes, we have developed a pipeline to systematically assess pathways associated with hazard by a Cox proportional hazard function. The user can provide or select individual genes and pathway databases to perform a systematic screening. Each expression feature is strati ed based on a median cutoff. The user also has the option of performing a systematic screening with the inclusion of a covariate as an additive or multiplicative interactive model. The p.value is calculated on the likelihood ratio, wald test. An adjusted p.values can be calculated based on Benjamini-Hochberg correction method. In the output table, genes and pathways are ranked by the likelihood ratio p.value.

Connectivity Mode: Pathway Gene-set Connectivity
The Connectivity Mode offers the user the ability to analyze the similarity between pathways associated with survival. The hazard ratio ranked pathways from the pipeline mode can be used as input to the DRPPM-Jaccard-Connectivity R Shiny app to generate hierarchical clustering based on gene-set similarity. A Jaccard function can calculate distance between pathways: The Jaccard score function J for two pathways A and B is de ned as

Results
To demonstrate functionalities of the DRPPM-PATH-SURVEIOR framework, we have included use-case examples of biomarker discovery in a cohort of immunotherapy-treated melanoma patients. We have also provided an example use-case strategy for drug repurposing in pediatric acute myeloid leukemia patients.

Identifying Immune Pathways Associated with Effective Checkpoint Inhibition Treatment
To identify predictive biomarkers that facilitate patient selection of patients suitable for immune  Table S1). These include PRF1 and HLA-DPA1, which have been previously reported as predictive biomarkers for ICI therapy [40] (Fig. 2). "On-the-y analysis mode" further demonstrate PRF1 and HLA-DPA1 were signi cantly higher in patients who respond to ICI treatment (Supplementary Figure   S1). We then ranked the genes based on hazard ratio derived from the cox-proportion hazard model and performed a Hazard Ratio Ranked GSEA analysis of the Hallmark database (Fig. 3A). Interferon Gamma was found associated with Low-risk patients (Fig. 3B). Consistently, immune signatures associated with LCK and CSF1 were also associated with Low-risk patients (Fig. 3C, 3D). Through immune deconvolution, we derived an immune score from xCell [19] and an estimated M2-like macrophage population from Cibersort [32]. We found that high immune in ltration with low M2-like (immune suppressive) macrophages were associated with better outcome (Fig. 4A, 4B). Next, we used the "Pipeline Mode" to perform a systematic univariate Cox-hazard analysis of immune signatures and identi ed to identify 69 immune signatures associated with a better outcome (Supplementary Table S2). A systematic assessment of immune pathway followed by Pathway Connectivity analysis demonstrated 13 immune modules captured a favorable outcome in pretreated RNA samples, including CD8 cytotoxicity, antigen presentation, interferon and immune checkpoint marker signatures (Fig. 5, Supplementary Figure S2). Altogether, our tool provides suggests favorable outcome and is linked with immune activation.

Survival-Directed Therapeutic Discovery in Acute Myeloid Leukemia
To leverage our framework for therapeutic discovery, we obtained the gene expression data and clinical annotation of 220 patients with the KMT2A fusion event from the National Cancer Institute TARGET pediatric acute myeloid leukemia (AML) 1031 cohort (0-22 years of age). The translocation event of the gene KMT2A, also known as mixed lineage leukemia (MLL), is frequently identi ed in pediatric AML. Through its multiple fusion partners arises a diverse patient population with a need for advanced risk strati cation [43]. Through the DRPPM-PATH-SURVEIOR suite of tools, we examined pathways and genes associated with poor outcome and identify therapeutic targets in high-risk patients. First, single samples gene set enrichment analysis (ssGSEA) was performed using the expression data in tandem with the Library of Integrated Network-based Cellular Signatures (LINCS; 31,028 gene sets) LINCS1000 gene sets to calculate the pathway scores (Supplementary Figure S3). Next, the patients were dichotomized through a median cut-point of each pathway score into an above-median or below median group, followed by a Cox proportional hazards regression using the patient's overall survival (OS) and event free survival (EFS). A hazard ratio value greater than one and a P-value less than 0.05 was applied to identify signi cant pathways associated with high risk. To prioritize a putative therapeutic target that downregulates genes associated with high-risk AML in the KMT2A subgroup, we examined enriched connectivity map (cMAP) name and Mechanism of Action. We found 12 enriched Cmap names and 6 enriched drug categories grouped by their mechanism of action (Fig. 6A). Notably, genes downregulated by the HDAC inhibitor, Vorinostat, were associated with the worst prognosis based on OS and EFS (Fig. 6B). Furthermore, Vorinostat was highly sensitive in KMT2A fusion-positive AML cell lines based on the genomics of drug sensitivity in cancer (GDSC) database (Fig. 6C). Taken together, we presented an integrative strategy utilizing DRPPM-PATH-SURVEIOR to prioritize pathways based on patient risk and identi ed a known therapeutic target in high-risk KMT2A fusion-positive AML patients.

Conclusion
DRPPM-PATH-SURVEIOR is designed to visualize and perform systematic survival analysis based on gene pathway information. The application is designed for users with limited experience in programming as well as for advanced users to perform systematic high-throughput pathway screening. In the interactive mode, the Shiny application will ensure reproducibility and can be easily set up and applied in any cohort. In the pipeline mode, the user can apply univariate and multivariate analysis of pathway and patient covariates associated with patient outlook. Our current application can also perform GSEA based on hazard ratio ranking as well as pathway clustering to examine shared gene and pathway features associated with survival. As more RNA sequencing and proteomics data are being captured in large clinical trials, we anticipate DRPPM-PATH-SURVEIOR will enable a collaborative environment for exploring pathway-level and immune features that is predictive of treatment e cacy, especially for immunotherapy.

List Of Abbreviations
Acute    Pathway connectivity Analysis User Interface. 1) Input list of pathway that satisfy the selection criterion.
2) Set the number of clusters captured by the algorithm. 3) Select the Clustering Visualization Tab.

Figure 6
Overall and event free survival curves of KMT2A positive patients in TARGET AML 1031. A) After performing the LINC1000 screening of drug targets in AML, the "High Risk" gene sets were further prioritized by counting the number of drug target gene sets, which was further categorized based on mechanism of action (MOA) to identify enriched drug targets in high-risk patients.