A Gene Modular Approach to a Temporal Meningococcal Septic Shock Transcriptome, Enabling Clinical Phenotype Genotype Association

Meningococcal Septic Shock (MSS) is a life-threatening condition, especially in Infants. Acutely, critical care support to physiologically impact disease progression is an imperative. Exploring the clinical relationship according to temporal microarray data could aid in tracking MSS pathogenesis during this critical period. For the rst time, clinical-phenotype gene-expression association studies using Weighted Gene Co Expression Network analysis (WGCNA) was completed on 5 children admitted to the Paediatric Intensive Care Unit (PICU). Then using WGCNA, 18 gene modular clusters were identied. A gene list was generated for each module that was signicantly associated with clinical traits. Each gene list underwent gene enrichment analysis. The modules associated with Pediatric Logistic Organ Dysfunction (PELOD) score at 0, 24, and 48 hours showed a changing pattern of enriched pathways related to nuclear (p < 2e-08), cytoplasmic (p < 4e-05) and then extracellular gene regulation (p < 7e-17) for PELOD 0, 24 and 48 hours, respectively. Though WGCNA has been applied to sepsis datasets by other researchers, this study demonstrates association of important temporal clinical characteristics, i.e. PELOD score, with changing cellular processes. This study is an important step in design a gene modular approach with pathophysiological insights, potentially aiding future clinical research design and therapeutic approaches.


Introduction
In North America, prevalence of Meningococcal Septic Shock (MSS) has been declining, with only 1 case per million in 2018. Despite this low prevalence, the rapidity of disease progression and the case fatality rate (CFR) from invasive meningococcal disease, leading to MSS, remains signi cant. MSS in the absence of meningitis has a worse CFR, ranging from 16 to 52% [1]. Infants may also have a genetic predisposition, for example human toll-like receptor 4 mutations were seen to be associated with IMD in infants below 12 months of age [2]. Further, the burden of meningococcal disease remains highest in young infants, with serogroup B predominating [3]. In MSS, the focus of the treating doctor is to provide critical care support to physiologically impact disease progression, especially in the rst 48 hours.
Exploring the clinical relationship according to temporal microarray data, could aid in tracking MSS pathogenesis during this critical period.
Studying the longitudinal transcriptome in children [4] and adults [5] [6], researchers have shown the utility of gene-expression data in sepsis prognostication. Wong undertook microarray single-point analysis of a pediatric sepsis cohort [7] using Endotyping to categorize 3 subclasses of patients with sepsis (Endotype A, B and C). An endotype is a subgroup de ned by pathobiological mechanisms. Herein Wong et al showed 100 genes to be useful in classifying Endotypes A and B in children with septic shock [8]. Wong et al. concluded that the allocation of the clinical phenotype to subclass A was associated with a poor outcome. Further Wong et al noted the concept of endotype-switching, from one sub-group to another, during the sepsis illness.
Time series datasets could have utility in documenting sepsis evolution. Importantly, the intra-patient interdependency of time-sequence gene expression data could be exploited from a methodological perspective. One option is to assume that time-series gene expression data-sets form a part of a interconnected geometric clustering network. Of the various network analysis approaches available, Weighted Gene Co-expression Network Analysis (WGCNA) is one such approach, utilizing an understanding of topology. Unlike other clustering techniques, WGCNA is based on biological signi cance clustering criteria and not on geometric distance. The Langfelder et al approach uses a 'R'-based WGCNA software package [9]. A further advantage of WGCNA is the ability to phenotypically stratify modules based on clinical parameters, such as weight and age, thereby useful in studying gene-trait relationships [10]. Huang et al used such a modular approach to demonstrate a relationship between hub genes and long non-coding RNAs (lncRNAs) in a time series murine model of sepsis [11]. Similarly, Cheng et al developed a WGCNA work ow which showed lncRNAs as regulators in intensive care patients with sepsis [12]. Also using WGCNA, Chao et al identi ed key genes (MMP9 and C3AR1), both associated with sepsis prognosis and the pattern of cellular in ltration [13]. Xiaojie et al developed a 10-core gene panel for the diagnosis of sepsis derived from a WGCNA work ow [14]. Through a secondary analysis Yiping Li et al [15] related four modules to Wong et al pediatric sepsis dataset [4], identifying key hub genes. From this work, Yiping then validated the hub genes by qPCR, evidencing the transcription of the hub genes in a prospective case-control validation study. Thereby suggesting that the hub genes have the potential use as biomarkers in pediatric sepsis.
Accordingly, a topological modular approach using WGCNA, is proposed to be applied in this paper to a dataset of children with meningococcal septic shock (MSS) [

RNA extraction
The dataset from the this secondary analysis was available from ArrayExpress dataset (E-MEXP-3850).
Microarray data analysis And Weighted Gene Co-expression Network Analysis (WGCNA) The expression data set contained a total of 30 samples from 5 patients at 6 different time points. Patient 4 at the 24 hour time point had no expression values and so was removed from further analysis, reducing the total samples to 29. 33,297 probe sets from 29 Human Gene 1.0 ST Arrays were generated and then compared. Using R software, the 29 Microarray gene expression sample dataset underwent WGCNA ( Figure 1). First a gene co-expression network was constructed after calculation of the Pearson correlations between pairs of genes across all samples. Next, modules were identi ed using a hierarchical clustering dendrogram and dynamic tree cut methodology. Densely interconnected gene clusters were represented by modules, according to a soft thresholding power β. A soft-thresholding power of 6 was chosen. It is the lowest power for which the scale-free topology t index curve attens (0.68). Clustering dendrogram was generated assigning colors to the modules. This led to the identi cation of 19 modules, labeled 0-18, with the number of genes associated with each gene cluster. The label 0 was reserved for genes outside of all modules.

Module detection
Clustering was also performed based on the module color and clinical traits of time, age, gender, mortality and weight. Subsequently modules were related to phenotypic data based upon clinical variables. Each given module generated a rst principal component, termed the Module Eigengene (ME). Clinical trait data was then correlated against the ME giving a correlation coe cient. Genes from the signi cant modules showing high module membership (MM) were ltered and selected (p.MM ≤ 0.05).

WGCNA Construction and Detection of Disease Associated Modules
A quantitative measure of MM was de ned for each module, as the correlation of the ME with the gene expression pro le. Modules were related to phenotypic characteristics, such as weight, age, mortality, organ dysfunction (based on the PELOD score). An adjacency matrix was assembled, with rows corresponding to ME's and columns to clinical traits ( Figure 2). Genes from the signi cant modules showing high module membership were ltered and selected (probability of module membership was ≤ 0.05). Eigengenes were formulated for each module (Module Eigenes) and then correlated to phenotypic characteristics (external trait) data. Each association was color coded by the correlation value.
Gene Enrichment WGCNA analysis generated gene lists showing signi cant module membership. These gene lists then underwent pathway enrichment studies. The Fisher exact test was then applied to the gene list. Using inhouse R script, pathways were generated using Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotation with the associated Gene Ontology (GO) terms. The subsequent enriched gene list was then imported into Cytoscape [17] and annotated using the enrichment map tool within the Cytoscape platform ( Figure 3). The Kyoto Encyclopedia of Genes and Genomes (KEGG) database provided an interpretation of the enriched gene pathways. This enriched data was then passed into the enrichment map software in Cytoscape using a p value (0.001) and a FDR (0.01) threshold, to illustrate the enriched pathways. Further, the enriched gene list, using R script, ltered using a p value (0.001) and an FDR (0.01) threshold, generated enrichment dot plots ( Figure 2). Dot plots for PELOD 0, 24 and 48 time categories were generated using the top 25 signi cant (p <0.01) pathways, for ease of illustration ( Figure   4). Dot plots for PELOD 0, 24 and 48 time categories were generated using the top 25 signi cant (p value <0.01) pathways, for ease of illustration ( Figure 4). The pathways gendered according to the dot plots pertained to GO terms and terms from the Reactome database. The WGCNA generated gene lists were also enriched by parsing through a gene pro ling platform, g:Pro ler. The signi cantly upregulated genes (p <0.05) according to the adjacency matrix trait underwent functional enrichment analysis using g:Pro ler. To reduce the chance of false positives a p <0.05 for statistical signi cance and the Benjamini-Hochberg FDR (False Discovery Rate) were used. As detailed ( Figure 5), g:Pro ler uses a number of client libraries to interpret gene lists from a functional enrichment point of view.

Patient Demographics
All Infants demonstrated clinical phenotype consistent with severe shock and diffuse intravascular coagulation consistent with Meningococcal sepsis (Table 1)

Functional enrichment analysis:
A graphical representation functional enrichment analysis using g:Pro le software was undertaken. Data was parsed through the g:Pro le platform from the WGCNA generated gene modules for selected clinical traits. Generated Manhattan plots according to PELOD 0 hours, 24 hours and 48 hours are shown ( Figure  5).  (Figure 4). In addition, the OR at PELOD 24 hours showed. Pathways present at PELOD 0 but not at the other time points includes GO pathways related to the mitotic cycle and the golgi sub-compartment. Pathways present at 24 hours and not at PELOD 0 or 48 hours included GO pathways related to the cytoplasmic vesicle membrane, endoscope and import function into the cell. With regards to cytokine signaling no pathways were seen at PELOD 0 hours, but GO pathways were present at 24 and 48 hours.   [20] showing infants and children being most similar, whereas neonates and adults were most dissimilar. Raymond et al showed that neonates had a reduction in gene pathways related to signaling and in ammatory recognition. Adults on the other hand demonstrated decreased in ammation, pathogen sensing and myeloid function compared to infants and children. In our study, we attempted mitigation of polymicrobial and age-based factors.
Firstly a single organism approach, as suggested by Wong et al [21], was undertaken. Thereby studying Neisseria Meningitis associated MSS, aiming to simply the investigation of complex host-pathogen interactions. Secondly, with respect to age, patients recruited included infants with no previous comorbidities, from a similar age-range.
The use of WGCNA as an improvement over standard statistical methods for differential gene expression has been previously studied. Here Langfelder investigated the use WCGNA for hub-gene selection nding WGCNA as an improvement over standard statistical approaches incorporating the p value [22]. However Langfelder also found with regards to analytical repeatability using independent data sets, that standard statistical methods were an enhancement over WGCNA. Further, WGCNA methodology is advantaged by type 1 and type 2 statistical error minimization. Moreover, WGCNA applied to sepsis may show potential beyond traditional clinical biomarkers. For example, LONG et al combined WGCNA with a machine learning algorithm and applied this work ow to three publicly available sepsis datasets [23]. Then by applying arti cial intelligence concepts to WGCNA, Long et al presented a diagnostic classi er with the potential for early diagnostic bene t.
A drawback of our gene-expression study relates to the small number of patients recruited. However WGCNA sample-sets ideally should contain at least 15-20 samples [24]. Thus from a mathematical perspective, there were su cient data-points. A further drawback of this study relates to the labeling of patient's according to the PICU admission. This allotment of time points is independent of the timing of onset of infection. The arbitrary labelling across a sepsis time trajectory, could affect the analysis of timerelated changes in gene function. Further using non-continuous physiological data, such as PELOD scoring may, on the one hand, simplify the dataset. On the other hand, such an approach could impede genome to clinical trait matching.
The ability to match clinical parameters to gene modules in infants, using a gene modular approach is an advancement in application of sepsis transcriptomics. However the timing from infection to the development of symptoms differ across the recruited patients, which could affect analysis. As the methodology adopted encompassed all temporal datasets, this could help mitigate any disadvantages with respect to time constraints. Further, there is the challenge of inter-individual variation with respect to the timing of infection. This being related to such facts as symptom onset, rapidity of pathogenesis, ability to seek medical assistance etc. Countering this potential disparity in transcriptomic datasets was the fact that all patients received standard treatment, aiming for physiological stabilisation. Thereby it could be argued that the transcriptome studied re ects a uniform therapeutic impact upon the transcriptome. Finally, the temporal nature of this study takes advantage of an inherent bene t of time-related gene expression datasets. Namely that, for each patient, one dataset is related to the next, due to the evolving nature of sepsis. This connectivity may be advantageous to network methodology.
As the sepsis transcriptome can be affected by age, gender and ethnicity, future work requires larger datasets. Greater patient numbers would then allow subgroup analysis according to various host-related factors. Further, in this study, the relationship between clinical parameters and gene function, during acute sepsis, has been sort. Close matching of clinical physiological data to genomic modules was attempted. However further work in physio-genomics is required from the temporal perspective. Accordingly, by recording continuous clinical data-points, from the time of PICU admission, could allow an enhancement in the matching of bed-side physiological changes to gene function. In this way, a larger physiological dataset could provide improved prognostication using WGCNA a gene-modular approach.
In summary this study using time-related trajectory transcriptomic data, gene co-expression network analysis has been applied to understand sepsis evolution. In particular, the study showed the capability of using WGCNA in matching gene-expression to clinical trait information. The bene t of using network methods for the isolation of biologically signi cant gene modules, diagnosis, therapy and prognostication in sepsis requires further clinical study.

Conclusions
Using time-related trajectory transcriptomic data, gene co-expression network analysis has been applied to understand sepsis evolution. In particular, the study showed the application of WGCNA in matching gene-expression to clinical information from the patient's bedside. On this basis, the "Time series transcriptome" has shown potential in following the MSS trajectory. This opens the possibility for clinicians to visualise the affects of treatment on disease pathogenesis during a critical phase of MSS. The bene t of using network methods for the isolation of biologically signi cant gene modules, diagnosis, therapy and prognostication in sepsis requires further clinical study.

Declarations
Ethics approval and consent to participate The study and analysis was undertaken in accordance with the declaration of Helsinki. Accordingly, ethical principles were adhered to, through the guidance provided by the declaration of Helsinki for physicians and other participants in medical research involving human subjects.

Consent for publication
All authors give consent for publication similar genes are grouped together in a tree structure with 'branches' denoted as gene modules. A module then consists of a collection of highly interconnected genes with high absolute correlation. F. A moduletrait matrix is then generated associating traits (horizontal axis) to Module Eigenes (Vertical Axis).

Figure 2
Module-trait associations are represented as a Heatmap plot of adjacencies in the eigengene network.
Each row corresponds to a module eigengene, column to a trait. Each cell contains the corresponding correlation and p-value (in parenthesis). The table is color-coded by correlation according to the color legend. Each row and column in the heatmap corresponds to one module eigengene (labeled by color) or weight. In the heatmap, white color represents low adjacency (low correlation), red represents high adjacency (positive correlation), green represents high adjacency (negative correlation).

Figure 3
Enrichment mapping through the cytoscape application generated signi cant pathways at time 0, pathways related to cell nuclear function; at 24 hours cytoplasmic gene function and at 48 hours, extracellular gene function. Gene-set enrichment results are graphically mapped to the Enrichment Map: node size represents the number of genes in the gene-set; edge thickness is proportional to the overlap between gene-sets. The enrichment score (speci cally, the enrichment p-value) is mapped to the node color as a color gradient. The node size is proportional to the odds ratio.

Figure 4
Box plot enrichment box plot of signi cant genes from the WGCNA clusters for PELOD 0hrs A. PELOD 24hrs. HALLMARK_INTERFERON_GAMMA_RESPONSE_MSigdb_C2 seen to be an outlying pathway. B. PELOD 48 hours. C. Enrichment results ltered using a p value (0.001) and an FDR (0.01) threshold. For the 48 hour PELOD, OR beyond 5.0 included the TRANSCRIPTIONAL REGULATION by RUNX3_REACTOME and Regulation of APOPTOSIS_REACTOME. The plots display the top 25 pathways.