Insult or injury to the body, as mediated by a variety of non-infectious conditions such as neurodegenerative diseases, stroke, and blunt force head trauma, modifies normal physiological and biochemical function.1–4 Identification of specific signature patterns reflective of the insult or injury can facilitate the development of empirical diagnostics and targeted therapeutics.5 For instance, current diagnostics for mild traumatic brain injury (TBI) rely on neuropsychological questionnaires and imaging strategies for qualitative identification.6,7 The effectiveness of these diagnostics is limited by varied presentation of the disease state, delayed onset of symptoms, comorbidities, clinical history, and differential long-term presentation; the availability of empirical diagnostics could overcome these limitations.
Derivation of empirical diagnostic signatures for a given disease state requires a systems-level understanding of the processes involved. The ‘omics revolution has enabled faster, cheaper, and higher-throughput analyses of genes, proteins, and metabolites, facilitating identification of new targets for a variety of diseases.8,9 Whereas the genome is relatively resilient to external environmental influences, the human proteome and metabolome are more susceptible to environment and injury, making them ideal signatures for diagnostics development (Fig. 1a). Multi-omic studies have led to the identification of several biomarkers associated with a variety of diseases such as TBI.10–13 However, widespread clinical diagnostic development from such studies has been limited owing to intrinsic variability in observed biomarker profiles. One of the primary limitations hampering clinical translation of multi-omic observables is that the biomarker patterns associated with a disease or injury are not cases of simple presence or absence; they must be coupled with a threshold concentration in the sample of interest (e.g., blood, cerebrospinal fluid (CSF), or urine). That threshold is challenging to determine without a reliable baseline under healthy conditions. Such a baseline should account for the variability in a given biomarker among individuals in a population (e.g., by age or sex). Further, even within an individual, a measured biomarker profile is a “snapshot” of the current biochemical state and will vary with time in response to external influences.14 The availability of systematic, reliable baseline signature profiles of healthy individuals that account for inter- and intra-individual variability is essential for assessing and characterizing disease-specific biomarker expression.15 The work presented herein aims to take a step in that direction.
The brain is the most lipid-rich organ and consumes about 20% of the body’s total energy.16,17 Thus, insults and injuries to the brain (e.g., TBI) and the associated disruption of blood supply can generate a metabolic crisis that, if unresolved, can increase brain atrophy and worsen outcomes.18 When the blood-brain barrier is disrupted, CSF leaks into the blood; thus, biomarkers that are normally exclusive to the brain but are found in blood can yield information about the biochemical status of injury and disease. However, there are no comprehensive studies simultaneously comparing the proteomic and metabolomic profiles of matched CSF and serum samples within individual patients. Such comparisons are complicated by the fact that the native comparative multi-omic signature profiles of CSF and serum in healthy individuals are not well defined. Further, CSF is a highly dynamic fluid, and sample acquisition can give varied results depending on whether CSF is obtained from the spinal fluid or directly from the ventricular system via a shunt.19
Herein, we present a comparative proteomic and metabolomic study of matched CSF/serum from 30 individuals with no previously documented adverse neurological conditions or ailments, to alleviate some of the above challenges associated with biomarker discovery for neurological insults. Figure 1b details sample collection and processing: serum samples were separated from blood collected by venipuncture, and CSF samples were collected by lumbar puncture (L1-S1 vertebrae). Aliquots of these matched CSF/serum samples were processed for proteomic and lipidomic profiling. Our population consisted of 15 females and 15 males ranging in age from 23 to 74 years (Fig. 1c).
Identifying Proteins
In biomarker discovery, high-abundance proteins such as immunoglobulins and albumin (dg/L) are commonly depleted to facilitate examination of lower-abundance proteins (ng/L).20–22 However, in our initial scoping experiments, we found that these depletion procedures contributed to high variance in the detected proteome, both in repeat measurements of a given sample and among similar samples. Therefore, we chose to forgo depletion, sacrificing the sensitivity needed to reliably detect proteins at very low concentration in exchange for reduced variability in the measured proteomic profiles. Proteins were cleaned up and digested on an S-Trap column, then analyzed by LC-MS/MS on a Thermo Scientific Fusion Lumos platform running in data-independent acquisition (DIA) mode (Fig. 1b). Chromatogram library samples were individually searched against Prosit-predicted databases and converted for ScaffoldDIA using a reference spectral library created in EncyclopeDIA v.0.9.2 (details in the Methods). Proteins were identified at a 10% false discovery rate (FDR) with a minimum of one peptide.
Under these conditions, we identified 813 proteins in serum and 932 in CSF. Of these, 801 proteins were shared between both sample types, 12 were unique to serum, and 131 were unique to CSF. The intensity of fragment ions was used to measure relative abundance between CSF and serum. The total variance in intensity across proteins of the pooled CSF and serum samples was decomposed using principal component analysis (PCA). That analysis revealed that the largest contribution to the variance of the pooled samples is the sample type (CSF vs. serum), which explains 56% of the total variance (Fig. 2a). In contrast, the second principal component explained only a small fraction of the total variance (2%). Figure 2b plots the relative difference in mean protein abundance between CSF and serum (x-axis) against the associated Benjamini-Hochberg adjusted p-value (y-axis). Each point in the figure represents one of the 801 proteins identified in both CSF and serum. In total, 317 proteins were significantly more abundant in CSF, with a 10-fold or greater difference in intensity. In comparison, 83 proteins were significantly more abundant in serum, with a 10-fold or greater difference in intensity. In this study, we explicitly demonstrate how changing the FDR changes unique protein coverage between serum and CSF (Fig. 2c, complete list in S1-S2). The number of proteins unique to each sample type decreases substantially with increasing FDR. We chose an FDR of 10% for this study as it balanced sensitivity and predictive accuracy.
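As an illustration of the comparison summarized in Fig. 2b, the following minimal Python sketch computes Benjamini-Hochberg adjusted p-values and labels a protein as enriched in CSF or in serum using a 10-fold threshold. It is not the analysis code used in this study: the function names, the significance cutoff (alpha = 0.05), and any input values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_protein(mean_csf, mean_serum, adj_p, alpha=0.05, min_fold=10.0):
    """Label one protein from its mean intensities and adjusted p-value.

    alpha and min_fold are illustrative assumptions; mean_serum must be > 0.
    """
    if adj_p >= alpha:
        return "not significant"
    ratio = mean_csf / mean_serum
    if ratio >= min_fold:
        return "higher in CSF"
    if ratio <= 1.0 / min_fold:
        return "higher in serum"
    return "below fold threshold"
```

In practice the adjusted p-values would be computed once over all shared proteins and then passed, one protein at a time, to the classification step.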
Identifying Metabolites
Understanding the basic metabolomic profiles under healthy, uninjured conditions can help establish the normal relationship of metabolic signatures between serum and CSF within a given individual. Here, we detail the first matched comparative human CSF/serum metabolome. Metabolites were extracted with methyl tert-butyl ether and methanol, separated from the proteins, and derivatized for GC-TOF MS analysis using an Agilent 6890 GC and a Pegasus III TOF MS (full details in the Methods).23 Metabolites were identified and quantified using BinBase v 4.0.24–26 A total of 613 metabolites were identified across all samples. BinBase does not analyze data as a function of FDR; therefore, we compared CSF and serum in terms of relative MS abundance. For this, we applied statistical procedures similar to those described for the identified proteins (PCA and t-tests). Figure 2d illustrates the coordinates of the samples on the first two principal components. Similar to the above results, the first principal component explained a large part of the total variance (58%). This variance also appeared to be largely due to differences between sample types (CSF vs. serum). The second principal component explained only 6% of the total variance. In total, 29 metabolites were significantly (> 10X) more abundant in CSF, while 110 metabolites were significantly (> 10X) more abundant in serum (Fig. 2e). Metabolomics databases are immature; thus, the number of metabolites that can be positively assigned represents a small fraction of the total number of detected compounds. Figure 2f illustrates the small fraction (182) of detected metabolites that could be assigned a compound identity. A complete annotated list of identified and BinBase metabolites can be found in S5.
Demographic Analysis
The impact of age and gender on variations in the proteomic and metabolomic profiles was assessed. For this, a principal component analysis (retaining the first 10 components) of the proteome and metabolome was performed, assessing differences between CSF and serum mapped onto each individual. Ward hierarchical clustering of individuals revealed two subgroups within the CSF proteome and metabolome among healthy individuals. Examination of the demographics of individuals in these subgroups shows that they differed on the basis of age (Fig. 3a/c). For the CSF proteome, the two groups averaged 39 and 52 years of age, respectively (p = 0.04), while for the metabolome the groups averaged 37 and 53 years of age (p = 0.005). While subgroup membership in the proteome and metabolome is strongly positively related, there are individuals whose subgroup assignments are discordant (Fig. 3b/d). Notable neuronal proteins that differ based on age include apolipoprotein E, neuronal pentraxin-1, and reticulon-4. Tables of the individual proteins (S3) and metabolites (S4) that differ between groups can be found in the SI.
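The clustering procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: it assumes an individuals-by-features intensity matrix, computes PCA by singular value decomposition with NumPy, and applies Ward linkage from SciPy to the first 10 component scores. The synthetic data, the helper names, and the choice to cut the tree into exactly two groups are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca_scores(X, n_components=10):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    k = min(n_components, len(S))
    return U[:, :k] * S[:k], explained[:k]


def ward_two_groups(scores):
    """Ward hierarchical clustering, cut into two subgroups (labels 1 and 2)."""
    Z = linkage(scores, method="ward")
    return fcluster(Z, t=2, criterion="maxclust")


# Synthetic stand-in data (NOT study data): 30 individuals x 50 features,
# with the second half of the individuals shifted to form a distinct subgroup.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(15, 50))
group_b = rng.normal(3.0, 1.0, size=(15, 50))
X = np.vstack([group_a, group_b])

scores, explained = pca_scores(X, n_components=10)
labels = ward_two_groups(scores)
```

The resulting labels could then be compared against age or sex, for example with a two-sample t-test on the ages in each subgroup.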
Comparisons of Neurological Proteins of Interest
Several biomarkers and physiological processes have been implicated in the pathology of neuronal insults and injury. Yet many of these signatures are also expressed in healthy cells, albeit at different concentrations than in injured ones. In this section, we examine the relative distribution of some of these signatures in healthy CSF and serum to establish a baseline for their expression, from which changes in insult and injury can be extrapolated. Specifically, we compare MS intensities of apolipoproteins (Fig. 4a) and important neuroproteins (Fig. 4b) in CSF and serum.10,27 Apolipoproteins, particularly Apo-E (P02649), are implicated in a variety of diseases, from cardiovascular and neurodegenerative disease to TBI.28,29 Apo-E is produced in the liver by hepatocytes and in the brain, and is the seventh major protein in CSF. Indeed, we found that Apo-E is present in significantly (10X) higher abundance in CSF than in serum (Fig. 4a). Serum amyloid A1, A2, and A4 (SAA) (P0DJI8, P0DJI9, P35542) are constitutively expressed apolipoproteins that change expression in response to cytokine-induced inflammation (IL-1, IL-6, IL-8, and TNFα). These proteins have been reported to vary during the course of TBI as a function of gender.30 In accordance, our findings indicate that constitutive SAA levels are higher in serum than in CSF, as expected for healthy individuals. However, contrary to other findings, we found no baseline difference in SAAs between males and females. Further comparison of Apo-A, Apo-B, and Apo-C revealed the expected trend of higher baseline concentrations in serum than in CSF, where these proteins are associated with host lipoproteins such as HDL, LDL, and VLDL. Other relevant insult and injury markers (Fig. 4b) observed in healthy CSF and serum include 1) IL-6 receptor subunit beta (P40189), an activator of JAK-MAPK and JAK-STAT3 signaling, present in greater abundance in CSF; 2) the IL-1 receptor accessory protein (Q9NPH3), part of the IL-33 signaling system responsible for the pre- and postsynaptic differentiation of neurons; 3) serum amyloid P (P02743), related to amyloidosis and aggregation in plaques; 4) amyloid-like protein 1 (P51693), part of postsynaptic function and a transcriptional regulator; 5) amyloid precursor protein (P05067), a metal-binding protein important for axonogenesis, synaptogenesis, neuronal growth, and adhesion (among other functions); and 6) γ-enolase (P09104), a highly important neuroprotective/neurotrophic enzyme with a broad range of biochemical functions, which was found exclusively in CSF.
Overrepresentation pathway-based analyses using bioinformatics tools are useful for identifying patterns of proteins associated with known biochemical functions. Here, we used STRING to analyze the proteins associated with axon guidance (Reactome R-HSA-422475, FDR = 2.14E-7) in CSF and serum; the resulting network is presented in Fig. 4c. This analysis uses co-occurrence, co-expression, direct experimental evidence, text mining, and database evidence to generate the clustering, set at the highest confidence limit (0.9). Each node represents a single protein, and the lines connecting the nodes represent the associated confidence. Proteins marked in grey, namely matrix metalloproteinases (MMP) 2 and 9 and roundabout homolog 1 (ROBO1), control multiple biochemical pathways and were common in our overrepresentation analysis. Proteins detected in CSF that were not observed in serum include moesin (MSN, P26038), major prion protein (PRNP, P04156), stromal cell-derived factor 1 (CXCL12, P48061), tubulin alpha-1B chain (TUBA1B, P68363), triple functional domain protein (TRIO, O75962), sodium channel subunit beta-3 (SCN3B, Q9NY72), plexin-B1 (PLXNB1, O43157), and netrin receptor (UNC5C, O95185). Of significance, many of these proteins are associated with actin remodeling, cell migration, and growth. PLXNB1 and UNC5C are directly responsible for the axon guidance necessary for neuronal tissue repair after a TBI.31 The single protein detected in serum but not in CSF is ezrin (EZR, P15311), a protein associated with axon guidance that forms complexes with radixin and moesin as part of the actin cytoskeleton. Variance in network associations of biochemical pathways, such as axon guidance, can provide useful information when there is disruption of the blood-brain barrier or some other dysregulation in protein production and function.
Assigned Metabolites
The relative MS intensities of the positively assigned metabolites are plotted by biochemical class in Fig. 5. In these plots, each point represents a single metabolite, and its x,y position is the intensity (0-100) in CSF vs. serum. Points near the diagonal y = x (dashed line in Fig. 5) are from metabolites that have nearly equal concentration in CSF and serum. This analysis delineates the metabolic and biochemical needs of each fluid. For example, metabolites associated with sugar synthesis and metabolism are in greater abundance in CSF (Fig. 5a), whereas those associated with amino acid synthesis and metabolism are in greater abundance in serum (Fig. 5b). Circulating serum levels of free amino acids are reflective of protein intake and muscle synthesis. On the other hand, the brain is the most metabolically demanding organ, accounting for 20% of the sugar metabolism. Interestingly, synthetic sugars (e.g., xylitol, sorbitol, and mannitol) were all found in greater abundance in CSF than in serum. Of the seven positively identified neuroregulators (Fig. 5c), we found only one that was detected in both CSF and serum, 5-methoxytryptamine, 5-aminovaleric acid, a weak GABA agonist. Serotonin and phenylethylamine were exclusively detected in serum, perhaps owing to the lumbar puncture acquisition of CSF. We also note that neuroregulators are concentrated around the brain (Fig. 5d). The two neuroregulators detected solely in CSF were N-acetyl aspartic acid, a modified amino acid found predominantly in neurons, and the primary metabolite of serotonin, 5-hydroxy-3-indoleacetic acid.32 We found a greater number of lipids and sterols only in serum, notably palmitoleic acid, linoleic acid, deoxycholic acid, cholic acid, cis-gondoic acid, arachidonic acid, and beta-glycerolphosphate (Fig. 5e). A complete table of the identified metabolites and their relative MS intensities can be found in S5.
Comparing the Proteome and Metabolome using Gene Ontology
Assigning biochemical pathways to lists of proteins and metabolites reveals active/inactive biological functions that can be used to evaluate an individual’s disease state. Here, we assign the most common classes of pathways to our “normal” population by overrepresentation analysis (Table 1). These results were derived by running Reactome on our datasets, as we found it to be the most illustrative tool for normal biochemical function. Among off-normal functions, we found platelet activation and degranulation, likely associated with sample acquisition by the lumbar puncture and venipuncture procedures. The analysis identifies many of the expected normal biochemical pathways, including extracellular matrix organization, hemostasis, immune system, metabolism of proteins, and vesicle-mediated transport (Table 1). All pathways represented here are associated with soluble proteins/metabolites found in serum and CSF because cells were removed prior to proteomic and metabolomic profiling. Notably, the number of positively identified metabolites was quite low (as is common); therefore, we obtain limited overlapping coverage with the pathways identified from our proteome. The complete tables of associated pathways from detected proteins (serum S6 and CSF S7) and metabolites (serum S8 and CSF S9) that make up this table can be found in the SI.
Table 1: Overrepresentation analysis, using Reactome, of intersecting pathways between the proteome and metabolome of CSF and serum. Columns give the general class of pathways (left), the pathways found (middle), and the number of proteins or metabolites associated with each (right). The medium (CSF or serum) with the higher number of identified molecules is highlighted in green.
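Overrepresentation analysis of the kind summarized in Table 1 is commonly framed as a one-sided hypergeometric test: given a background of annotated molecules, how likely is it to observe at least the measured number of pathway members in the detected list by chance? The sketch below illustrates that calculation in plain Python. It is an assumption-laden illustration, not necessarily the exact procedure applied by Reactome, and the function name and arguments are hypothetical.

```python
from math import comb


def overrepresentation_p(hits, sample_size, pathway_size, background_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits            -- detected molecules that belong to the pathway
    sample_size     -- total detected molecules submitted to the analysis
    pathway_size    -- molecules annotated to the pathway in the background
    background_size -- all annotated molecules in the background
    """
    upper = min(pathway_size, sample_size)
    total = comb(background_size, sample_size)
    # Sum the probability of every outcome at least as extreme as `hits`.
    favorable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return favorable / total
```

Because many pathways are tested at once, the resulting p-values would normally be corrected for multiple testing (e.g., by the Benjamini-Hochberg procedure) before a pathway is reported as overrepresented.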