Impact of Hemolysis on Multi-OMIC Pancreatic Biomarker Discovery: Derisking Precision Medicine Biomarker Development


Cancer biomarker discovery is critically dependent on the integrity of biofluid and tissue samples acquired from study participants. Multi-omic profiling of candidate protein, lipid, and metabolite biomarkers is confounded by the timing and fasting status of sample collection, participant demographics, and treatment exposures of the study population. Contamination by hemoglobin, whether caused by hemolysis during sample preparation or underlying red cell fragility, contributes 0-10 g/L of extraneous protein to plasma, serum, and buffy coat samples and may interfere with biomarker detection and validation. We analyzed 617 plasma, 701 serum, and 657 buffy coat samples from a 7-year longitudinal multi-omic biomarker discovery program evaluating 400+ participants with or at risk for pancreatic cancer, known as Project Survival™. Hemolysis was undetectable in 93.1% of plasma and 95.0% of serum samples, whereas only 37.1% of buffy coat samples were free of contamination by hemoglobin. Regression analysis of multi-omic data demonstrated a statistically significant correlation between hemoglobin concentration and the resulting pattern of analyte detection and concentration. Although hemolysis had the greatest impact on identification and quantitation of the proteome, distinct differentials in metabolomics and lipidomics were also observed and correlated with severity. We conclude that quality control is vital to accurate detection of informative molecular differentials using OMIC technologies and that caution must be exercised to minimize the impact of hemolysis as a factor driving false discovery in large cancer biomarker studies.


Introduction
The current healthcare ecosystem is rapidly evolving toward deploying precision medicine strategies that optimize stratification of patients to improve clinical outcomes. These actions will predominantly focus on the use of molecular, digital, and clinical biomarkers that will characterize patients on multiple dimensions of phenotypic presentation. Standardization of quality parameters governing sample collection is important to ensure accuracy and reproducibility of potential discoveries, ultimately easing translation back into the clinic. Molecular markers, whether genetic, proteomic, lipidomic or metabolomic, hold tremendous promise to deconvolute the biological presentation of patients. The composition of adaptive biological molecules (proteins, lipids, and metabolites) can be significantly influenced by patient demographics, pharmacological agents, and sample handling processes, which can hinder potential biomarker discovery and development.
Hemolysis is a common outcome of sample processing; it can result from handling, but also from disease etiology that renders red blood cells (RBC) more prone to lysis. Hemolysis can occur for a variety of reasons and leads to the release of free hemoglobin into blood collection samples [1]. Some medical conditions, and certain medications, can increase this breakdown of RBC. Hemolysis has the potential to drastically alter the observed proteome of buffy coat samples through contamination by hemoglobin and other high-abundance RBC proteins. RBC are mainly composed of hemoglobin and carbonic anhydrase-1, which contribute 97% and 1% of the entire RBC proteome, respectively [2]. The buffy coat fraction of whole blood has been observed to be less than 1% of the blood by volume [3]. As a result, even minor contamination of the other fractions by RBC, or increased hemolysis due to medical reasons, can increase the concentration of hemoglobin and carbonic anhydrase-1 and potentially impact the observed proteome.
Guidelines governing omics analysis of clinical samples have been developed over the past decade as the use of such platforms has been broadly adopted in R&D and clinical trial assessments [4,5]. This includes standardized sample preparation approaches and techniques, quality controls, and the recommended size of cohorts required to ensure statistical significance of potential findings. However, protocols for quality controls regarding sample collection are deficient. Several key challenges have already been demonstrated in using biofluids for biomarker discovery, such as chemical modifications of proteins or sample degradation during storage. Further, plasma and serum, which are often employed for convenience of collection, exhibit a wide dynamic range of protein concentrations, making the identification of low-abundance potential biomarkers all the more challenging. One potentially impactful occurrence that should be included is the effect of hemolysis, which can directly contribute to both aforementioned challenges.
Herein, we performed mass spectrometry-based lipidomics, metabolomics and proteomics analysis of plasma and serum from over 420 individuals in a pancreatic biomarker clinical trial. Buffy coat samples were subjected only to proteomics analysis because of the sample amount obtained. A subset of the samples obtained were impacted by hemolysis, resulting in contamination of the matrix of interest. A comprehensive assessment of expression patterns of proteins, lipids and metabolites was performed to identify hemolytic contamination in these samples. The proteome of buffy coat was most impacted, resulting in expression changes of proteins originating from red blood cells. Markers impacted by hemolysis should be considered with caution when explored as biomarkers.

Study Design
There were 420 patients enrolled in this study: 224 males and 196 females. These fell into one of five categories as follows: healthy volunteers, 33; patients with pancreatitis, 113; early pancreatic cancer, 67; local pancreatic cancer, 115; and metastatic pancreatic cancer, 92. All volunteers participating in this clinical study (NCT02781012) gave their informed consent for inclusion before they participated in the study. Research use of the samples was conducted in accordance with the terms outlined within the informed consent form and with the tenets of the Declaration of Helsinki and its later amendments or comparable ethical standards.

Sample Collection
Whole blood samples were collected via venipuncture into EDTA tubes. All samples were processed and frozen at -80 °C within 3 hours of the blood draw. The plasma fraction was separated by centrifugation at 1200 x g for 10 minutes at room temperature, aliquoted into separate tubes, and frozen. During centrifugation, the buffy coat layer also separated from the red blood cells. The buffy coat layer was collected and diluted with 8 mL RPMI buffer, transferred into a 50 mL Leucosep tube, and centrifuged at 1200 x g for 10 minutes at room temperature to separate the buffy coat layer further from the red blood cells. Buffy coat was washed three times with PBS and pelleted to remove solution. Finally, the buffy coat was resuspended in 200 µL of PBS and split between two tubes before being frozen at -80 °C. A separate vial of blood was collected in serum separator tubes for serum and was left at room temperature for 30-45 minutes to allow the clot to form. Serum separator tubes were then centrifuged at 1200 x g for 10 minutes at room temperature. Separated serum was aliquoted and frozen at -80 °C.

Detection Of Hemolysis
Upon receipt, all samples were accessioned and qualitatively assigned a colorimetric hemolysis score of 1-3 for plasma and serum and 0-4 for buffy coat, following the color scale in Fig. 1 [6]. A score of zero was reserved for buffy coat samples appearing clear to opaque white, when buffy coat cells were most pure. Given the natural yellowish appearance of plasma and serum, a score of zero was never given for these matrices, and a score of 1 was considered most pure.
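The scoring convention above can be expressed as a small lookup, which may help when grouping samples for later comparisons. This is only a minimal sketch: the names `SCORE_RANGES`, `BASELINE_SCORE`, and `is_visibly_hemolyzed` are hypothetical and are not part of the study's pipeline; the ranges and baselines are taken from the text.

```python
# Hypothetical helper: encodes the score ranges and "most pure" baselines
# described in the text (plasma/serum 1-3, buffy coat 0-4).
SCORE_RANGES = {"plasma": (1, 3), "serum": (1, 3), "buffy_coat": (0, 4)}
BASELINE_SCORE = {"plasma": 1, "serum": 1, "buffy_coat": 0}


def is_visibly_hemolyzed(sample_type: str, score: int) -> bool:
    """Return True when the score is above the 'most pure' baseline."""
    low, high = SCORE_RANGES[sample_type]
    if not low <= score <= high:
        raise ValueError(
            f"score {score} outside {low}-{high} for {sample_type}"
        )
    return score > BASELINE_SCORE[sample_type]
```

For example, a plasma sample scored 1 is treated as not visibly hemolyzed, whereas a buffy coat sample scored 1 is.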

Proteomics
Protein Extraction
65 µL of raw plasma/serum was filtered through a pre-wet 0.22 µm cellulose acetate spin filter. 40 µL of the filtered plasma/serum was pipetted onto another pre-wet 0.22 µm cellulose acetate spin filter and combined with 20 µL of 80 mg/mL lipid removal agent (LRA). The mixture was placed on a shaker for 30 minutes and then centrifuged. The resulting filtrate was roughly 40 µL in volume and was combined with 120 µL of Agilent Buffer A. The sample was then loaded into vials and placed on the Agilent 1260 series HPLC, and the top 14 abundant proteins were depleted using the Multi-Affinity Removal Column 14 from Agilent. The depleted samples were collected into vials and protein concentration was determined using the Bradford Assay.
Buffy coat samples were lysed with a lysis buffer containing 5 M urea, 50 mM Tris-HCl pH 8.3, 0.1% SDS, 1% Protease and Phosphatase Inhibitor Cocktail, and Optima LC/MS Water. 100 µL of lysis buffer was added to each sample and mixed by pipetting up and down, and then the whole sample was immediately transferred out of the sample vial and into a 1.5 mL Eppendorf tube. Each sample was sonicated with four 3-second pulses at 20% amplification to fully lyse the cells. Sonicated samples were centrifuged at 17,000 x g for 10 minutes, and the supernatant was then used in the Bradford Assay to determine the protein concentration.

Trypsin Digestion
Extracted proteins were trypsin digested as previously described [7]. In brief, proteins were reduced with 10mM Tris(2-carboxyethyl) Phosphine (TCEP) and alkylated with 18.75mM iodoacetamide before being precipitated in acetone overnight and digested with trypsin the next day.
TMT Labeling of Peptides
Extracts were divided into three parts: 75 µL for gas chromatography combined with time-of-flight high-resolution mass spectrometry, 150 µL for reversed-phase liquid chromatography coupled with high-resolution mass spectrometry, and 150 µL for hydrophilic interaction chromatography with liquid chromatography and tandem mass spectrometry, and analyzed as previously described [8][9][10][11][12]. The NEXERA GC system was fitted with a Gerstel temperature-programmed injector, cooled injection system (model CIS 4). An automated liner exchange (ALEX) (Gerstel, Mülheim an der Ruhr, Germany) was used to eliminate cross-contamination from the sample matrix between sample runs. Quality control was performed using a metabolite standards mixture and pooled samples, applying the methodology previously described [13][14][15][16]. A quality control sample containing a standard mixture of amino and organic acids, purchased from Sigma-Aldrich as certified reference material, was injected daily to perform an analytical system suitability test and to monitor the day-to-day reproducibility of recorded signals as previously described [8][9][10][11][12]. A pooled quality control sample was obtained by taking an aliquot of the same volume from all samples in the study and was injected daily with each batch of analyzed samples to determine the optimal dilution of the batch samples and to validate metabolite identification and peak integration. Collected raw data was manually inspected, merged, imputed and normalized by the sample median. Metabolite identification was performed using in-house analysis of authentic standards. Metabolite annotation was performed utilizing recorded retention time and retention indexes,

Mediator Lipidomic Analysis
A mixture of deuterium-labeled internal standards was added to 100 µL aliquots of serum or plasma, followed by three sample volumes of cold methanol (MeOH). Samples were vortexed for 5 minutes and stored at -20 °C overnight. Cold samples were centrifuged at 14,000 x g at 4 °C for 10 minutes, the supernatant was transferred to a new tube, and 3 mL of acidified H2O (pH 3.5) was added to each sample prior to extraction on C18 SPE columns (Thermo Pierce), performed as described [19]. The methyl formate fractions were collected, dried under nitrogen, and reconstituted in 50 µL MeOH:H2O (1:1, v/v). Samples were transferred to 0.5 mL tubes and centrifuged at 20,000 x g at 4 °C for 10 minutes. Thirty-five µL of supernatant was transferred to LC-MS vials for analysis using the BERG LC-MS/MS mediator lipidomics platform as described.

Data Analysis
Proteins that had missing values in more than 85% of samples were considered unreliable and were therefore removed from further analysis. Data was normalized using a median centering and variance scaling approach applied across samples [20,21]. Batch effects due to study cohort were corrected using an empirical Bayesian framework, ComBat [22,23]. Briefly, this method performed location and scale adjustments based on estimated batch effect parameters per protein and returned a corrected dataset for further analysis. The data was then used to identify differential expression between hemolysis scores in plasma, serum and buffy coat. Missingness was calculated as the proportion of missing proteins in each sample.
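The filtering and normalization steps can be illustrated with a short sketch. It assumes a hypothetical in-memory layout (a dict mapping each protein to a list of per-sample values, with `None` for missing) and interprets "median centering and variance scaling" as subtracting each sample's median and dividing by that sample's standard deviation; the exact procedure used in the study follows the cited references and may differ. ComBat batch correction is not reproduced here.

```python
import statistics


def filter_by_missingness(data, max_missing=0.85):
    """Drop proteins missing in more than `max_missing` of samples."""
    kept = {}
    for protein, values in data.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:
            kept[protein] = values
    return kept


def median_center_and_scale(data):
    """Per sample: subtract the median and divide by the standard deviation.

    Assumed interpretation of the normalization; missing values stay None.
    """
    if not data:
        return {}
    proteins = list(data)
    n_samples = len(data[proteins[0]])
    out = {p: [None] * n_samples for p in proteins}
    for j in range(n_samples):
        observed = [data[p][j] for p in proteins if data[p][j] is not None]
        if len(observed) < 2:
            continue  # not enough values to estimate a spread
        center = statistics.median(observed)
        spread = statistics.stdev(observed)
        if spread == 0:
            continue  # constant sample; leave as missing rather than divide by 0
        for p in proteins:
            if data[p][j] is not None:
                out[p][j] = (data[p][j] - center) / spread
    return out
```

Keeping the threshold as a parameter makes it easy to check how sensitive downstream results are to the 85% cutoff.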

Workflow, Design and Summary
To evaluate the impact of hemolysis on biomarker discovery utilizing a multi-omics platform, we compared proteins, lipids, and metabolites identified across plasma, serum, and buffy coat samples (proteomics only) acquired from 420 non-diseased and pancreatic cancer patients. The workflow of the proteomic, lipidomic and metabolomic analyses is shown in Fig. 1. A hemolysis score was recorded for each sample, ranging from 0-4 for buffy coat and 1-3 for plasma and serum. A summary of the distribution of hemolysis scores within each sample type can be found in Fig. 2. Buffy coat had the largest proportion of hemolyzed samples: 37.1% scored 0, 25.1% scored 1, 24.8% scored 2, 12.4% scored 3, and 0.4% scored 4. The protocol for isolating buffy coat from blood may be one of the major reasons for the large number of contaminated buffy coat samples.
In proteomics, 7302, 1971, and 2146 proteins were identified and quantified in buffy coat, serum and plasma, respectively, using TMT labeling and 2D online LC-MS/MS. After filtering the data to retain proteins with less than 85% missing values, a total of 3648, 453, and 492 proteins in buffy coat, serum and plasma, respectively, were obtained and used for further analysis. In lipidomics, 1318 structural lipids and 106 mediator lipids were identified and quantified after data filtration for analysis in plasma and serum samples. In metabolomics, a total of 514 and 508 metabolites were identified and quantified in plasma and serum samples, respectively, after data filtering and kept for further analysis.

Differentially Expressed Metabolites And Lipids
Lipidomics analysis revealed no significant changes in lipid expression for mediator lipidomics data when comparing samples with hemolysis scores of 2+ to 1 in both plasma and serum. However, for structural lipidomics analysis, 5 lipids were found to be downregulated and 2 lipids upregulated in plasma, and 14 lipids were downregulated and 11 lipids upregulated in serum (Table 1). More profound effects were seen in the metabolomics data. When comparing samples with hemolysis scores of 2+ to 1, a total of 51 metabolites were found to be downregulated and a total of 25 upregulated due to hemolysis in plasma (Table 1). For the same comparison in serum, 93 metabolites were downregulated and 21 were upregulated due to hemolysis (Table 1).
A summary of these results can be found in supplemental table 1.
* Differentially expressed species due to hemolysis

Missingness
A subset of samples with the lowest hemolysis score was created, in this case a score of 0 for buffy coat samples and a score of 1 for plasma and serum samples. This subset was used to filter the proteins, and only proteins with less than 85% missing values in this subset were kept in the full proteomics data. The proportion of missing proteins was computed for each sample, and buffy coat samples were then grouped by hemolysis score: score 0, 244 samples; score 1, 165 samples; score 2, 163 samples; score 3+, 85 samples (Fig. 2). The boxplots clearly indicate that as the hemolysis score of a sample increases, the number of proteins identified within the sample decreases; the median proportions of missing proteins are 0.299, 0.353, 0.406, and 0.410 for the groups with hemolysis scores of 0, 1, 2, and 3+, respectively (Fig. 3). This can be explained by an increase in the signal derived from the more abundant hemoglobin proteins contributed by the lysed red blood cells, which suppresses the signal of less abundant proteins and changes the dynamic range of the protein content that would ideally be identified in samples with little to no hemolytic contamination.
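The per-sample missingness summary can be sketched as follows. The data layout (a dict of protein to per-sample values, `None` for missing) and the function names are hypothetical; scores of 3 and 4 are pooled into a single "3+" group, as in the grouping described above.

```python
import statistics
from collections import defaultdict


def sample_missingness(data):
    """Proportion of proteins with a missing value, for each sample."""
    proteins = list(data)
    n_samples = len(data[proteins[0]])
    return [
        sum(data[p][j] is None for p in proteins) / len(proteins)
        for j in range(n_samples)
    ]


def median_missingness_by_score(missingness, scores, top_score=3):
    """Median missingness per hemolysis score; scores >= top_score are pooled."""
    groups = defaultdict(list)
    for proportion, score in zip(missingness, scores):
        groups[min(score, top_score)].append(proportion)
    return {score: statistics.median(vals) for score, vals in sorted(groups.items())}
```

With real data, the returned dictionary would hold one median per score group, which is the quantity reported for the buffy coat samples.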

Differentially Expressed Proteins
To assess the effect of hemolysis on relative protein expression in buffy coat, comparisons between hemolysis groups were performed as shown by volcano plots (Fig. 4A). Overall, 657 samples were included in this analysis. A total of 3,647 proteins were identified when assessing the differentially expressed proteins between samples with a score of 0 vs. 1 (Fig. 4A; Table 1). Lastly, we compared samples with a score of 0 vs. 3+ (Fig. 4C), and a total of 592 proteins were consistently identified across all samples, with 238 proteins downregulated and 187 proteins upregulated at a 1.3 fold-change threshold and a p-value of 0.05. Hemolysis not only impacted the proteins identified but also impacted the quantitation of the differentially expressed proteins.
Further, samples with no visual hemolysis (scores of 0 for buffy coat, scores of 1 for plasma and serum) were compared to samples with visual hemolysis (scores of 1-3+ for buffy coat, scores of 2+ for plasma and serum). Differential expression of proteins was assessed using the volcano plots shown in Fig. 4, with a threshold of 1.3 fold change and a corresponding p-value of 0.05 required for a protein to be considered differentially expressed. Overall, 250 proteins were found to be downregulated and 208 upregulated in buffy coat. In plasma and serum, 2 proteins were found to be downregulated in samples scored 2+ compared to samples scored 1. A total of 22 proteins in plasma and 13 proteins in serum were found to be upregulated in the same comparison. A summary of these results can be found in supplemental table 1.
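The thresholding behind the volcano plots can be sketched as below. This is an illustrative example rather than the study's code: it assumes fold changes are already expressed as log2 ratios and p-values are precomputed, and the function names are hypothetical. The defaults mirror the 1.3 fold-change and 0.05 p-value thresholds stated in the text.

```python
import math
from collections import Counter


def classify_protein(log2_fc, p_value, fc_threshold=1.3, alpha=0.05):
    """Label a protein 'up', 'down', or 'ns' (not significant)."""
    cutoff = math.log2(fc_threshold)  # 1.3-fold is about 0.379 on the log2 scale
    if p_value >= alpha or abs(log2_fc) < cutoff:
        return "ns"
    return "up" if log2_fc > 0 else "down"


def summarize(results):
    """Count labels for an iterable of (log2_fc, p_value) pairs."""
    return Counter(classify_protein(fc, p) for fc, p in results)
```

Working on the log2 scale keeps the up and down cutoffs symmetric, so a 1.3-fold increase and a 1.3-fold decrease are treated alike.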

Impact Of Hemolysis On Hemoglobin
To study hemolysis via protein identification and relative quantitation, we assessed the expression of Hemoglobin Subunit Alpha (HBA1), Hemoglobin Subunit Beta (HBB), and Hemoglobin Subunit Delta (HBD) across all sample types, grouped by hemolysis score within each sample type. Hemolysis is generally classified as the lysis of RBC in circulation or during sample preparation, and as hemoglobin is one of the most abundant proteins in red blood cells, hemoglobin expression increased due to hemolysis (Fig. 5A), rising stepwise with increasing hemolysis score. A similar pattern was seen in both plasma and serum, with lower levels observed in samples with a hemolysis score of 1 and significantly higher levels observed in samples scored 2+ (Fig. 5B and Fig. 5C).
We also assessed the expression of carbonic anhydrase (CA1), histone H2B type 1-L (HIST1H2BL), and ubinuclein-2 (UBN2) (Fig. 6). CA1 is another major protein found in RBC and is responsible for processing carbon dioxide in the body. The expression of CA1 is low in samples classified with a hemolysis score of 0 and, like hemoglobin expression, increases with increasing hemolysis score (Fig. 6). HIST1H2BL and UBN2 are both nuclear proteins whose identification is expected in buffy coat samples and not in red blood cells. HIST1H2BL and UBN2 expression follows the expected pattern, with higher expression in samples with a hemolysis score of 0 and lower expression with increasing hemolysis score (Fig. 6), indicating signal suppression of these proteins as a result of hemolysis.

Discussion
Translation of biomarkers into clinical practice requires the comprehensive understanding of the impact of sample handling to avoid false discovery of processing markers rather than disease associated biomarkers.
Adaptive omic technologies such as proteomics, lipidomics, and metabolomics demonstrate tremendous promise in associating the patient phenotype with causal biology but are also significantly impacted by red blood cell contamination in plasma, serum, or buffy coat. In the current study, we found that buffy coat was the matrix most significantly affected by hemolysis in a prospective biomarker study investigating pancreatic cancer and at-risk populations. The incidence of hemolysis was independent of disease condition (data not shown) but did influence detection and quantification of analytes.
Contamination by RBC proteins released through hemolysis has also been demonstrated in red blood cell storage, in an occurrence known as storage lesions. Storage lesions are progressive changes in the morphology, biochemistry, and function of RBC during storage that result in changes in the viability of the RBC and accumulation of contaminating proteins and cells. These changes in RBC ultimately lead to hemolysis and, consequently, a release of the cytosolic contents into solution [1,24]. A study observing changes in the protein distribution of RBC supernatant over a storage period identified appreciable increases in proteins, including carbonic anhydrase 1 and 2 (CA1 and CA2), peroxiredoxin-1 and -2 (PRDX1 and PRDX2), and catalase, as well as others, due to hemolysis of RBC over time in these storage lesions [25]. Similarly, our findings indicate that these proteins are contaminants in plasma, serum, and buffy coat due to hemolysis that may occur in vivo or during sample processing.
The identification of proteins in a sample depends on the dynamic range of the proteins. Identifying less abundant proteins in a sample via LC-MS/MS analysis is challenging at low concentrations, as current mass spectrometry capabilities allow for identification over a range of 3-4 orders of magnitude [26]. Hemolysis increases the hemoglobin content in the sample of interest. Given that hemoglobin accounts for 97% of the composition of RBC, with carbonic anhydrase accounting for another 1%, this can create significant suppression of the signal of low-abundance proteins in the biofluid of choice for a proteomic study [27]. In proteomics, sample quantitation is performed using an equal volume of fluid or an equal concentration of protein content. In this study, equal protein concentration was used for semi-quantitation, supplemented by Tandem Mass Tags for protein quantitation. The general hypothesis is that the samples are identical with minor changes. Quantitation of proteins is impacted by hemolysis, which leads to an increase in the concentration of red blood cell proteins. As contamination increases, the proportion of proteins of interest in the sample decreases, which can lead to inaccurate quantitation and false discovery of biomarkers. Hemolyzed samples should be avoided in omics studies to minimize data analysis variability and data interpretation errors. The use of differentially expressed species (supplemental table 1) as biomarkers of disease in any study should be viewed with caution due to hemolysis. For instance, carbonic anhydrase-1 has been demonstrated as a biomarker in serum for prostate cancer [28]. Further, peroxiredoxin-2 was identified as a biomarker in a panel of proteins from plasma for Anderson-Fabry disease [29]. While this may in fact be the case, sample quality should be carefully considered during testing to avoid false positives, and analysis should be performed to confirm that sample handling issues or hemolysis contributed little to nothing to the signal of these proteins.
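The dilution effect under equal-protein loading can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 60 g/L native protein concentration is an assumed example value, not a measurement from this study; the 10 g/L contamination corresponds to the upper end of the range reported in the abstract.

```python
def native_protein_fraction(native_g_per_l, contaminant_g_per_l):
    """Fraction of total protein that comes from the matrix of interest.

    When a fixed protein mass is loaded, this is also the fraction of the
    loaded material that is not contaminating RBC protein.
    """
    total = native_g_per_l + contaminant_g_per_l
    if total <= 0:
        raise ValueError("total protein concentration must be positive")
    return native_g_per_l / total


# Assumed example values: 60 g/L native protein, 0-10 g/L contaminating hemoglobin.
for hemoglobin in (0.0, 2.0, 5.0, 10.0):
    fraction = native_protein_fraction(60.0, hemoglobin)
    print(f"{hemoglobin:4.1f} g/L hemoglobin -> {fraction:.3f} native fraction")
```

The effect is expected to be larger when the native protein pool is small, as in depleted plasma or in buffy coat, because the same amount of contaminant then makes up a larger share of the loaded protein.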
In clinical settings, omics analysis of serum, plasma or buffy coat samples requires careful sample handling to avoid hemolysis. A set protocol must be followed when collecting and handling samples, and any deviation in sample handling needs to be recorded. In some cases, even after all sample handling precautions have been taken, hemolysis may still occur due to underlying biological factors. In these scenarios, various methods can be used during data analysis to minimize the impact of contaminating proteins such as hemoglobin. One approach is to exclude any red blood cell proteins considered to be contamination from consideration as biomarkers. A second approach is to use proteins such as hemoglobin or carbonic anhydrase to normalize the data, specifically normalizing only the non-red blood cell proteins. This can minimize the impact of hemolysis on quantitation. However, any attempt to minimize this effect may not completely negate the impact of hemolysis. A third approach is to move towards equal-volume quantitation rather than equal-concentration quantitation; however, this might require technical advancements in instrumentation and technology. Identification of contaminating proteins cannot be avoided, and expression of those proteins rises with increasing hemolysis score. Sophisticated LC-MS/MS technology, biochemical procedures for sample preparation and advanced bioinformatics tools need to be used for omics analysis in precision medicine. Stringent purification procedures are of key importance when using blood samples for identification and application of biomarkers.
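The first approach, setting aside red blood cell proteins before candidate biomarkers are ranked, can be sketched as a simple filter. The contaminant list below contains only proteins named in this paper (hemoglobin subunits, carbonic anhydrases, peroxiredoxins, and catalase, written with its gene symbol CAT); it is an illustrative assumption rather than a validated exclusion panel, and the function name is hypothetical.

```python
# Illustrative list of RBC-derived proteins discussed in the text;
# not a validated or exhaustive contaminant panel.
RBC_CONTAMINANTS = frozenset(
    {"HBA1", "HBB", "HBD", "CA1", "CA2", "PRDX1", "PRDX2", "CAT"}
)


def split_candidates(candidates):
    """Separate candidate gene symbols into retained and flagged lists."""
    retained, flagged = [], []
    for symbol in candidates:
        if symbol.upper() in RBC_CONTAMINANTS:
            flagged.append(symbol)
        else:
            retained.append(symbol)
    return retained, flagged
```

Returning the flagged proteins instead of silently dropping them keeps a record that can be reviewed alongside each sample's hemolysis score.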
This study comprehensively assessed omics variables significantly impacted by the increase in hemolysis score in buffy coat and plasma/serum. Differences were identified that were associated with increasing hemolysis score, including missingness of proteins identified. Integration of lipidomics, metabolomics and proteomics data provided an expanded, comprehensive insight into the impact of hemolysis. Overall, our results will serve as a comprehensive resource to the biomarker community in the field of blood analysis. Diagnostic applications will be able to leverage the proteins, lipids and metabolites identified here as hemolytic contamination in future biomarker studies.

Declarations
Ethics approval and consent to participate
This study was IRB approved and all patients consented to participate.
Consent for publication
MAK: Drafted the work or substantively revised it, provided substantial contributions to the conception and design of the work and acquisition, analysis

Figure 1 Workflow of the methods used to study the impact of hemolysis. Initially, clinical samples were assigned a hemolysis score of 0-4 following the hemolysis scale color legend. In proteomics, plasma and serum were filtered and depleted of the top 14 most abundant proteins, and buffy coat cells were lysed. Proteins were extracted and digested with trypsin before being labeled with TMT 10-Plex. TMT-labeled peptides were analyzed using a 2D LC-MS/MS platform and quantified using Proteome Discoverer v1.4. In lipidomics, structural lipids were extracted via a liquid/liquid extraction method on an automated Hamilton Robotics STARlet system. Extracted lipids were analyzed via direct injection electrospray ionization TOF-MS. Further, mediator lipids were acidified and extracted using SPE. Eluted lipids were dried and resuspended for LC-MS analysis. In metabolomics, metabolites were extracted in organic conditions and analyzed using gas chromatography-mass spectrometry (GC/MS), reversed-phase liquid chromatography-mass spectrometry (RP-LC/MS), and hydrophilic interaction chromatography-liquid chromatography-tandem mass spectrometry (HILIC-LC/MS/MS). Post-processing of data included inspection, merging, and imputation.

Volcano plots showing a comparison of protein expression between 3 vs 0, 2 vs 0 and 1 vs 0. The expression of protein ratio to the QCP was exhibited as Log2 fold and compared to -Log10 of p-value. Significant proteins required a minimum 1.5 fold-change difference and a maximum p-value of 0.05.

Figure 6 is a boxplot of buffy coat protein expression (CA1 = Carbonic anhydrase, HIST1H2BL = Histone H2B type 1-L, UBN2 = Ubinuclein-2) for proteins that were identified as significantly differentially expressed in comparison to their hemolysis score of 0, 1, 2 and 3+. Expression values are log2 ratio to the pool.
