Assessing biomarker stability and assay performance parameters for the use of biomarkers in mental disorders; A study of early stage biomarker assay method development

Within the field of psychiatry, the development of biomarker-based assay methods is relatively young. Recent efforts have focused on combining several biomarkers within a panel to increase discriminative power. However, most biomarker panels have failed to advance to the stage of clinical application. An important prerequisite is a proper sampling and storage procedure, based on a priori identified stability properties of all biomarker/body fluid combinations present in the panel. A second prerequisite is adequate performance of the assays in use, such as Enzyme-Linked Immunosorbent Assays (ELISA), to assure reliable results within and between runs. In this study, we analyzed 24 biomarker assays in 32 biomarker/body fluid combinations identified as clinically relevant for the prediction of major depressive disorder (MDD). Each biomarker/body fluid combination was tested for stability and assay performance. We found impaired stability in almost all cases, with the exception of three biomarkers in urine and three in serum. Once stability properties have been identified, adequate measures can be taken to avoid interpretation mistakes. Once performance properties have been identified, decisions on assay implementation can be taken at an early stage. This study indicates that a good starting point for biomarker panel assay development is the investigation of stability for each biomarker/body fluid combination. In addition, assay performance plays an important role in the correct interpretation of those results. In the course of assay development, other quality assurance parameters might be implemented following a fit-for-purpose principle, ultimately providing the reliable data necessary for the implementation of diagnostic methods.


Introduction
Within the field of psychiatry, the development of biomarker-based diagnostic assays is relatively young. Current diagnostic methods within psychiatry rely heavily on clinical assessment of psychiatric disorders based on diagnostic classifications such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD-10) (Reed et al., 2019; Tolentino and Schmidt, 2018). Diagnosis of psychiatric disorders is, however, complex due to heterogeneity within disorder categories and high co-morbidity rates with other psychiatric and non-psychiatric disorders. The overlap in symptomatology may lead to the initiation of suboptimal treatment procedures, a high psychological burden for patients and increased costs (Fountoulakis and Stahl, 2021; Jentsch et al., 2017; Lilienfeld and Treadway, 2016). In addition, multiple biological pathways are believed to correlate with psychiatric disorders (Beijers et al., 2019; Jentsch et al., 2015; Saia-Cereda et al., 2017). Due to this heterogeneity, it is unlikely that single biomarkers will be identified with sufficient discriminative power for clinical use. Therefore, the focus has recently shifted to finding disease-associated patterns of multiple biomarkers. Such an approach has proven fruitful, with recent studies showing promising biomarker panels for major depressive disorder (MDD) (Bilello et al., 2015; Chen et al., 2018; Jentsch et al., 2020; Liu et al., 2015; van Buel et al., 2019). However, as multiple biomarkers are needed to identify a disorder, the margin of error in terms of reliability is considered small. This demands a high application stringency, which currently hampers the advancement of these panels from a research setting to a clinical stage (Drucker and Krapfenbauer, 2013; Mcdermott et al., 2013).
One of the issues hampering the advancement of these panels to a clinical stage is the lack of thorough development of the analytical method. A vast majority of these biomarker studies utilize ligand binding assays (LBA) such as Enzyme-Linked Immunosorbent Assays (ELISA). Due to the biological nature of the method, ELISA-based methods are prone to variations in assay performance and protein stability, which impact the reliability and reproducibility of measured analytes when not correctly controlled for during method development (Andreasson et al., 2015; Dakappagari et al., 2017; Gupta et al., 2017; van de Merbel et al., 2014). Sample-specific variations caused by, for example, hemolytic and lipidemic content (EMA, 2012; Food and Drug Administration, 2019), storage condition and storage time can therefore negatively impact the reliability of measured results when no efforts have been made to assess the potential effects of these factors on assay performance and analyte stability.
Within the present study we show the relevance of assessing the (in)stability of various biomarkers in serum and urine under different conditions, and how this may affect the reliability of the measured analytes in terms of assay performance. Doing this at an early stage of biomarker method development supports a proper choice of biomarkers for multiple-biomarker panels.

Sample material and biomarker selection
To determine biomarker stability, sample material (urine and serum) was obtained from an existing cohort of 40 patients with MDD, included based on the Mini-International Neuropsychiatric Interview (MINI) and the 17-item Hamilton Depression scale (HAM-D). For inclusion/exclusion criteria and demographic characteristics, please see van Buel et al. (2019). The investigation was carried out in accordance with the latest version of the Declaration of Helsinki and the initial study design was reviewed by an appropriate ethical committee. Informed consent of the participants was obtained after the nature of the procedures had been fully explained. Biomarker selection was performed on the basis of our previous work (Jentsch et al., 2020; van Buel et al., 2019) and supplemented with the biomarker Acetyl-L-Carnitine, as recent literature suggested that this molecule is implicated in the pathophysiology of MDD (Nasca et al., 2018). All selected biomarkers play a role in one or more of the major and minor hypotheses associated with MDD (Jentsch et al., 2015).

Sample pooling
All tests were performed on pooled patient samples, each consisting of material from 8 patients. Samples were pooled to assure biomarker concentrations suitable for analysis under the different conditions, as well as sufficient sample volume. For pooling, samples were quickly thawed at 37 °C in a water bath, homogenized by vortexing and kept on ice until refreezing. All samples of a pool were combined and mixed thoroughly. Samples were aliquoted such that one aliquot provided sufficient testing volume for a single biomarker assay.

Study design
Stability parameters investigated included storage of samples at various temperatures (-80 °C, -20 °C, 4 °C, 22 °C, and 37 °C) for different time periods (see Table 1) and cycles of freeze-thawing (0, 1x, 5x, and 10x). Depending on its applicability, the study was performed in urine, in serum, or in both.
For the temperature treatments, samples were stored at the indicated temperatures for the indicated periods and subsequently placed back at -80 °C until testing. For short incubations (up to 4 h), samples were stored at -80 °C and thawed immediately prior to the start of the temperature treatment.
Freeze-thaw samples were thawed in an open box in a 37 °C incubator for 20 minutes, at which point they were thawed but not warm. Samples were then homogenized by vortexing and refrozen in an open box at -80 °C.
Biomarker levels in urine and serum were determined by Enzyme-Linked Immunosorbent Assays (ELISA). For the analysis of all biomarkers, various research and CE-marked ELISA kits were purchased (see Table 2). Each biomarker was measured following the procedures provided by the vendor. An ELISA plate washer (Bio-Rad PW40, California, USA) was used for all washing steps. TMB absorption measurements were performed on a microtiter plate reader (Thermo Multiskan Spectrum, Massachusetts, USA) at 450 nm, using 620 nm as a reference wavelength. Biomarker concentrations were determined with a 4-PL curve-fitting algorithm performed in Excel, in which the optical density difference between 450 nm and 620 nm (OD450 - OD620) of the unknown samples was read against the calibration curve fitted to the measured optical densities of the known calibrators.
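The back-calculation step described above can be illustrated with a minimal Python sketch that evaluates a four-parameter logistic (4-PL) curve and inverts it to read a concentration from a reference-corrected optical density. The function names, the parameter names (a, b, c, d) and all numerical values are assumptions chosen for illustration; they are not the fitted values or the Excel implementation used in this study, and the fitting of the parameters to the calibrators is not shown.

```python
def four_pl(conc, a, b, c, d):
    """Four-parameter logistic response (corrected OD) at a concentration.

    a: response at zero concentration, d: response at infinite concentration,
    c: inflection point, b: slope factor. All names are illustrative.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def inverse_four_pl(od, a, b, c, d):
    """Back-calculate the concentration that produces the corrected OD."""
    low, high = min(a, d), max(a, d)
    if not low < od < high:
        raise ValueError("OD lies outside the asymptotes of the curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


# Hypothetical curve parameters, not taken from the study.
params = dict(a=0.05, b=1.2, c=150.0, d=2.4)

od_450, od_620 = 1.310, 0.045      # hypothetical raw readings of one well
corrected_od = od_450 - od_620     # OD450 - OD620, as described in the text
concentration = inverse_four_pl(corrected_od, **params)
print(f"corrected OD = {corrected_od:.3f}, concentration = {concentration:.1f}")
```

Raising an error for optical densities outside the asymptotes is a design choice of this sketch: such readings cannot be interpolated on the curve and would normally be reported as outside the calibration range.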
Stability and acceptance criteria
Measured sample concentrations were compared to the mean of the frozen T=0 -80 °C samples. Samples stored at -80 °C are considered to remain stable for long time periods and can therefore serve both as a measure of assay performance and as the reference for assessing stability decline under the various storage conditions tested.
Bias (as a measure of stability) was determined as the % difference from the reference sample, and acceptance was set at 25% following OECD guidelines for bioanalysis (European Medicines Agency (EMA) Committee for Medicinal Products for Human Use (CHMP), 2011; OECD, 1998) and an initial assessment of the biomarker data. All samples were measured in singlicate, whereas the calibrators were measured in duplicate. For each biomarker, all conditions were measured in one run on one plate to exclude inter-run and inter-plate variance.
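The bias calculation and the 25% acceptance criterion can be written out as a short Python sketch. The function names and the example concentrations are hypothetical, and treating a bias of exactly 25% as acceptable is an assumption of this sketch, since the text does not state how the boundary was handled.

```python
def percent_bias(measured, reference):
    """Percentage difference of a measured concentration from the reference."""
    return 100.0 * (measured - reference) / reference


def is_stable(measured, reference, limit=25.0):
    """True when the absolute bias is within the acceptance limit (in %).

    The inclusive boundary (<=) is an assumption of this sketch.
    """
    return abs(percent_bias(measured, reference)) <= limit


# Hypothetical concentrations in arbitrary units, not study data.
reference_values = [102.0, 98.0, 100.0]   # frozen T=0, -80 °C samples
reference_mean = sum(reference_values) / len(reference_values)

conditions = {"-20 °C": 91.0, "4 °C": 80.0, "37 °C": 64.0}
for label, value in conditions.items():
    bias = percent_bias(value, reference_mean)
    verdict = "within criteria" if is_stable(value, reference_mean) else "outside criteria"
    print(f"{label}: bias {bias:+.1f}% -> {verdict}")
```

Because samples were measured in singlicate, a bias outside the window in such a calculation cannot by itself separate analyte degradation from assay variability.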

Results
Results of the biomarker stability experiments are summarized in Figures 1-5. All urine biomarkers were within the acceptance criteria for the -80 °C condition, with the exception of Alpha 1 antitrypsin, indicating excessive assay performance variability for this biomarker. The biomarkers APOA1, Midkine and Acetyl-L-Carnitine met the acceptance criteria for all conditions, whereas for the other biomarkers, biases varied at several conditions, indicating either decreased stability or differences due to assay performance variability.
Within serum, Calprotectin, MPO and cAMP did not pass the acceptance criteria at the -80 °C temperature condition, nor at most other temperature conditions, indicating excessive assay performance variability for these biomarkers. The biomarkers Cortisol, Prolactin and Resistin met all acceptance criteria, whereas for the other biomarkers, passing of the acceptance criteria varied. Supplementary S1 shows all absolute bias values for each storage condition and time point.
Table 3 shows the results of the freeze-thawing (FT) experiment. The urine biomarkers MPO, Aldosterone, APOA1, Calprotectin, cGMP, HVEM, Prolactin, LTB4, Resistin, Acetyl-L-Carnitine, Substance P and Lipocalin were considered stable under all FT conditions, with a maximum bias under 25%. In serum, this was also true for the biomarkers Alpha 1 antitrypsin, Calprotectin, Cortisol, EGF, Leptin, MPO, BDNF, Resistin, TNRF and Zonulin. In urine, Alpha 1 antitrypsin and thromboxane B1 showed the least FT stability, with biases above 25% after 5 and 10 cycles. For the remaining biomarkers in both serum and urine, FT stability varied. Some biases were above 25% after 1 cycle, within 25% after 5 cycles and again above 25% after 10 cycles, indicating assay performance variability rather than stability issues. This was the case for Cortisol in urine and for Thromboxane, APOA1, Prolactin and Acetyl-L-Carnitine in serum.

Discussion
To our knowledge, this paper is the first to present stability data from a non-bioanalytical perspective on various urine and serum biomarkers measured with ELISA, while also showing that control of assay performance variability is essential for the correct assessment of results during scientific biomarker research and development.
Within both urine and serum, three biomarkers passed the stability acceptance criteria under all conditions, whereas the other biomarkers varied in bias compared to the concentration measured in the -80 °C reference. With respect to freeze-thaw stability, the data indicated that within both serum and urine various biomarkers remained stable after 10 FT cycles, whereas others lost stability after 5 cycles. To obtain reliable results, the failing storage and FT stability of some biomarker/body fluid combinations indicates the need for appropriate measures in both the sampling procedure and the testing design. Failing assay performance is much more difficult to control, and rejection of these biomarker/body fluid combinations should be considered.
Due to the nature of ligand binding assays (LBA) such as ELISA, the starting point for every matrix-based biomarker study should be the assurance of reliable and reproducible results. Within our biomarker study we used off-the-shelf commercial ELISA kits, which are ideal for early studies focused on biomarker discovery. Vendors of commercial ELISA kits provide data on cross-reactivity, limit of quantification, calibration range and assay precision. The last two are often re-assessed before widespread use. To determine suitability for the study in question, these assay parameters can be supplemented with data on frozen storage stability and freeze/thaw stability, as we have done in our biomarker stability experiment. In the course of biomarker assay development, additional assay parameters may become of interest for better assurance of assay performance and reproducibility of results. Agencies such as the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) provide guidelines on assay development and validation directed towards the assurance of accurate and reproducible results (EMA, 2012; Food and Drug Administration, 2018; Rogatsky et al., 2017; Viswanathan et al., 2007). Biomarker assay development, however, requires a less conventional approach due to the biological nature of the analytes of interest. The development and validation of a biomarker assay should follow the context-of-use principle, and the assay should be developed as fit for purpose. Structural pillars for this development include assay parameters such as parallelism, selectivity, sensitivity and stability (Goodman et al., 2020; Piccoli and Michael Sauer, 2019). Based on data obtained for these parameters, assay acceptance criteria can be set, which can differ between biomarker assays (Goodman et al., 2020).
These factors do not need to be determined within an early phase but are considered imperative in the course of biomarker assay development to assure that the method is fit for purpose.
Following the context-of-use and fit-for-purpose principles, early phase biomarker assay development could benefit from implementing assay performance parameters such as acceptance criteria for calibrators (bias and CV) and the utilization of quality control samples. Acceptance criteria for calibrators assure that the calibration curve is of sufficient quality and also make it possible to track assay performance over time. Quality control (QC) samples included in every analytical run can be used for run acceptance control and also provide a way of tracking plate shifts and assay drift over time (Azadeh et al., 2019; Beaver and Roby-Peters, 2011). QC samples should be representative of the study samples (preferably the same matrix), and the concentrations of the analytes should also be representative of those in the study samples. Preparation of QC samples for biomarker assays is complicated by the endogenous nature of the analytes, but various options are available. The concentration of each QC sample subsequently needs to be determined over several runs, after which a nominal value can be set and acceptance criteria can be derived from the precision of the measurements over those runs (Azadeh et al., 2019).
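As a rough illustration of how a nominal QC value and run acceptance limits could be derived from several runs, consider the following Python sketch. The function names, the example values and the choice of a window of two standard deviations around the nominal value are assumptions made for illustration; the text does not prescribe a specific rule.

```python
from statistics import mean, stdev


def qc_limits(qc_runs, k=2.0):
    """Nominal value, inter-run CV (%) and acceptance window for a QC sample.

    qc_runs: QC concentrations measured in several independent runs.
    k: half-width of the window in standard deviations (assumed choice).
    """
    nominal = mean(qc_runs)
    sd = stdev(qc_runs)                 # sample standard deviation
    cv = 100.0 * sd / nominal
    return nominal, cv, (nominal - k * sd, nominal + k * sd)


def run_accepted(qc_value, limits):
    """True when the QC result of a new run falls inside the window."""
    low, high = limits
    return low <= qc_value <= high


# Hypothetical QC results from six runs (arbitrary units), not study data.
qc_runs = [48.0, 52.0, 50.0, 49.0, 51.0, 50.0]
nominal, cv, limits = qc_limits(qc_runs)
print(f"nominal = {nominal:.1f}, CV = {cv:.1f}%, "
      f"window = {limits[0]:.1f} to {limits[1]:.1f}")
print("new run accepted:", run_accepted(51.5, limits))
```

Plotting the QC result of each run against these limits over time is one way to make the plate shifts and assay drift mentioned above visible.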
Within the field of biomarkers for psychiatry, several studies have identified potential biomarkers which could be utilized within a diagnostic setting, but results varied and the road to actual clinical application is still long (Bilello et al., 2015; García-Gutiérrez et al., 2020; Jentsch et al., 2020, 2015; Kirkpatrick et al., 2020; Papakostas et al., 2013; van Buel et al., 2019). The complex underlying biological background of psychiatric disorders may, however, not be the only explanation for the large variation between biomarker studies. A vast majority of these biomarker studies utilize ligand binding assays, which are often off-the-shelf research kits, and no resources are spent on implementing basic assay performance criteria such as analyte/body fluid stability. For example, cytokines are often an interesting target for psychiatric biomarker research (Benedetti et al., 2020; Capuron and Miller, 2004; Ioannou et al., 2021). Within these studies, blood samples are often analyzed from cohorts whose samples have sometimes already been stored for up to five years prior to analysis. A study from 2009 (De Jager et al., 2009) showed that cytokines are prone to degradation during long-term storage, which in combination with increased variation in ELISA antibody binding capacity leads to increased variation in measured concentrations and reduced reliability. This not only indicates that stability data on biomarkers are essential for assessing the suitability of samples, but also shows that implementation of basic assay performance criteria may be of high value in improving the reliability of study results. The suitability of a biomarker assay might be further improved by also incorporating clinical acceptance criteria related to the clinical concentration range of a given biomarker.
Preliminary data (not shown) indicate that applying clinical acceptance criteria based on the variability in QC samples (an assay performance parameter) relative to the overall variability of the clinical samples provides a valuable tool for judging the suitability of a specific biomarker/body fluid combination.
Within our study we set assay performance acceptance criteria, but we did not incorporate additional assay performance parameters which could have increased the reliability of the generated results, possibly leading to fewer inconclusive results. Incorporation of duplicate or triplicate analysis in combination with a low CV acceptance criterion, for example, could further increase the validity of the measured biomarker concentrations. Addition of low and high QC samples would have provided a measure of assay performance to assure that the assay performed as intended, and could have been used as assay acceptance criteria. The large number of inconclusive results in the literature may also be the result of pipetting errors which could have been missed due to measuring samples in singlicate instead of duplicate. Pipetting errors can occur, for example, when using relatively small sample volumes (≤ 10.0 µL) in some biomarker assays. Errors in dilution steps may also have contributed to the inconclusive results. Alpha 1 antitrypsin, for example, was diluted at least 1000x in our assay through several dilution steps before samples were added to the ELISA plate. Due to the limited amount of information available with respect to the assay used, one cannot rule out possible dilution effects that impact assay performance. To assess assay performance with highly diluted samples, a dilution linearity experiment could have been performed (Miller, 2004).
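The idea behind a dilution linearity experiment can be sketched in Python as follows: the same sample is measured at several dilutions, each result is multiplied by its dilution factor, and the corrected values are compared. The function name and the readings are hypothetical, and comparing each corrected value with the mean of all dilutions is only one possible convention, assumed here for simplicity.

```python
def dilution_linearity(measured_by_dilution):
    """Percent deviation of each dilution-corrected result from their mean.

    measured_by_dilution maps a dilution factor to the concentration read
    from the calibration curve for the diluted sample.
    """
    corrected = {factor: conc * factor
                 for factor, conc in measured_by_dilution.items()}
    overall = sum(corrected.values()) / len(corrected)
    return {factor: 100.0 * (value - overall) / overall
            for factor, value in corrected.items()}


# Hypothetical readings for one sample at four dilutions, not study data.
readings = {250: 4.10, 500: 2.00, 1000: 1.02, 2000: 0.45}
deviations = dilution_linearity(readings)
for factor, deviation in deviations.items():
    print(f"1:{factor} dilution -> {deviation:+.1f}% from mean")
```

Corrected concentrations that agree across dilutions suggest that the dilution steps themselves do not distort the result, whereas a systematic trend with increasing dilution would point to a dilution effect.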
In conclusion, basic assay performance and analyte stability parameters should be considered the starting point of matrix-based biomarker assay studies, whether commercial or in-house developed assays are used. Depending on the status of an assay, different requirements may apply. In an early phase, a short investigation of reproducibility may be sufficient. The more weight is ascribed to an assay, the more time should be invested to verify that the assay performs adequately and is fit for purpose.

Figure 1. -80 °C serum and urine biomarker storage stability. Biomarker stability measured after 336 hours of storage at -80 °C is presented as the percentage of bias from the mean -80 °C storage biomarker concentration as presented in supplementary S1. The dotted area represents the +25% and -25% bias acceptance criteria within which biomarkers are considered stable at the measured temperature conditions and time points of storage.
Figure 2. -20 °C serum and urine biomarker storage stability. Biomarker stability measured after various time points is presented as the percentage of bias from the mean -80 °C storage biomarker concentration as presented in supplementary S1. The dotted area represents the +25% and -25% bias acceptance criteria within which biomarkers are considered stable at the measured temperature conditions and time points of storage.
Figure 3. 4 °C urine biomarker storage stability. Biomarker stability measured after various time points is presented as the percentage of bias from the mean -80 °C storage biomarker concentration as presented in supplementary S1. The dotted area represents the +25% and -25% bias acceptance criteria within which biomarkers are considered stable at the measured temperature conditions and time points of storage.
Figure 4. 20 °C urine biomarker storage stability. Biomarker stability measured after various time points is presented as the percentage of bias from the mean -80 °C storage biomarker concentration as presented in supplementary S1. The dotted area represents the +25% and -25% bias acceptance criteria within which biomarkers are considered stable at the measured temperature conditions and time points of storage.
Figure 5. 37 °C urine biomarker storage stability. Biomarker stability measured after various time points is presented as the percentage of bias from the mean -80 °C storage biomarker concentration as presented in supplementary S1. The dotted area represents the +25% and -25% bias acceptance criteria within which biomarkers are considered stable at the measured temperature conditions and time points of storage.

Supplementary Files
Supplementary file associated with this preprint: S1OverviewofmeanbiomarkerconcentrationsFinal.pdf