Identification of Serum Prognostic Biomarkers of Severe COVID-19 by Quantitative Proteomic Approach

The COVID-19 pandemic is an unprecedented threat to humanity provoking global health concerns. Since the etio-pathogenesis of this illness is not fully characterized, the prognostic factors enabling treatment decisions have not been well documented. An accurate prediction of the disease progression can aid in appropriate patient categorization to determine the best treatment option. Here, we introduce a proteomic approach utilizing data-independent acquisition mass spectrometry (DIA-MS) to identify the serum proteins closely associated with the prognosis of COVID-19. We observed 27 proteins to be differentially expressed between the cohorts of severely ill COVID-19 patients with adverse and favorable prognosis. Ingenuity pathway analysis revealed that 15 out of the 27 proteins might be regulated by cytokine signalling relevant to interleukin (IL)-1β, IL-6 and tumor necrosis factor (TNF), and their differential expression was possibly implicated in the systemic inflammatory response and cardiovascular disorders. We further evaluated the practical prognosticators for the clinical prognosis of severe COVID-19 patients. Subsequent ELISA analyses uncovered that CHI3L1 and IGFALS could be potent prognostic markers with a high sensitivity. Our findings can help in formulating a diagnostic approach for accurately discriminating severe COVID-19 patients and providing appropriate treatment based on their predicted prognosis.


Introduction
Coronavirus disease 2019 (COVID-19) is a highly transmissible respiratory infection caused by the novel positive-sense, single-stranded RNA virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that emerged in Wuhan, China in 2019. Despite containment efforts, rapid person-to-person transmission turned the disease into a pandemic, and it is still spreading 1 . The molecular mechanisms behind disease progression leading to respiratory distress in COVID-19 patients are still unknown and no effective anti-viral therapies for COVID-19 have been established to date 2 . In order to optimize the allocation of limited health care resources to the neediest patients, it is crucial to accurately predict the progress and prognosis of patients with this disease. In addition, severity risk management for patients contributes to further reduction in mortality 3,4 .
Most COVID-19 patients exhibit mild symptoms without dyspnea and abnormal chest imaging or moderate respiratory symptoms with pneumonia. They usually recover from illness with or without supportive treatment. About 20% of COVID-19 patients develop respiratory distress and require immediate oxygen supplementation. A subset of these patients become critically ill, developing rapid respiratory failure and severe hypoxemia, and require immediate intensive care to prevent fatality. Considering the diverse variations in clinical manifestations of COVID-19, stratifying patients who are at risk for developing severe disease with adverse prognosis is crucial for selecting appropriate treatment strategies. For this purpose, identifying novel biological indicators that can serve as precise prognostic biomarkers is necessary to help clinicians make better clinical decisions and provide appropriate therapeutic strategies in earlier stages. To date, several clinical and biochemical parameters are used for predicting the severity of COVID-19, such as C-reactive protein (CRP), serum amyloid A (SAA), interleukin (IL)-6, lactate dehydrogenase (LDH), white blood cell count, D-dimer, cardiac troponin and platelet count 3,5 . In addition, multiple serological factors involved in the severity of COVID-19 have been identified by proteomic approaches using patient serum [6][7][8][9][10] . Most of these studies observed protein profiles involved in systemic and/or local inflammation, and accompanying organ damage or dysfunction.
Although the currently available serological biomarkers can predict severe disease, the markers that predict clinical prognosis and mortality of severe COVID-19 patients are not yet reported.
In this study, we utilized the recently developed mass spectrometry technology with the data-independent acquisition (DIA-MS) approach to identify the serum proteins closely associated with the disease prognosis. Using ELISA analyses, we further delineated the practical prognosticators for the clinical prognosis of severe COVID-19 patients. Consequently, we identified putative biomarkers that can indicate disease progression and adverse prognosis of the patients. These biomarkers shed light on a novel diagnostic approach that can serve to segregate COVID-19 patients based on the expected clinical prognosis and channel appropriate measures for their management.

Results
Identification of serum proteins associated with favorable or adverse outcomes of severe COVID-19 patients by quantitative proteomic analysis
To identify the serological biomarkers involved in the favorable/adverse prognosis of severe COVID-19 patients, we performed a comparative proteomic analysis with DIA-MS (Fig. 1A). In the discovery study, we obtained the MS data from serum samples collected within a few days after the start of special inpatient intervention of 10 severe COVID-19 patients with different prognosis (five adverse and five favorable). By utilizing our customized spectral DIA library containing information of 1,534 human serum proteins, we identified 656 proteins to be differentially expressed in sera at the protein false discovery rate (FDR) 1% level. Among them, 495 proteins were selected for further statistical analysis with Perseus (Table S1). A subsequent principal component analysis (PCA) score plot visualized the distribution of the samples and revealed an obvious separation trend between the two groups (Fig. 1B). To identify proteins that changed prominently along the disease prognosis, the significant changes of proteins in the severely ill patients with adverse prognosis were next analyzed using volcano plots. Consequently, 16 up-regulated proteins and 11 down-regulated proteins were selected as being associated with the adverse prognosis of COVID-19 with statistical significance (p-value < 0.01, fold change (difference) > 2) (Fig. 1C and Table 1).
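The volcano-plot selection described above can be sketched as a simple two-threshold filter. This is a minimal illustration, not the Perseus implementation: the function name, the toy protein names and all values are hypothetical, the p-values are assumed to come from a two-sample test computed elsewhere, and the effect-size threshold is assumed to mean an absolute log2 difference greater than 1 (a two-fold change), which is only one possible reading of "fold change (difference) > 2".

```python
from statistics import mean


def select_volcano_hits(log2_adverse, log2_favorable, p_values,
                        p_cutoff=0.01, min_abs_log2_diff=1.0):
    """Return (up, down) lists of proteins passing both volcano thresholds.

    log2_adverse / log2_favorable: dict protein -> list of log2 intensities.
    p_values: dict protein -> p-value from a two-sample test done elsewhere.
    min_abs_log2_diff=1.0 is an assumed reading of "fold change > 2".
    """
    up, down = [], []
    for name, p in p_values.items():
        # Difference of group means on the log2 scale (adverse minus favorable).
        diff = mean(log2_adverse[name]) - mean(log2_favorable[name])
        if p < p_cutoff and abs(diff) > min_abs_log2_diff:
            (up if diff > 0 else down).append(name)
    return up, down


# Hypothetical toy data, not values from this study.
adverse = {"P_up": [12.0, 12.5, 13.0],
           "P_down": [8.0, 8.2, 7.9],
           "P_flat": [10.0, 10.1, 9.9]}
favorable = {"P_up": [10.0, 10.2, 10.4],
             "P_down": [10.0, 10.3, 10.1],
             "P_flat": [10.0, 10.0, 10.1]}
pvals = {"P_up": 0.002, "P_down": 0.004, "P_flat": 0.80}

up, down = select_volcano_hits(adverse, favorable, pvals)
print("up-regulated:", up)
print("down-regulated:", down)
```

Keeping both thresholds as parameters makes it easy to check how sensitive the candidate list is to the chosen cut-offs.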
Indeed, the heat map analysis with hierarchical clustering showed that the expression levels of these proteins correlated with the disease prognosis of severe COVID-19 patients (Fig. 1D). To investigate the biological processes affecting the severity of COVID-19, we carried out an upstream analysis within the framework of the Ingenuity Pathway Analysis (IPA). The results showed that several increasing and decreasing proteins in the serum of the severe patients with adverse prognosis might be regulated by proinflammatory cytokines (Table S2). Notably, out of 27 differentially expressed proteins, 15 proteins were found to be regulated by IL-1β, IL-6 and tumor necrosis factor (TNF), which are seen at markedly higher levels in most severe COVID-19 patients 11,12 (Fig. 2). In addition, a disease and disorders enrichment analysis suggested that several of the differentially expressed proteins could be associated with cardiovascular disorders (Table S3). The result was consistent with the hypothesis that COVID-19 causes cardiovascular diseases, including myocardial injury and venous thromboembolism 13 . Simultaneously, inflammatory responses, such as the degranulation of neutrophils, were also enriched, as reported in the literature 7−10 . Furthermore, most of the 27 differentially expressed proteins were shown to form an interconnected network, as revealed by the STRING database (Figure S1).

Identification of putative biomarkers for predicting prognosis of critical COVID-19
In order to search for practical prognostic indicators for severe COVID-19 patients, we focused on two proteins, namely CHI3L1 and insulin-like growth factor binding protein acid labile subunit (IGFALS), which were most significantly correlated, positively or inversely, with the adverse prognosis of the severe COVID-19 patients in statistical analyses (Table 1). We excluded myoglobin (MB) since it has previously been reported as a prognostic marker 9 . To evaluate the clinical benefit of these candidates, we analyzed their serum levels using ELISA assays in 41 severe COVID-19 patients (10 adverse and 31 favorable) and assessed the relative expression changes associated with prognosis. The clinical information of the patients enrolled in the evaluation study is presented in Table S4. These parameters, excluding ECMO care and outcome of death, were not different between the adverse and favorable groups. On the other hand, the ELISA assays showed significant differences in the levels of these proteins between the adverse and favorable prognosis groups (p < 0.005), suggesting that the serum levels of these proteins also correlate with the adverse prognosis of the severe COVID-19 patients (Fig. 3A). We further assessed the ability of serum levels of CHI3L1 and IGFALS to detect COVID-19 patients with adverse prognosis using receiver operating characteristic (ROC) curves (Figure S2). Comparison of the areas under the curve (AUC) also corroborated that CHI3L1 and IGFALS were markers with higher reliability compared to CRP or D-dimer (Fig. 3B). Furthermore, the AUC [95% CI] of a model using both values of CHI3L1 and IGFALS was 0.91 [0.797-0.990]. Consequently, the serum expression levels of these two proteins can enhance clinical diagnostics by providing a precise indication of the outcome of a severely ill COVID-19 patient.
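As an illustration of the AUC comparison above, the following minimal sketch computes an ROC AUC directly from its rank interpretation: the probability that a randomly chosen adverse-prognosis patient has a higher marker value than a randomly chosen favorable-prognosis patient, with ties counted as one half. The function name and all marker values are hypothetical and are not the study data; for a marker that decreases with adverse prognosis, as reported here for IGFALS, the values are negated before scoring.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 = adverse prognosis, 0 = favorable prognosis.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical serum levels for seven patients (not values from this study).
labels = [1, 1, 0, 1, 0, 0, 0]
marker_up = [120, 95, 90, 50, 60, 35, 30]   # higher in adverse prognosis
marker_down = [5, 6, 12, 7, 11, 13, 14]     # lower in adverse prognosis

auc_up = roc_auc(marker_up, labels)
# Negate a decreasing marker so that larger scores indicate adverse prognosis.
auc_down = roc_auc([-v for v in marker_down], labels)
print(f"AUC (increasing marker): {auc_up:.3f}")
print(f"AUC (decreasing marker): {auc_down:.3f}")
```

The pairwise form is quadratic in the number of patients, which is not a concern for cohorts of this size.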

Discussion
COVID-19 displays variable illness or symptoms ranging from asymptomatic cases with spontaneous recovery to acute respiratory distress syndrome (ARDS) characterized by respiratory failure and diffuse alveolar damage 14 . While most patients with severe respiratory disorders recover from the illness, a substantial number of people die of respiratory failure and/or systemic complications. Identifying the target population most susceptible to adverse prognosis would help direct intensive medical management to where it is needed. Identifying putative risk factors and/or biomarkers for predicting the prognosis of critical COVID-19 is a pressing need, and might help clinicians and health care professionals to select or devise an appropriate management protocol for COVID-19 patients. For this purpose, we performed DIA-MS-based proteomic analysis, which has the potential to discover proteins related to adverse prognosis even without prior knowledge, using the serum of COVID-19 patients. Consequently, we identified 27 candidate proteins that were selectively increased or decreased along with the adverse prognosis. Subsequent ROC curve analysis identified two putative prognostic indicators, namely CHI3L1 and IGFALS, that can be useful for predicting the most likely clinical prognosis of severe COVID-19 patients.
Several studies have reported that most severe cases of COVID-19 exhibit a marked increase in serum proinflammatory cytokines 11,12,15 . Therefore, the current understanding of the disease suggests that cytokine storm along with the immunological dysregulation triggered by the viral replication phase contribute to the progression of severe ARDS and multiple organ failures in COVID-19 11,16 . However, IL-6 levels in COVID-19 patients are lower than the median values typically reported in ARDS 17,18 , and there could be other unidentified determinants that define COVID-19 severity. In our current study, we analyzed the molecular relevance of CHI3L1 and IGFALS by an upstream analysis within the framework of the IPA. Consequently, we found that the expression levels of these proteins were regulated by proinflammatory cytokines such as IL-1β, IL-6 or TNF. This finding indicates that our newly identified biomarkers could be "surrogate" markers of the proinflammatory cytokine network and cascade. CHI3L1 (chitinase-3-like protein 1), also termed YKL-40, is a protein that binds tightly with chitin but lacks chitinase activity. Our current study found that the serum CHI3L1 levels increased depending on the disease severity and adverse prognosis of COVID-19 patients. Parallel to this finding, previous studies have shown the correlation of high blood CHI3L1 levels with increased risk of death from various causes, including cardiovascular disease 19,20 . Additionally, elevated levels of circulating CHI3L1 are found in patients with idiopathic pulmonary fibrosis (IPF) 21 . Immunohistochemistry also shows that the expression levels of CHI3L1 are enhanced in bronchiolar epithelial cells and alveolar macrophages adjacent to fibrotic lesions of patients with IPF, suggesting the possible involvement of CHI3L1 in the fibrotic process of IPF 21 .
These findings together suggest that CHI3L1 plays an important role in the tissue remodelling of the respiratory system in COVID-19 following massive inflammation and alveolar destruction 22,23 . Therefore, higher levels of CHI3L1 might be associated with the pathogenesis of COVID-19, especially in relation to pulmonary tissue damage and repair.
Our current study also demonstrated that IGFALS levels were down-regulated with disease severity and adverse prognosis in COVID-19. Under normal conditions, IGFALS forms a ternary complex with IGFBP3 and insulin-like growth factor 1 (IGF-1). The binding of IGFALS/IGFBP3 with IGF-1 was shown to prevent the interaction of IGF-1 with its receptor IGF-1R, and also reduced the stability of IGF-1 to suppress its biological function 24 . It was also observed that plasma levels of IGF-1 were significantly reduced in mice with complete IGFALS deficiency, suggesting an accelerated reduction of half-life without any changes in liver or renal expression 25,26 . Consequently, the deficiency of IGFALS proteins causes disruption of the entire IGF-1 circulating system, without affecting glucose and insulin homeostasis 25 . The role of IGF-1 signaling in fibrotic processes is variable depending on its spatial and stoichiometric conditions 27 . Irrespective of COVID-19, the IGF-1 level diminishes gradually at the later fibroproliferative stage, showing a negative correlation with the mortality of patients with ARDS 28,29 . Moreover, a recent study has indicated that low serum IGF-1 levels are associated with a higher risk of mortality in COVID-19 patients 30 . These findings may together suggest that serum IGF-1/IGFALS levels are directly or indirectly involved in respiratory dysfunction. However, the regulatory mechanism of IGFALS and IGF-1 in COVID-19 remains elusive, and further studies will be required to determine the functional role of IGF-1/IGFALS in the pathogenesis of COVID-19.
The findings of this study could enhance the diagnosis of COVID-19 patients with severe pneumonia at high risk of mortality by combining the serum levels of two proteins closely involved in the pathogenesis of COVID-19. The capability of CHI3L1 and IGFALS to discriminate COVID-19 patients with adverse prognosis from those with favorable prognosis was superior to that of the existing biomarkers CRP and D-dimer.

Limitations Of The Study
Although we have performed a comprehensive proteomic analysis in the current study, further prospective studies will be required to validate the quality of these biomarkers. This could be achieved with a multidisciplinary approach, and a multivariable statistical analysis of these prognostic biomarkers in order to increase their accuracy of detecting clinical prognosis of severe COVID-19.

Human Samples
Serum samples were obtained from COVID-19 patients who were hospitalized at Yokohama City University Hospital, Yokohama City University Medical Center, and National Hospital Organization Yokohama Medical Center from February to June 2020. This research plan and protocol was approved by the Clinical Ethics Committee of Yokohama City University Hospital (B2002000048). This study was also performed with the approval of the Clinical Ethics Committee in each of the medical facilities. Informed consent was obtained from all patients and/or their guardians before serum sample collection. This study was conducted in accordance with the Declaration of Helsinki. All data were anonymized before the analyses.
All patients in the study were diagnosed with COVID-19 according to the manual for the Detection of Pathogen 2019-nCoV of the National Institute of Infectious Diseases in Japan. Severe COVID-19 patients were classified according to the National Institutes of Health guideline. In addition, severe patients with an outcome of death or requiring ECMO care were designated as the patients with adverse prognosis, while others were designated as the patients with favorable prognosis. All serum samples were stored at -80°C until use and then denatured by adding an equal volume of 8 M urea solution for MS analysis.

LC mass spectrometry
Desalted peptides were resuspended in 0.1% formic acid and 2% ACN containing iRT peptides (Biognosys) and then analyzed using a Q Exactive™ mass spectrometer coupled with an UltiMate™ 3000 HPLC system. The mass spectrometer was operated using Xcalibur software. Peptides were loaded on a trap column (100 μm × 20 mm, C18, 5 μm, 100 Å, Thermo Fisher Scientific) and subsequently separated on a Nano HPLC capillary column (75 μm × 180 mm, C18, 3 μm, Nikkyo Technos) at a flow rate of 300 nL/minute. Solvent A was 0.1% formic acid in 2% acetonitrile (ACN), while solvent B was 0.1% formic acid in 95% ACN. Peptides were eluted using a gradient of 2% B for 0-5 minutes and 2% to 33% B for 5-120 minutes, followed by 90% B for 10 minutes, and the column was then equilibrated for 20 minutes at 2% B. Data were acquired using either data-dependent acquisition (DDA) or data-independent acquisition (DIA).

Human sera spectral library generation
For the comprehensive serum proteome analysis, we constructed an original DIA-MS system for human serum. To construct a serum spectral library, human pooled sera purchased from KOHJIN BIO, BioWest, PANBIOTECH, and SIGMA were fractionated in three ways after removal of 14 human proteins (ALB, IgG, antitrypsin, IgA, transferrin, haptoglobin, fibrinogen, alpha2-macroglobulin, alpha1-acid glycoprotein, IgM, apolipoprotein AI, apolipoprotein AII, complement C3, and transthyretin) using a Human 14 Multiple Affinity Removal System (MARS) column (Agilent Technologies) or after compression of the dynamic range of protein abundance using ProteoMiner beads (Bio-Rad) according to each manufacturer's instructions. First, the immunodepleted or compressed serum was fractionated using an HPLC system with a C4 reversed-phase column (Vydac) and 20 fractions were independently subjected to in-solution digestion with trypsin (Promega) 31 . Second, the immunodepleted or compressed serum was separated into six pieces on a 5-20% polyacrylamide gel, followed by in-gel digestion with trypsin 32 . Third, the immunodepleted serum was digested with trypsin, and the resulting peptides (240 μg) were separated into 24 fractions by a 3100 OFFGEL Fractionator 33 . After desalting using a Stage Tip 34 , the obtained peptides were analyzed in DDA mode. The Q-Exactive was set to positive mode in a top-20 configuration. DDA mode analytical conditions consisted of a full MS1 scan at a resolution of 70,000 with a scan range from 350 to 1,500 m/z, with the AGC set to 3e6 (Full MS) and 1e5 (MS/MS). The normalized collision energy was set to 27.
Spectral library generation from the dataset of 76 DDA-MS measurements was performed using Spectronaut Pulsar X (Ver.12.0.2, Biognosys) by searching against the iRT fasta database (Biognosys) and the human protein sequences of the UniProtKB/Swiss-Prot database (version January 28, 2019), allowing for variable N-terminal acetylation, N-terminal carbamylation, methionine oxidation and cysteine carbamidomethylation. MS1 and MS2 tolerances were set to dynamic, and two missed cleavages were allowed. Search results were filtered to a 1% peptide-level FDR using Spectronaut Pulsar X.

Sample preparation for DIA-MS analysis
After adding 20 ng/μl E. coli β-galactosidase (β-gal) as the internal standard, 14 high abundance serum proteins (ALB, IgA, IgD, IgE, IgG, IgG [light chains], IgM, alpha-1-acid glycoprotein, alpha-1-antitrypsin, alpha-2-macroglobulin, apolipoprotein A1, fibrinogen, haptoglobin, transferrin) were removed by using High Select™ Top14 Abundant Protein Depletion Mini Spin Columns (Thermo Fisher Scientific) according to the manufacturer's instructions. After centrifugal ultrafiltration using Amicon Ultra centrifugal filters, immunodepleted serum samples were dissolved in 8 M urea solution. To verify the reproducibility of depletion efficiency, the proteins separated by SDS-PAGE were transferred to PVDF membranes and then incubated with anti-β-gal antibody (diluted 1:1,000) at room temperature (data not shown). Subsequently, proteins corresponding to 2 ml of immunodepleted serum were reduced with DTT (final concentration of 10 mM) and alkylated with IAA (final concentration of 25 mM). The protein solutions were diluted to 2 M urea in 50 mM NH4HCO3 and then incubated with trypsin (final concentration, 15 ng/μl) at 37 °C for 16 h. To prepare peptides for MS analysis, the resultant peptides were desalted by using a Stage Tip 34 , and the eluted peptides were subsequently completely lyophilized and kept at -80 °C until use.

DIA-MS analysis and data analysis
To determine protein abundance, the serum peptide samples were analyzed twice each in DIA mode. DIA mode analytical conditions consisted of a full MS1 scan at a resolution of 70,000 full width at half-maximum (FWHM) with a scan range from 380 to 1240 m/z, with the AGC set to 3e6, followed by 40 DIA windows acquired at a resolution of 35,000 FWHM, with the AGC value set to 3e6. The normalized collision energy was set to 28. DIA-MS data were analyzed using Spectronaut Pulsar X against the spectral library to identify and quantify peptides and proteins. Retention time calibration was set to the iRT peptides. The Biognosys default settings were applied for identification: exclusion of duplicate assays and estimation of FDRs using a q-value of 0.01 for both precursors and proteins. Interference correction was activated and a minimum of 3 fragment ions and 2 precursor ions were kept for the quantitation. The areas of extracted ion chromatograms (XIC) at the MS/MS level were used for quantitation. Peptide quantity was measured by the mean of the 1-10 best precursors, and protein quantity was calculated accordingly by summing the 1-10 best peptides. The global normalization strategy and q-value sparse selection were used for cross-run normalization. All other settings were set to default. All proteomics data are deposited in the ProteomeXchange Consortium via the jPOST partner repository (Project ID: PXD021702) (preview URL for reviewers: https://repository.jpostdb.org/preview/116104560600aa7ee9ab41, Access key: 2950). To perform downstream statistical quantitative analysis, we used Perseus (Max-Planck-Institute of Biochemistry), which is a software for functional analysis of large-scale quantitative data 35 . Distinct samples were categorized in the respective groups, the intensity values were log2-transformed and only proteins quantified in at least 70% of samples for each group were used for further analysis.
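The log2 transformation and valid-value filtering described above can be sketched in a few lines. This is an illustrative re-implementation rather than the Perseus workflow itself: the function name, the data layout (nested dictionaries with None for missing values) and the toy intensities are assumptions, and "at least 70% of samples for each group" is read here as a requirement that must hold in every group.

```python
import math


def log2_and_filter(intensities, groups, min_valid_fraction=0.7):
    """Log2-transform intensities and keep sufficiently quantified proteins.

    intensities: dict protein -> dict sample id -> raw intensity or None.
    groups: dict group name -> list of sample ids.
    Returns dict protein -> dict sample id -> log2 intensity (None if missing),
    keeping proteins quantified in >= min_valid_fraction of every group.
    """
    kept = {}
    for protein, by_sample in intensities.items():
        logged = {
            s: (math.log2(v) if v is not None and v > 0 else None)
            for s, v in by_sample.items()
        }
        passes = all(
            sum(logged.get(s) is not None for s in samples) / len(samples)
            >= min_valid_fraction
            for samples in groups.values()
        )
        if passes:
            kept[protein] = logged
    return kept


# Hypothetical example with five samples per group, as in the discovery study.
groups = {"adverse": ["a1", "a2", "a3", "a4", "a5"],
          "favorable": ["f1", "f2", "f3", "f4", "f5"]}
all_samples = groups["adverse"] + groups["favorable"]
raw = {
    "PROT_A": {s: 1024.0 for s in all_samples},  # fully quantified
    "PROT_B": {s: 2048.0 for s in all_samples},  # two adverse values missing
}
raw["PROT_B"]["a1"] = None
raw["PROT_B"]["a2"] = None

filtered = log2_and_filter(raw, groups)
print(sorted(filtered))            # PROT_B has only 3/5 valid adverse values
print(filtered["PROT_A"]["a1"])
```

With five samples per group, a 70% threshold means at least four valid values are needed in each group.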
Normalization was performed by width adjustment prior to the imputation of the missing values (downshift=1.8 and width=0.3). PCA score plots and volcano plots were generated with Perseus. Protein interaction analysis was carried out with the online tool STRING (default settings) 36 . IPA was used for the biological analysis. ELISA assays were performed according to the manufacturers' instructions to measure the circulating levels of CHI3L1 (cat# CY8088V2, MBL) and IGFALS (cat# 445907, BIOLEGEND). ROC curve analysis was performed to assess the predictive performance of CHI3L1, IGFALS, D-dimer and CRP. The optimal cut-off value was determined by the Youden index. Internal validation was performed by bootstrapping with 150 simulations to obtain a bootstrapped AUC.
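The cut-off selection and internal validation steps can be illustrated as follows. This is a minimal sketch under stated assumptions, not the software used in this study: the function names and marker values are hypothetical, the Youden index is taken as J = sensitivity + specificity - 1 over all observed cut-offs, and the bootstrap resamples patients with replacement, skips resamples containing a single class, and reports a simple percentile interval, which is only one of several ways to summarize a bootstrapped AUC.

```python
import random


def roc_auc(scores, labels):
    """AUC as P(positive score > negative score); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J, calling score >= cutoff positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


def bootstrap_auc(scores, labels, n_boot=150, seed=0):
    """Mean bootstrapped AUC and an approximate 95% percentile interval."""
    rng = random.Random(seed)
    idx = range(len(scores))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # both classes must be present
            aucs.append(roc_auc([scores[i] for i in sample], ys))
    aucs.sort()
    lo = aucs[int(0.025 * n_boot)]
    hi = aucs[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(aucs) / n_boot, (lo, hi), aucs


# Hypothetical marker values (1 = adverse prognosis), not data from this study.
labels = [1, 1, 0, 1, 0, 0, 0]
marker = [120, 95, 90, 50, 60, 35, 30]

cutoff, j_stat = youden_cutoff(marker, labels)
mean_auc, (ci_lo, ci_hi), boot_aucs = bootstrap_auc(marker, labels)
print(f"Youden cut-off: {cutoff} (J = {j_stat:.3f})")
print(f"bootstrapped AUC: {mean_auc:.3f} [{ci_lo:.3f}-{ci_hi:.3f}]")
```

Fixing the random seed makes the 150 resamples reproducible, which is helpful when reporting a bootstrapped estimate.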