In this study, we measured the expression of 2,000 plasma proteins with antibody micro-array technology from age- and sex-matched COVID-19 patients, non-COVID-19 sepsis controls, and healthy control subjects. Using machine learning-based protein subset identification, we identified a 28-protein model that accurately differentiated COVID-19 patients from their comparison cohorts. Furthermore, we determined an optimal 9-protein subset model that maintained high classification ability. Some identified proteins were associated with clinical and demographic characteristics in the COVID-19 patients. NLP of expert-curated expression information identified multi-system expression of the leading proteins. This study has identified a reduced protein signature for COVID-19 patients that contributes to COVID-19 pathophysiology characterization and may inform the development of therapeutic interventions upon further investigation.
Our critically ill COVID-19 cohort was similar to other reported cohorts, with only minor differences (8, 47–50). For example, the mortality rate in our COVID-19 patients was higher than reported by other studies and may suggest a greater illness burden in our patients (8, 47, 51). The platelet count in our COVID-19 patients was lower than reported in the literature (52–54), perhaps reflecting greater microvascular injury and overall microclot risk (55). Similarly, the PaO2/FiO2 ratio was also lower in our COVID-19 patients (8, 54), indicating higher levels of acute lung injury. Although COVID-19 lymphocyte counts, INR, and bilateral pulmonary complications were significantly different than in non-COVID-19 sepsis controls, they were similar to those in COVID-19 patients reported in the literature (50, 53, 54).
A unique 28-protein signature that differentiated COVID-19 patients from non-COVID-19 sepsis controls and healthy control subjects was determined. Each of the identified proteins was individually different in the COVID-19 cohort with high discrimination power, further positioning them as possible disease biomarkers. Time-based analysis and inspection of the pairwise subject comparison demonstrated no changes in COVID-19 protein expression over multiple ICU days and interventions, suggesting that the reduced protein signature is robust, reproducible and remains highly predictive of COVID-19 disease status over 10 hospitalization days. In addition, an optimal model consisting of 9 proteins (PF4V1, NUCB1, CrkL, SerpinD1, Fen1, GATA-4, ProSAAS, PARK7, and NET1) maintained the high classification ability found in the superset 28-protein model. The pairwise comparison analysis suggests that the nine-protein model may be more consistent across multiple days than the 28-protein model.
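As a rough illustration of how the classification ability of a reduced protein subset can be estimated without hyperparameter tuning, the following minimal sketch runs k-fold cross-validation with a fixed nearest-centroid classifier. The classifier, the fold count, the function names, and the synthetic expression values are all assumptions made for illustration; they are not the model, parameters, or data used in this study.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean expression profile is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))

def cross_val_accuracy(X, y, k=3, seed=0):
    """Fraction of held-out samples classified correctly across k folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

# Synthetic two-protein expression values (hypothetical, for illustration only).
X = [[0.1 * i, 0.0] for i in range(6)] + [[10 + 0.1 * i, 10.0] for i in range(6)]
y = ["control"] * 6 + ["covid"] * 6
accuracy = cross_val_accuracy(X, y)
```

Because the classifier has no tunable parameters, every held-out fold is scored by a model that was never adjusted on it, which is the property the conservative design aims for.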
Correlation analysis comparing the expression of the 28 proteins in COVID-19 patients with their respective clinical characteristics identified seven associations. Interestingly, four proteins correlated with measures of blood clotting, including the INR and PTT. The COVID-19 patients had significantly higher INR and PTT measurements compared to non-COVID-19 sepsis controls; however, the measurements were within the normal clinical range. Almost all patients across the two ICU cohorts had anticoagulation interventions. PCMT1 was negatively correlated with INR in COVID-19 patients but has not been linked to thrombosis in the literature. SerpinB5, ERRa, and IGFBP-5 measurements in COVID-19 patients were mainly lower than in healthy controls and exhibited a positive correlation with PTT; however, similar to PCMT1, none of the correlated biomarkers have been linked to thrombosis previously. Hemoglobin was negatively correlated with fibronectin in COVID-19 patients, with all patients having fibronectin levels lower than healthy controls. MammaglobinA, a secreted glycosylated protein involved in cell signalling and the immune response, differentiated COVID-19 patients who received high-flow nasal cannula oxygen therapy as an intervention (56, 57). Lastly, ProSAAS, a neuroendocrine hormone, was lower in those patients with pre-existing hypertension (58).
Serpins are a family of protease inhibitors that use conformational changes to inhibit target enzymes (59). Four of the 28 proteins that changed in COVID-19 were Serpins (A1, D1, A4, and A12), and all were downregulated. In line with a previous study, SerpinA1 was downregulated in our COVID-19 cohort (60). SerpinA1 is proposed to limit SARS-CoV-2 cell entry via inhibition of cell surface transmembrane protease 2 (TMPRSS2) function, a critical step in the required processing of the SARS-CoV-2 spike protein (61). In addition, SerpinA1 was associated with decreased COVID-19 severity (62, 63) and has been suggested as a potential COVID-19 treatment. Indeed, COVID-19 patients with moderate to severe acute respiratory distress syndrome improved in a phase 2 randomized controlled trial after SerpinA1 intervention (64). Administration of SerpinA1 is also suggested as a therapy for alpha-1-antitrypsin deficiency (AATD), in which there is an increased risk of emphysema, obstructive lung disease, and liver disease (65–70); however, it is unclear if AATD mutations are associated with COVID-19 severity (63, 71, 72). SerpinD1, a thrombosis inhibitor (73), competes with the SARS-CoV-2 spike protein to bind heparin, resulting in increased thrombosis risk (74). The regulation of SerpinD1 in COVID-19 is controversial, as a study has shown that SerpinD1 was higher in moderate and severe cases (75). SerpinA4, also known as kallistatin, exerts multiple effects on inflammation, angiogenesis, and tumor growth. A single nucleotide polymorphism in the SerpinA4 gene was linked to acute kidney injury in COVID-19 patients (76). Down-regulation of SerpinA4 was noted in COVID-19 non-survivors, indicating a persistent pro-inflammatory signature (77). SerpinA12 is an adipokine that has been linked to the development of insulin resistance, obesity, and inflammation (78). In COVID-19, the downregulation of SerpinA12 may heighten inflammation via the kallikrein–kinin system (79).
NLP analysis processed expert-curated expression information from the UniProt Knowledgebase to identify organ- and cell-specific biomarkers. Of the 28 proteins, 14 (50%) had organ system expression information, with most proteins linked to expression in the digestive and nervous systems. NLP cell-type analysis results were inconclusive, as only eight proteins had cell-type expression information.
Gastrointestinal system complications are prevalent in COVID-19 patients, including diarrhea, nausea/vomiting, and abdominal pain (9, 80, 81). Fen1, involved in critical DNA synthesis and repair mechanisms, was overexpressed in our COVID-19 cohort. Fen1 is reported to be involved in hepatocellular and gastrointestinal cancers (82, 83), and a novel antiviral strategy that utilizes FEN1 to decrease SARS-CoV-2 cellular functions has been proposed (84). The expression of both CrkL and fibronectin was decreased in our COVID-19 cohort. The former, which is associated with gastrointestinal cancers, has been suggested as a potential COVID-19 drug target (85–87). The latter is a widely expressed extracellular matrix protein associated with liver regeneration, fibrogenesis, and intestinal inflammation (88–90).
Nervous system symptoms in COVID-19 patients are prevalent, with COVID-19 severity being associated with increased neurological complications (91–93). Our NLP analysis identified proteins, mainly down-regulated, from our COVID-19 cohort that are linked to the nervous system. SHANK1, downregulated in COVID-19 patients, facilitates protein-protein interactions in excitatory synapses (94), and its downregulation may hinder neuronal communication (95). Our COVID-19 patients had decreased expression of PCMT1, a carboxyl methyltransferase. PCMT1 downregulation is linked to neurodegenerative diseases and may increase β-amyloid production (96, 97). PARK7 was decreased in our COVID-19 patients and may therefore not effectively perform its protective role in limiting neurotoxicity and maintaining neuronal viability (98–100). PARK7 performs various cellular functions, including acting as a chaperone, interacting with transcription factors, and exerting anti-oxidative effects under oxidative stress conditions (101–103). PARK7 is a critical protein involved in the gut-brain axis and related to altered gut microbiomes (104, 105). Nucleobindin 1 (NUCB1) is widely expressed in brain neurons and stabilizes amyloid protofibrils before they mature and become harmful in neurodegenerative diseases (106, 107); however, its downregulation in our COVID-19 patients suggests decreased neurological protective mechanisms. Presenilin2 is a crucial protein in neurodegenerative disease and was decreased in our COVID-19 patients. Presenilin2 is responsible for the cleaving enzymatic action required to form amyloid plaques and also forms Ca2+ leak channels that support the calcium hypothesis of AD (108–111). Similar to Presenilin2, ProSAAS, an amyloid anti-aggregant in Alzheimer’s disease, was decreased in our COVID-19 patients (112).
ProSAAS is a neuroendocrine chaperone protein with protective effects against neurodegeneration, such that increased endocrine and neurological cell stressors are associated with elevated expression (113, 114). Galanin was downregulated in our COVID-19 patients and operates on the neuroendocrine axis with various functions throughout the central and peripheral nervous and endocrine systems (115). Fyn, elevated in our COVID-19 cohort, has a harmful role in neurological diseases and may be a potential target for neurodegenerative disease due to its β-amyloid signalling and tau interactions (116–118).
NLP analysis also identified the endocrine system as potentially impacted due to differential protein expression. COVID-19 patients with hypertension had significantly lower expression of ProSAAS, which may be related to ProSAAS peptides involved in salt sensitivity (119). Diabetes diagnosis and insulin sensitivity have been linked to COVID-19 severity and mortality (120–122), and downregulated ERRa in our COVID-19 cohort is linked to insulin resistance, diabetes, and obesity (123–126). ERRa regulates glycolysis and lipid metabolism in multiple organs, along with steroidogenesis in the adrenal cortex (127–129). Similar to our cohort, lower IGFBP-5 expression was previously observed in COVID-19 patients (130), and IGFBPs are linked to diabetes and metabolic disorders (131–135). SerpinA12 was down-regulated in our COVID-19 patients and is associated with diabetes and obesity due to its insulin-sensitizing effects (136–140). The downregulated NUCB1 in our COVID-19 patients suggested a harmful effect related to type 2 diabetes as it performs amyloid stabilization in human islet cells to prevent fibrils in the pancreas that impact type 2 diabetes (106, 141, 142). The decreased PARK7 in COVID-19 patients could also be connected to a metabolic imbalance. PARK7 protects pancreatic beta-cells from oxidative stress conditions, and its deficiency is associated with decreased inflammatory and adipogenesis responses (143–145) and type 2 diabetes (146, 147). Lastly, Presenilin 2 is expressed in endocrine cells, but there is insufficient data on its role and association with diabetes (148, 149).
COVID-19 is linked with various cardiovascular changes, including vascular transformation, thrombosis, and angiogenesis (150–155). NLP analysis revealed proteins expressed in the cardiovascular system. GATA-4 is involved in cardiac remodelling, differentiation, and signalling by acting as a cardiogenic transcription factor (156–158). GATA-4 was reduced in our COVID-19 patients, indicating that subsequent remodelling pathways may be impaired. IGFBP-5 expression was reduced in COVID-19 patients (130), and it is an inhibitor of angiogenesis and vascular smooth muscle cell proliferation (159–161). PF4V1, decreased in our COVID-19 patients, is an angiogenesis inhibitor and may also regulate inflammation and thrombosis (162–165). SerpinA4 (Kallistatin) was lower in our COVID-19 patients (166), and it protects against vascular oxidative stress and inflammation as well as inhibiting angiogenesis (167–169). Thus, the decreased expression of IGFBP-5, PF4V1, and SerpinA4 in COVID-19 may be cardioprotective, perhaps via suppression of angiogenesis and vascular transformation. EphB4, also associated with angiogenesis, was downregulated in our COVID-19 patients (170–173).
The novelties of this study include the protein biomarkers identified, the immune microarray platform utilized, and several of the analytic techniques. Previous proteomics studies have also identified biomarker models that differentiate COVID-19 patients from non-COVID-19 sepsis controls and healthy control participants (174–177). While these studies identified a number of important biomarkers, they did not evaluate their effectiveness in a single combined model, an approach that decreases the likelihood of cross-identity concerns with other diseases. The novel proteins identified in our study may be attributed to our use of an immune microarray platform, while other studies utilized mass spectrometry or proximity extension assays (174–179). Pathway analysis was used in previous studies to help understand COVID-19 pathophysiology (176, 178, 179); however, our approach utilized NLP to identify organ and cell expression patterns.
In this study, we identified a novel 28-protein signature and an optimal 9-protein signature that accurately distinguish COVID-19 patients from non-COVID-19 sepsis controls and healthy control subjects; however, our study has several limitations. First, the number of subjects in each comparison group was limited, although we used conservative methods to ensure appropriate analysis. Conventional statistics consisted of only non-parametric methods with strict Bonferroni multiple comparison correction. Machine learning classification utilized cross-validation with conservative parameters and without any hyperparameter tuning. Also, protein model building and testing used separate data subsets to reduce overfitting. Second, not all identified proteins had UniProt Knowledgebase-curated expression information, leaving the potential for unrecognized patterns in organ and cell system expression. Similarly, there is a possibility for missed organ/cell identification with NLP; however, preprocessing of expression information was carefully done, and NER used a state-of-the-art biomedical model. Third, static protein measurements must be interpreted with caution as they do not always correlate with functional changes. As one example, Serpins undergo a conformational change to elicit biological effects and therefore require further functional analyses. Lastly, we only compared the COVID-19 biomarker signatures to other cohorts, but there may be cross-identity concerns with other illnesses. The use of multiple biomarkers would reduce this latter limitation. Although our exploratory study had these minor constraints, the data provided insight into the pathophysiological changes in COVID-19 patients.
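The Bonferroni correction mentioned among the conservative methods above can be sketched as follows. This is a minimal illustration, assuming a significance level of 0.05; the function name, protein labels, and p-values are hypothetical placeholders and are not results from this study.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: multiply each p-value by the number of tests.

    Returns a dict mapping each test name to (adjusted p-value, significant?).
    Adjusted p-values are capped at 1.0.
    """
    m = len(p_values)
    results = {}
    for name, p in p_values.items():
        adjusted = min(1.0, p * m)
        results[name] = (adjusted, adjusted < alpha)
    return results

# Hypothetical unadjusted p-values, for illustration only.
example = {"protein_A": 0.00001, "protein_B": 0.0004, "protein_C": 0.03}
adjusted = bonferroni(example)
```

With three tests, a protein must reach an unadjusted p-value below 0.05 / 3 to remain significant. The same rule becomes far stricter as the number of tests grows: if, for example, all 2,000 measured proteins were tested, the per-test threshold would be 0.05 / 2,000 = 0.000025.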