Human endogenous retrovirus K activation in the lower respiratory tract of severe COVID-19 patients associates with early mortality

Critically ill 2019 coronavirus disease patients (COVID-19) under invasive mechanical ventilation (IMV) are 10- to 40-times more likely to die than the general population. Although progression from mild to severe COVID-19 has been associated with hypoxia, uncontrolled inflammation and coagulopathy, the mechanisms involved in progression to severity are poorly understood. By analyzing the virome from tracheal aspirates (TA) of 25 COVID-19 patients under IMV, we found higher levels and differential expression of human endogenous retrovirus K (HERV-K) genes compared to nasopharyngeal swabs from mild cases and TA from non-COVID patients. Proteomic analysis and RT-PCR confirmed the presence of HERV-K in these patients. Moreover, increased HERV-K expression was triggered in human primary monocytes from healthy donors after experimental SARS-CoV-2 infection in vitro. In critically ill patients, higher HERV-K levels were associated with early mortality (within 14 days) in the intensive care unit. Increased HERV-K expression in deceased patients associated with IL-17-related inflammation, monocyte activation and higher consumption of clotting/fibrinolysis factors. Our data implicate the levels of HERV-K transcripts in the outcome of critical COVID-19 patients under invasive mechanical ventilation.


Main Text
Severe acute respiratory coronavirus 2 (SARS-CoV-2), the etiological agent of 2019 coronavirus disease , continuously circulates and has caused over one 100,000/month since its original emergence into the human population 1 . Based on official statistics on laboratory-confirmed reports, the case fatality ratio of COVID-19 ranges from 1.5% to 10% in developed and developing countries, respectively 1 . Differently than other highly pathogenic coronaviruses from the 21st century, such as SARS-CoV and Middle East respiratory coronavirus (MERS-CoV), SARS-CoV-2 shedding occurs from the presymptomatic period to a few weeks after symptoms onset 2 . Longer viral replication favors tissue damage, as shown by the positive correlation between high activity of lactate dehydrogenase (LDH) activity, a marker of cell death, with COVID-19 progression 3 . While type II pneumocytes are targeted and destroyed by the infection and the respiratory parenchyma is harmed, innate and adaptive immunological responses are not always able to prevent further progression to poor clinical outcomes, and may even worsen the tissue lesions 4, 5 . In fact, severe COVID-19 has been associated with increased and uncontrolled release of pro-inflammatory mediators (cytokine storm), so that the resolutive mechanisms are overcome by a marked upregulation of IL-6, TNF-alpha and IL-1-beta 4,5 . In addition, immune cells that orchestrate the innate and adaptive response, such as monocytes and neutrophils, undergo pyroptosis and netosis 6,7 . Consistently, leukopenia and an uncontrolled coagulopathy, marked by platelet activation and high D-dimer levels, correlate with COVID-19 severity 8,9 . Altogether, SARS-CoV-2-triggered inflammation and hypercoagulability have rapidly been defined as main features of the natural history of disease progression from mild to severe COVID-19 clinical presentations 4 . 4 Up to now, the factors described above have been associated with disease progression from mild to severe, but they are limited to explain the mortality of critically ill COVID-19 patients. Further investigation is thus necessary to search for overlooked factors associated with these high mortality rates. Although the stay of COVID-19 patients in the ICU for weeks is more likely to predispose them to nosocomial infections, mortality is high even for patients negative for bacterial infections [10][11][12][13][14] . Despite the best clinical practice to routinely surveillance of bacterial infection in the ICU, unculturable and unbiased diagnosed viruses are neglected in daily practice. The systematic analysis of the virome from critically ill COVID-19 patients is thus necessary, especially in samples from the lower respiratory tract. Thus, we analyzed a cohort of critically ill COVID-19 patients under IMV with sustained SARS-CoV-2 loads, inflammation and coagulopathy to determine whether their lower respiratory tract virome, beyond coronavirus, could improve the rationalization of patients' outcome.
From March to December 2020, we prospectively included 25 critically ill COVID-19 patients requiring IMV, at the median age of 57-year-old and presenting the most common infection symptoms and comorbidities (Extended Table 1). Patients displayed high SARS-CoV-2 RNA levels (median of 10 6 copies/mL), laboratory markers of systemic inflammation and coagulopathy (because of elevated plasma levels of C reactive protein [CRP] and Ddimer, respectively), and case fatality ratio of 58% (Extended Table 1). Due to the IMV, the tracheal aspirate (TA) was the source of samples to perform SARS-CoV-2 RNA quantification and virome analysis. We were surprised that the TA of over 70% of the patients had higher SARS-CoV-2 RNA loads than other samples from the lower respiratory tract 15 (Extended Figure 1A). RNA content from TAs was unbiased sequenced and rendered an average of 2x10 7 genomic reads, of which 10% and 90% were virus-or human-related, respectively ( Figure 1A). Approximately 95% of the virus-associated reads 5 in the transcriptome were linked to SARS-CoV-2 ( Figure 1A). Nevertheless, we enriched the new coronavirus sequences using Atoplex kit (MGI, China) (Supplementary Table 1) to phylogenetically classify them into the emerging clades 19A, 20A and 20B, in the proportions of the 16%, 12% and 72%, respectively (Extended Figure 1B and C), reconfirming that the entire cohort was composed of COVID-19 patients. The SARS-CoV-2 emerging clades identified here were representative of the virus circulating in Brazil during the year of 2020 (https://nextstrain.org/ncov/global?dmax=2020-12-16&dmin=2020-01-16&f_country=Brazil).
In addition to SARS-CoV-2, human endogenous retrovirus K (HERV-K; also known as HML-2) sequences were consistently detected in the TA from these COVID-19 patients, at a proportion of 5 ± 2% (mean ± SEM) of the virome (Figure 1A and Supplementary   Table 2). Not so important, random phage types were detected in some samples with low coverage ( Figure 1A). Among all HERVs, HERV-K is the most contemporaneous in human genome, being incorporated during human-chimpanzee speciation 16 . Thus, it is noteworthy to find an human-specific marker associated with critically ill COVID-19 patients, as nonhuman primates are less likely to die from SARS-CoV-2 infection 17 , raising the attention to a possible role of HERV-K in the dichotomy of SARS-CoV-2 severity between humans and non-human primates.
Moreover, HERV-K was 5-fold more present in the virome of TA aspirates from COVID-19 patients under IMV than in nasopharyngeal swabs (NS) from mild cases, previously studied by us 18 (Figure 1B and Supplementary Table 2). HERV-K levels in the TA from deceased or discharged COVID-19 patients did not differ, but these values were significantly higher compared to TA from non-COVID-19 patients, randomly retrieved from sequence read archive (SRA) ( Figure 1B). The data from SRA indicate that HERV-K may 6 be found in the lower respiratory tract of some patients with other illnesses ( Figure 1B).
Indeed, HERV-K detection in the respiratory tract has been associated with lung adenocarcinoma 19 , as well as other types of cancer, neurological disorders, multiple sclerosis and arthritis 20 . Here we described that higher levels of HERV-K were found in COVID-19 patients that deceased earlier (Figure 1C), accentuating its association with disease severity. Of note, no statistically significant association was found between HERV-K and other features, such as days from COVID-19 onset, age, gender or SARS-CoV-2 RNA levels (Extended Figure 2).
Thousands of loci in the human genome are associated with HERV-K-related genetic elements and some of them are actively expressing retroviral structural genes 21 , we search for correlation between the HERV-K transcript consensus described here and active HERV-K loci in the human genome. Most often sequences from HERV-K structural genes were expressed from different chromosomic regions, suggesting the activation of otherwise silent genes (Extended Table 2 and Supplementary Figure 1). Indeed, critically ill COVID-19 patients differently expressed HERV-K-associated structural genes, doubling and tripling the gag-pro-pol and env transcripts compared to NS from mild cases and non-COVID TA SRAs ( Figure 1D). Expression of gag, in prostate cancer and multiple sclerosis has been associated with immune dysregulation 22,23 . The gene pol encodes for a reverse transcriptase that may jeopardize the cell cycle of lymphocytes because of its association with leukemia 24 . Gene pro is both a protease, which can directly cleave cellular protein with major functional impact in cell biology 25 , and a deoxyuridine triphosphate nucleotidohydrolase (dUTPase), whose activity leads to inflammation, immune dysregulation, and progressive obliterative vascular remodeling in the respiratory tract 26 .
HERV-K env may trigger cell-cell fusion, leading to epithelial to mesenchymal transition and various types of cancer, including in the respiratory tract 19,27 . 7 For further confirmation of HERV-K presence, we performed orthologue assays. TA from 14 patients were enough for further examination via shotgun proteomics (Figure 2A and B, and Supplementary File 1); which revealed more than 20,000 peptides linked to 2,249 human proteins, with a high degree of confidence (FDR <1 at the level of PSM, peptides and proteins identification). To trace similarities with the HERV-K proteome, we compared the peptides from tracheal aspirate human proteome and HERV-K proteins Gag, Pro, Pol, Env and Rec (Uniprot IDs # P62684, P63121, P63132, Q902F9 and P61574, respectively) through BlastP (NCBI/BLAST). A total of 167 alignments were detected with We next examined a possible correlation between HERV-K in the TA with immunemodulation and/or coagulopathy. For this purpose, Spearman correlation analysis for levels of cytokines, coagulation factors and immune cell counts were scored in deceased and discharged patients (Extended Figure 3). To be conservative when assuming statistical significance, we additionally performed regression analysis, for those markers that passed Spearman correlation, evaluating differences in angular and/or linear coefficients ( Figure   3). As a general tendency for the endogenous mediators, HERV-K reduced their levels in the TA (Extended Figure 3A) and favored inflammation in the peripheral plasma (Extended Figure 3B). HERV-K negatively associates with two survival/growth factors for immune cells in the blood, granulocyte colony-stimulating factor (G-CSF) 30 and nerve growth factor (NGF) 31 ( Figure 3A). Consistently, surviving patients presented higher NGF levels than those who died ( Figure 3A). Surprisingly, HERK-K and other HERVs had been reported as inducers of both G-CSF 32 (in immune cells) and NGF 33 (in neurons), however, HERV activation in CNS is associated mainly with the onset and progression of neurological diseases 34 , and the induction of G-CSF was evaluated only with a domain of the HERV-K TM protein 32 . As a function of HERV-K levels, other regulatory/antiinflammatory signals were also decreased in the plasma of deceased, such as IL-1Ra and IL-13 ( Figure 3A), which respectively antagonizes IL-1-dependent stimulus and favors an allergenic-like/TH2 response 35,36 . Interesting, the reduction of IL-13 production is reported also by a HERV-H-LTR-derived protein, together with the inhibition of CD4 and CD8 T cell responses 37 . Deceased patients respond to higher HERV-K levels increasing IL-17 ( Figure   3A), a further pro-inflammatory mediator that may upregulate IL-6, CRP and airway 9 remodeling 38 , and it is also described to be upregulated by HERV-K and others HERVs in autoimmune diseases 39,40 .
During severe COVID-19, clotting factors are intensely consumed 9 and HERV-K levels associated with specific modulations in survivors and deceased patients (Extended Figure 3C). In the light of HERV-K levels, an apparent higher consumption of factor V occurs, being more intense in patients that have deceased ( Figure 3B). As clotting cascades are activated, the fibrinolysis product, D-dimer, positively correlates with HERV-K levels in patients that died ( Figure 3B), interesting, the modulation of coagulation cascade genes by HERV-K is also described in immune cells 32  To the best of our knowledge, our work is the first evidence of the presence of HERV-K in the respiratory tract and in the plasma of critically ill COVID-19 patients. We also connected HERV-K levels with the pro-inflammatory and regulatory events that contribute for patient`s outcome. Our data also gives insights that HERV-K expression is a result of a broader gene expression event during COVID-19, as judged by different activated loci and its expression may further contribute to epigenetic remodeling and longterm consequences. In conclusion, our findings provide original evidence that SARS-CoV-2 triggers increased HERV-K expression, and high levels of HERV-K are associated with disease severity and early mortality.

Ethics and Patients
From March to December 2020, inpatients from the D'or Institute (ID'or) and Instituto Estadual do Cérebro Paulo Niemayer (IECPN) admitted in the ICU were included upon signed informed consent by their responsible relative. Both TA and Acid-citrate-dextrose (ACD)-anticoagulated blood samples were collected. All patients already had SARS-CoV-2 positive RT-PCR upon entrance in this ward. Nevertheless, we reconfirmed COVID-19 laboratory diagnosis, and summary data from the patients are presented in Extended

RNA extraction and RT-PCR
The total RNA from TA was extracted using QIAamp Viral RNA (Qiagen, Germany), and 100 ng of cDNA following manufacturer's guidelines.

Enrichment-dependent SARS-CoV-2 sequencing
Total viral RNA from TA was extracted and quantified with the QIAamp Viral RNA Genomic sequences were quality-scored, filtered, trimmed, assembled in contigs, and assigned an ID through a validated workflow for SARS-CoV-2 50 (Genome Detective).
Consensus fasta sequences were aligned with ClustalW in Unipro UGENE 51 (version 38), 13 and phylogenies were constructed with Nextclade 52 to assign the emerging clades (Supplementary Table 1).

Unbiased RNA-Seq
For an unbiased RNA-seq, metatranscriptomics approach, total viral RNA samples were applied to the MGIEasy RNA Library Prep Set (MGI Tech Co. Ltd., Shenzhen, China).
Briefly, RNA was initially fragmented by size (250 bp 14 were obtained. Gag, pol and env evolutionary history was inferred using the maximumlikelihood method and Tamura-Nei model using a discrete Gamma distribution, General Time Reversible model using a discrete Gamma distribution and Hasegawa-Kishino-Yano model using a discrete Gamma distribution, respectively. Cells were incubated with antibodies for 30 min at room temperature and fixed with 4%

Human lung epithelial cells (Calu-3) and African green monkey kidney cells (Vero
paraformaldehyde. Cells labeled with each antibody separately were used for appropriate 16 color compensation and isotype-matched IgG conjugated with the same fluorochromes were used as the negative controls. Lymphocytes, monocytes and neutrophils were recognized by their characteristic forward and side scatter and expression of specific surface markers as shown in Supplemental Figure S2. Flow cytometer (BD FACSCalibur) was used to acquire 2,000 to 5,000 gated events. Acquired data were further analyzed using FlowJo software.

Elisa
Blood samples were collected in ACD-containing syringes and plasma was obtained Bedford, USA). Desalted peptides were dried, suspended in 10 µl of 0.1% formic acid and aliquots corresponding to 0.5 µg/µl were separated for mass spectrometry analysis.

Mass spectrometry
The tryptic digests were analyzed by reversed-phase nanochromatography coupled to highresolution nanoelectrospray ionization mass spectrometry. Chromatography was performed using a Dionex Ultimate 3000 RSLCnano system coupled to the HF-X Orbitrap mass

Proteomic computational analysis
The raw data files were processed and quantified using PatternLab for Proteomics software 54 (version 4.0). Peptide sequence matching (PSM) was performed using the Comet algorithm against the protein-centric human database neXtProt 55  .

Statistics
The assays were performed blinded by one professional, codified and then read by another professional. All experiments were carried out at least three independent times, including a minimum of two technical replicates in each assay. Prism GraphPad software 8.0 was preferentially used to generate the data sets. One-way analysis of variance (ANOVA) was used to compare differences among 3 or more groups following a normal (parametric) distribution, and Tukey's post-hoc test was used to locate the differences between the groups; or Friedman's test (for non-parametric data) with Dunn's post-hoc test.
Spearman correlation was used for comparison of curves, as well as angular and linear comparisons between discharged and deceased patients. Statistically significant p values < 0.05 were registered.     (Table S1) and from healthy donors (HD) were evaluated for presence of HERV-K GAG by real time RT-PCR.